When it comes to ecommerce, a fast site can make a big difference in overall sales. I recently went through an exercise to tune a Drupal 7 Commerce site for high traffic on a Black Friday sales promotion. In previous years, the site would die in the beginning of the promotion, which really put a damper on the sale! I really enjoyed this exercise, finding all the issues in Commerce and Drupal that caused the site to perform sub-optimally.
FYI, We also have a Drupal 8 Commerce Performance Tuning guide here.
In our baseline, for this specific site the response time was 25 seconds and we were able to handle only about 1000 orders an hour. With a very heavy percentage of 500s, timeouts and general unresponsiveness. CPU and memory utilization on web and database servers was very high.
Fast-forward to the end of all the tuning and we were able to handle 12K-15K orders an hour! The load generator couldn’t generate any more load, or the internet bandwidth on the load generators would get saturated, or something external to the Drupal environment became the limiter. At this point, we stopped trying to tune things. Horizontal capacity by adding additional webheads was linear. If we had added more webheads, they could handle the traffic. The database server wasn’t deadlocking. Its CPU and memory was very stable. CPU on the web servers would peak out at ~80% utilization, then more capacity would get added by spinning up a new server. The entire time, response time hovered around 500-600ms.
Enough about the scenario. Let’s dive into things.
The first step in tuning a site for a high volume of users and orders is to build a script that will create synthetic users and populate and submit the form(s) to add item(s) to the cart, register new users, input the shipping address and any other payment details. There’s a couple options to do this. JMeter is very popular. I’ve used it in the past with pretty decent success. In my most recent scenario, I used locust.io because it was recommended as a good tool. I hadn’t used it before and gave it a try. It worked well. And there are other load testing tools available too.
OK, now you are generating load on the site. Now start tuning the site. I used New Relic's APM monitoring to flag transactions and PHP methods that were red flags. Transactions that take a long time or happen with great frequency are all good candidates for red flags. If you don’t have access to New Relic, another option is Blackfire. Regardless what you use for identifying slow transactions, use something.
Make sure that there’s nothing crazy going on. In my case, there was a really bad performing query that was called in the theme’s template.php and it was getting loaded on every single page call. Even when it wasn’t needed. Tuning that query gave use an instant speed-up in performance.
After that, we starting digging into things. There are several core and contrib patches I’ll mention and explain why and when you should consider applying them on your site.
In your specific commerce site, things might be different. You might have different payment gateways or external integration points. But the process of identifying pain points it the same. Run a 30-60 minute load test and find long running PHP functions. Then fix them so it doesn’t take as long.
As a first step, install the Memcache (or Redis) module and set it up for locking. Without that one step, you’ll almost immediately run into deadlocks on the DB for the semaphore table. This is a critical first step. From my experience, deadlocks are the number one issue when running a site under load. And deadlocks on the semaphore table is probably the most common scenario. Do yourself a favor. Install Memcache and avoid the problem entirely.
Then see if you can disable form caching on checkout and user registration. This helped save a TON of traffic against the database for forms that really don’t need to be cached. More about that later in specific findings.
One last thing before diving into some findings...
SHOW ENGINE INNODB STATUS
...will become your favorite friend. Use it to find deadlocks on your MySQL server.
Specific FindingsThe following section describes specific problems and links to issues and patches related to the problems.
- Do not attempt field storage write when field content did not change
Commerce and Rules run and reprocess an order a lot. And then blindly save the results. If nothing has changed, why re-save everything again? So don’t. Apply this patch and see fewer deadlocks on order saves.
- field_sql_storage_field_storage_load does use an unnecessary sort in the DB leading to a filesort
Many times it makes sense to use your database to process the query. Until it doesn’t make sense. This is a case it leads to a filesort in MySQL (which you can discover using EXPLAIN in MySQL) and locking of tables and deadlocks. It is not that hard to do the sort in PHP. So do it.
- Do not make entries in "cache_form" when viewing forms that use #ajax['callback'] (Drupal 7 port)
This is a huge win, if you can pull it off. For transient form processing like login and checkout, disabling form cache is a huge relief to the DB. You might need to put the entire cart checkout onto a single page. No cart wizard. But the gains are pretty amazing.
- If you are using captcha or anything with ajax on it on the login page, then you’ll need to make sure you are running the latest versions of Captcha and Recaptcha. See issues #2449209 and #2219993. Also, side note: if using the timing feature of recaptcha, the page this form falls on will not be cacheable and tends to bust page cache for important pages (like homepages that have a newsletter sign up form).
- form_get_cache called when no_cache enabled
You’ve done all that work to cut down on what is stored in cache. Great. But Drupal still wants to retrieve from cache. Let’s fix that. Cut down more DB calls.
- commerce_payment_pane_checkout_form uses form_state values instead of input
If your webshop is like most webshops, it is there to generate revenue. If you disable form caching on checkout, without this patch the values in your payment (including the ones for receiving payment) aren’t captured. Oops. Let’s fix that too.
- Variable set stampede for js and css during asset building
If you are using any auto scaling system and building out new servers when the site is under heavy load, you might already be using Advagg. But if you aren’t and are still using Drupal core’s asset system, spinning up a new system or two will cause some issues. Deadlocks galore when generating the CSS and JS aggregates. So either install Advagg or this patch.
- Reduce database load by adding order_number during load
Commerce and Rules really like to reprocess orders. An easy win is to reduce the number of one-off resaves and assign the order number after the first load.
- Never use aggregation in maintenance mode
While the site is under heavy load, the database sometimes becomes unreachable. Drupal treats this as maintenance mode. And tries to aggregate the JS/CSS and talk to the database. But the database isn’t reachable. It is a little ridiculous to aggregate JS/CSS on the maintenance page. And even more to try to talk to the database. So cut out that nonsense.
- drupal_goto short circuits and doesn't set things to cache
If you have any PHP classes you are using during the checkout, Drupal’s classloader auto loads them into memory. It then keeps track of where the files exist on the disk and this makes the next load of those classes just that much faster. Well, drupal_goto kills all this caching. And drupal_goto gets called when navigating through checkout.
Wow! That was a long list of performance enhancements. Here’s a quick recap though. Identify critical flow of your application. Generate load on that flow. Use a profiler to find pain points in that process. Then start picking things off, looking on drupal.org for existing issues, filing bugs, applying patches. Many of the identified issues discussed here will apply to your site. Others won’t apply and you’ll have different issues.
Surprisingly, or maybe not surprisingly, the biggest wins in our discovery process were the low hanging fruit, not the complex changes. That query in the template.php was killing the site. After that, switching to use Memcache for the semaphore table and eliminating form cache for orders also cut down on a lot of chatter with the database.
I hope you too can tune that Drupal 7 Commerce site to be able to handle thousands of orders an hour. The potential exists in the platform, it is just a matter of giving performance bottlenecks a little attention and fine tuning for your particular use case. Of course, if you need a little help we'd be happy to assist. A little bit of time spent can have you reaping the rewards from then on.