Debugging a website (Drupal or otherwise) can be challenging. In this video, I go through a recent situation I faced where a client had reported their Drupal Commerce site completely slowing down every hour or so. I'll discuss the process I followed to figure out the problem and get it fixed.
Here's a breakdown of what happened
- I first used New Relic to see where the slowdown was happening. It could be a database issue, a PHP issue, maybe an external service call, who knows? New Relic can help determine this and I was able to determine that it was a database load issue that I was facing.
- Then I checked the system logs. Every hour or so, there were a lot of database insertions happening on a number of tables. It seemed really out of place and initially I couldn't narrow down why. I checked the logs and found that system cron was running at the exactly same time as the slowdown. It was also running for a similar amount of time that the slowdown was taking place. Normally, system cron only takes 1-20 seconds, but here it was running for about 3 minutes!
- Now I can review cron's code to see what should be happening. I found that cron will generate a list of tables and flush out the expired cache. Generating the list is a very resource intensive process and on this particular site, the list being generated was very large and complicated. After the list is generated, it should get permanently cached in the database and therefor it doesn't become a resource issue later. However, for some reason it was being deleted every time. This ended up being the issue I needed to find out, what was deleting the list.
- Since I needed to determine why it was being deleted, I attached logging to the general function used for deleting cache. From here I was able to trace it back to Drush, but I still didn't understand why Drush would be deleting this list of tables. I had to dig further.
- Eventually, I discovered what was happening. It turned out that the version of Drush that was being used was doing a call to try and find the system logging. However, it couldn't find it and as a side effect it cleared the cached list that cron had generated. Cron, which ran every hour, then couldn't find the cached list and so need to build it again. It was a cycle that just kept repeating itself every hour. I now understood the problem!
- And now for the fix. I needed to know why Drush was doing this and if I could prevent it. I first looked around the Drush project issue queue on Drupal.og and talked to a Drush maintainer. I wanted to know if this was an issue others were also experiencing. It turns out that it WAS a known issue and that it was resolved in a later release! The version on the site that I was working on was a few major versions behind. I bought the site up to the latest release and the issue was fixed! Cron ran and took only about 5 seconds, the generated list of tables was being cached and staying cached, and Drush was not clearing it out.
A good debugging exercise
The bug ended up being one that was with Drush, and not the website. The result, through an odd chain of events, ended up bringing the clients site to a standstill nearly every hour. Now that it's resolved, I can look back and see that it was a good exercise in debugging. Even though I didn't need to build the fix myself, it still took a lot of time and understanding to arrive at the fix, and it was great to have it resolved. Hopefully, if you find this because of a similar issue, maybe I can save you a little bit of time.
We can help
If your experiencing issues with your Drupal Commerce website, the good news is that we can help! Contact us if you would like to discuss your options.