How Koodo Mobile Handled 16,000 Server Requests Per Minute on Black Friday
Black Friday is one of those shopping events that can make or break ecommerce sites. If you’re ready for it, profits for the day can be great. If you’re not, well, let’s just say the whole experience can be a pretty big frustration. This post is a mini case study on the challenges, and the eventual success, of positioning Koodo Mobile as a national leader in telecommunications ecommerce during these major shopping events.
Why the change?
Over the years, Koodo Mobile has steadily gained market share in Canada. Black Friday 2017 brought a significant increase in website traffic over 2016, which resulted in one of those less-than-ideal Black Friday ecommerce experiences. Needless to say, they did not want a repeat in 2018. They needed to rebound in a big way, making sure everything from process to code to infrastructure was ready for future growth.
Acro Media has a great long-standing relationship with Koodo Mobile as an external service partner that complements their in-house development teams. For 2018, we took charge of the planning phase, bringing in senior developer leadership with a fresh set of eyes and new expertise.
We identified key improvements to be made in the following areas:
- Server architecture and capacity
- Code refactoring
- Automated tests
Along with these improvements, we wanted to make sure that the Koodo Mobile infrastructure could handle at least double the traffic seen in 2017.
Significant changes were made
Making changes in the key areas we identified was no small task. To do things right, both teams recognized that significant changes were needed and that additional failsafes had to be added. Here’s a quick breakdown.
Adding new senior leadership enhanced the established processes by making decision-making more efficient and improving the teams’ overall ability to solve problems.
Rebuilt hosting stack from the ground up
We drew inspiration for the revised hosting stack from TELUS, another Acro Media client that we lovingly refer to as Koodo Mobile’s larger, older brother. We used the TELUS stack as a starting model and added various improvements, including:
- CDN for static and cached files
Adding a CDN (content delivery network) to the infrastructure is key to reducing latency and cutting down the number of requests that hit the servers.
- Cluster cache server using Amazon ElastiCache
Running the cache as a cluster on Amazon ElastiCache adds redundancy, so a single cache server crashing no longer takes down caching for the whole site.
- Custom 500 error handling
Perhaps one of the more unique enhancements we made for Koodo Mobile was this custom 500 error handling. A 500 error happens when something has gone wrong at the server level, and it typically results in a white screen in the browser with a generic “This page isn’t working” message. There are many possible causes, but the one we were most concerned about was timeouts caused by too many connections hitting the server. To work around this, we built a custom cron job that builds a cached version of the current homepage in the background at a timed interval. If a 500 error occurs, the visitor is shown the cached homepage instead of the error. The visitor might wonder why they were suddenly sent to the homepage, but they can keep using the site without necessarily knowing an error happened. Cool!
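The fallback described above boils down to two pieces: a scheduled job that refreshes a homepage snapshot, and an error handler that serves that snapshot when a request fails. Here is a minimal, framework-agnostic sketch in Python. It is not the code running on Koodo Mobile (the post doesn’t say what the site is built on), and the names `refresh_homepage_snapshot`, `serve_with_fallback`, the snapshot file path, and the `(status, body)` response tuple are all hypothetical.

```python
import os
import tempfile
from typing import Callable, Tuple

# Hypothetical response shape for this sketch: (HTTP status code, HTML body).
Response = Tuple[int, str]


def refresh_homepage_snapshot(render_homepage: Callable[[], str], snapshot_path: str) -> None:
    """Meant to run on a timer (e.g. from cron): render the homepage and save it."""
    html = render_homepage()
    directory = os.path.dirname(os.path.abspath(snapshot_path))
    # Write to a temp file in the same directory, then rename it over the old
    # snapshot, so readers see either the old or the new file, never a partial one.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(html)
    os.replace(tmp_path, snapshot_path)


def serve_with_fallback(handler: Callable[[], Response], snapshot_path: str) -> Response:
    """Serve a request normally; on a 500 (or a crash), serve the cached homepage."""
    try:
        status, body = handler()
    except Exception:
        # Treat an unhandled exception (e.g. a timeout) the same as a 500.
        status, body = 500, ""
    if status != 500:
        return status, body
    try:
        with open(snapshot_path, encoding="utf-8") as f:
            # Assumption: the visitor gets the cached homepage with a 200 status.
            return 200, f.read()
    except OSError:
        # No snapshot exists yet, so there is nothing to fall back to.
        return 500, body
```

In a real deployment, the refresh function would be triggered by the scheduler (the post only says “a timed interval,” so the frequency is up to you), and the fallback would sit in whichever layer catches server errors. Returning a 200 keeps the visitor’s experience smooth, but you may prefer to log the original failure or keep an error status so monitoring still notices the problem.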
Significant code refactoring
To make the site run as efficiently as possible, a complete code audit and refactor needed to happen. Any long-term development project is going to suffer from code bloat over time. It’s inevitable as team members, software and processes change. We made a whole bunch of code changes, which increased the efficiency of the codebase by about 33%!
Testing practices updated
Testing is a large part of any successful development project, and some good practices were already in place. Acro Media already has an internal peer review process, and Koodo Mobile has a solid QA team for workflow and manual regression testing.
We wanted to take testing even further. We introduced Apache JMeter to load test the new setup and measure how the servers held up under load, Nightwatch.js for functional, browser-based UI testing of the main user workflows, and a variety of profiling tools to optimize the performance of underlying site functions.
A resounding success!
Black Friday 2018 has come and gone, and all of this hard work paid off.
First, the site stayed up for the entire shopping event, which was the main goal. On top of that, it sustained about 16,000 server requests per minute throughout the day. By comparison, a standard day typically peaks at about 4,000 to 6,000 requests per minute.
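To put those figures in perspective, here is the arithmetic as a quick Python sketch. The inputs are the numbers quoted above; the per-second rate assumes requests were spread evenly across each minute, which real traffic never quite is.

```python
def per_second(requests_per_minute: float) -> float:
    """Average request rate per second, assuming an even spread over the minute."""
    return requests_per_minute / 60


black_friday_rpm = 16_000          # sustained Black Friday 2018 rate (from the post)
typical_peak_rpm = (4_000, 6_000)  # usual daily peak range (from the post)

print(f"Black Friday: {per_second(black_friday_rpm):.0f} requests/second")
for peak in typical_peak_rpm:
    print(f"vs. typical peak of {peak:,}/min: {black_friday_rpm / peak:.1f}x")
```

This prints roughly 267 requests per second, and a load between 2.7x and 4.0x a normal day’s peak, sustained all day rather than for a short burst.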
As for orders, I don’t have specific numbers, but Koodo Mobile received at least double the orders they were expecting. In fact, the site performed so well that the payment gateway was overwhelmed and occasionally prevented people from placing orders! That was, unfortunately, entirely out of our control.
In closing, the Koodo Mobile and Acro Media teams did a great job planning and executing their Black Friday strategy. The work not only resulted in great sales numbers, but also strengthened the business relationship and improved the overall quality of the product.