Every year when Black Friday comes around the corner, system engineers working for retailers start battening down the hatches to ensure website availability.
You see, the retail shopping Superbowl as we’ve come to know it doesn’t really just happen on Black Friday. Nor is it just “in stores.” The shopping frenzy spills over into Thanksgiving Day, on toward the following Cyber Monday, online and offline, in a time period becoming known as the “Turkey Five.”
And these five days of shopping bliss for e-tailers can quickly turn into nightmares if companies haven’t tested and prepared their ecommerce infrastructure to protect against the chaos monkeys. In fact, Inc. Magazine has called such downtime an “ecommerce kiss of death” for online retailers if they encounter it during the holiday shopping season.
That’s because a retailer stands to lose roughly 8 percent of the day's digital sales for each hour it's down, according to ChannelAdvisor, an ecommerce optimization firm.
As Mehdi Daoudi, CEO of Catchpoint and an alumnus of Google and DoubleClick, pointed out:
“Each year, we see at least one notable website crash or suffer an outage during the peak shopping days of Black Friday, Cyber Monday, or the like. [R]etailers have been selling online for 15 years or more. There are really no excuses for a major retailer’s site to crash due to the heavy traffic loads the holidays bring.”
In other words, your customers view your downtime as inexcusable, unacceptable, and uninviting. But according to Pingdom, even the top 50 e-commerce sites in the world experience an average of 0.97% downtime a year, which works out to roughly three and a half days of outage. Rounded to 1% and spread evenly over the year, that’s 14.4 minutes per day. At 8 percent of the day's sales lost per hour of downtime, that is roughly 1.92% of your sales!
Even one minute of downtime can cost you consumer trust and damage your brand.
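The downtime arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes downtime is rounded to 1% and spread evenly across the year, and it reuses the 8 percent-per-hour loss figure attributed to ChannelAdvisor. The function names are made up for this example.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_day(downtime_fraction: float) -> float:
    """Average minutes of downtime per day if outages were spread evenly."""
    return downtime_fraction * MINUTES_PER_DAY


def daily_sales_lost(downtime_minutes: float, loss_per_hour: float = 0.08) -> float:
    """Fraction of a day's sales lost, assuming a fixed loss per hour of downtime."""
    return (downtime_minutes / 60) * loss_per_hour


# Assumption: 1% downtime (a rounding of the 0.97% figure quoted above).
minutes = downtime_minutes_per_day(0.01)
loss = daily_sales_lost(minutes)
print(f"{minutes:.1f} minutes/day -> {loss:.2%} of daily sales")
```

Swapping in your own downtime percentage and your own estimate of hourly losses gives a quick sense of what an outage costs on an ordinary day, before the holiday multiplier kicks in.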
Before we get to mitigation strategies, we need to distinguish between the types of downtime as your customer sees them.
Site Glitches: A Problem With User Experience
In 2016, for instance, Old Navy online shoppers reported (rather aggressively) that the items they had painstakingly added to their online carts during Black Friday were disappearing, only to “mysteriously” appear in other shoppers’ carts.
Reported one customer on Facebook:
You KNEW it was cyber Monday. I just lost my entire cart full of hundreds of dollars in stuff and it was replaced by someone else's stuff.. Not sitting around waiting to add it all back. I'll find another site that's working to spend my money at now.
“Soft” Downtime: When The Page Loads … Barely
Customers tend to leave a retailer’s site when they notice lag, “server not responding” errors, or slowly updating progress bars.
Mobile sites that took an average of 5 seconds to load during Black Friday 2016 earned nearly two times the revenue of sites that averaged 19 seconds. This is due to abandoned transactions, which account for some 46% of all mobile transactions.
Your goal: Stay at 2 seconds or less for loading times.
Even if your own servers are optimized and ready for Black Friday-level traffic, third-party components and scripts can still make your website appear slow. This happened last year on Black Friday for Walmart, Williams-Sonoma, and Jet.
One customer shouted from the social media rooftops:
Don't have a Cyber Monday Sale if you can't handle traffic to the site!!
“Hard” Downtime: More Than Just HTTP 5xx Errors
When shoppers receive a 503 error during peak sale hours, it’s clearly a bad thing, but having that message show up for hours is, as Forbes put it, the “kiss of death.” That’s when soft downtime bleeds into hard downtime.
When customers perceive that your site is “down,” they will take their money elsewhere, and possibly never come back. One customer wrote:
“I can't believe you guys weren't prepared for Cyber Monday traffic. Sounds like you lost a lot of money from people having things in their cart and not being able to check out. I myself haven't even been able to go to the page. Don't advertise such great deals if you can't handle it!”
Preventing Downtime From Occurring in the First Place
Our CEO, Matthew Barlocker, has said:
“Downtime is avoidable when you can anticipate what will happen. This works for obvious, grandiose issues like having enough servers; but it also works for each cog and gear that breaks in your infrastructure the day of. When you have alerts at the first sign of wear and tear, you can prevent the downtime from happening.”
The alternative is to ignore the leading indicators (because you were too busy, focused on other things, etc.), which inevitably makes your customers your first line of defense against errors.
Strategy 1: Spread your deals over more than one day.
This one is admittedly the least technical solution, but it’s also one of the easiest and cheapest. When you ask your customers to visit your website for once-a-year deals on a single day, you’re setting your servers up for failure. Instead, try offering “pre-Black Friday” sales to spread out the influx of traffic to the site. Target, which is one of the fastest-growing online retailers, did that last year to ease up on its Cyber Monday traffic. This year, Best Buy is following suit.
Strategy 2: Reduce user-behavior scripts and plugins to a bare minimum during peak times.
We’re not just talking about turning off unnecessary Google Tag Manager scripts here (though that would inevitably help, too).
Strategy 3: Launch additional cloud servers during known peak times.
This strategy can give dedicated resources to your traffic in times of need. Look at past years’ traffic during the “Turkey Five” days surrounding Thanksgiving, reserve servers for the excess hits, adjust for your expected annual growth, and add 5-10% capacity on top of that. Don’t rely on “spot instances” or the like. They may be cheap, but unexpected changes in the bid price can shut your servers down.
According to ChannelAdvisor, “Larger retailers [can] have 50 to 100 extra machines on hand, so that when a problem arises, they can deal with the issue firsthand.”
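The sizing rule in Strategy 3 (last year's peak, adjusted for growth, plus 5-10% headroom) can be sketched as a small calculation. Everything numeric below is a hypothetical placeholder, including the per-server throughput, which you would need to measure for your own stack.

```python
import math


def servers_needed(last_year_peak_rps: float,
                   annual_growth: float,
                   headroom: float,
                   rps_per_server: float) -> int:
    """Estimate the server count for this year's peak.

    last_year_peak_rps: highest requests/second seen during last year's peak days
    annual_growth:      expected traffic growth as a fraction (0.20 means 20%)
    headroom:           extra capacity on top of the forecast (0.05 to 0.10)
    rps_per_server:     requests/second one server handles (assumed, measure it)
    """
    expected_peak = last_year_peak_rps * (1 + annual_growth)
    target_capacity = expected_peak * (1 + headroom)
    # Round up: a fraction of a server still means one more machine.
    return math.ceil(target_capacity / rps_per_server)


# Hypothetical inputs: 12,000 rps peak, 20% growth, 10% headroom, 400 rps/server.
print(servers_needed(12000, 0.20, 0.10, 400))
```

Comparing the result with what you run on a normal day tells you how many extra machines to reserve ahead of time.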
Strategy 4: Leverage DNS failover services if you use cloud hosting.
With these DNS services, you can dynamically reroute traffic during peak times to other servers with no additional latency, and the transitions are seamless for customers.
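To make the idea concrete, here is a rough sketch of the decision a failover service makes on your behalf: check endpoints in priority order and send traffic to the first healthy one. It is not any provider's actual API. The hostnames are placeholders, and the TCP probe is a simplified stand-in for the health checks a real service would run.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active_endpoint(endpoints: Sequence[str],
                         is_healthy: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for host in endpoints:
        if is_healthy(host):
            return host
    return None


# Usage with a stubbed health check (hypothetical hostnames, no network calls).
status = {"primary.example.com": False, "standby.example.com": True}
active = pick_active_endpoint(list(status), is_healthy=lambda host: status[host])
print(active)
```

Passing the health check in as a function keeps the selection logic easy to test without touching the network.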
Strategy 5: Use a robust content delivery network — or more than one.
Why put all your proverbial eggs into one basket? “Delivery over a multiple CDN architecture is a best practice for modern Ops teams,” writes Cedexis, a web performance optimization company based in Portland. “When you’re able to load balance your application, video, and website content delivery across multiple CDNs, you can ensure your end users are shielded from service degradations and outages.”
Strategy 6: Pre-Warm Your AWS ELB.
Amazon doesn’t anticipate changes in traffic for you. You can notify them of your impending traffic spikes by either (1) setting up a load balancer (more on that directly below), or (2) using an Auto Scaling Group.
Strategy 7: Test your load balancing ahead of time.
If you have your Ops team simulate a high-traffic event (or high-traffic week) ahead of time in a safe development environment, you can find weaknesses and fix them before the real event. A stress test is not only a good practice in today’s DevOps environment, it’s a necessity. Centralized log management software can help you achieve this by sending you alerts, even in your testing environment, as you test the load. Then you can easily reconfigure your ELB to help you better manage the traffic in production.
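A very small load-test harness might look like the sketch below. It is an illustration only, not a replacement for a dedicated load-testing tool. The `send_request` callable is an assumption: here it is a stub that sleeps, and in practice you would replace it with a real HTTP call against your test environment (never production).

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(send_request: Callable[[], None],
                  total_requests: int,
                  concurrency: int) -> dict:
    """Fire total_requests calls using `concurrency` worker threads.

    Returns the request count, the number of calls that raised an exception,
    and the 95th-percentile latency in seconds.
    """
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(_: int):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "requests": total_requests,
        "errors": errors,
        "p95_seconds": latencies[p95_index],
    }


def fake_request() -> None:
    """Stand-in for a real page load; sleeps briefly instead of calling a server."""
    time.sleep(0.01)


summary = run_load_test(fake_request, total_requests=50, concurrency=10)
print(summary)
```

Comparing `p95_seconds` against the 2-second loading goal mentioned earlier, while raising `concurrency` step by step, is one simple way to see where response times start to degrade.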