How Digital Ocean Cost Us Thousands of Dollars in a Single Day
- On May 10, we received an email from Digital Ocean stating that one of our servers, which hosted an e-commerce site, had been hacked and was performing a DDoS attack on other websites. They shut down our server immediately. [Here’s a PDF of the email]
- My team and I spent an entire day migrating data from an off-server backup to a new, non-Digital Ocean server. We relaunched the site the same day, and our customer’s e-commerce business resumed as normal from there.
- 8 days later, we received an email from Digital Ocean stating that the shutdown had been a false positive and that our server access would be restored: “In summary, due to a breaking change made by a third-party we use to help analyze traffic, some Droplets were mistakenly identified as participating in outgoing Denial of Service attacks and had their networking capabilities taken offline.” [Here’s a PDF of that email]
- 11 days after they disabled our server, we received an update on the ticket saying that our server access had been restored: “Networking has been restored so that you can continue with the recovery of your Droplet. Preventing abuse is a top priority here at DigitalOcean, and we thank you for your help in this ongoing battle. If we receive any new information of similar activities occurring we will forward it to you for review.”
- Their email communication was very corporate and not apologetic in any real sense. They also did not follow up on support tickets related to the outage until 11 days later! (Again, the issue was 100% Digital Ocean’s fault.) For example:
“DigitalOcean recognizes and understands how disruptive this incident was to our customers and that it led to business impact for our users. We let our users down and are working to earn customer trust back by taking steps to improve our monitoring stack as a whole, as well as putting additional safety checks in place for DigitalOcean internal services.”
- After they shut down our site, not only did we lose e-commerce sales, but we also spent more than a day of developer hours on an emergency migration to a new (non-Digital Ocean) server and on customer communication. Our reputation with our customer likely took a hit as well, even though we did our best and urgently restored the site to a new server with zero notice.
- The total cost to us of the “false positive” ran into the thousands of dollars (including developer costs and lost sales).
- Under its SLA policy, Digital Ocean has promised to reimburse us about $100 on a $300/month server, payable two weeks from now (further proof that SLA policies in website hosting are worthless!)
- This episode has led me to lose trust in Digital Ocean as a provider.
- For now, our site lives on one of the big shared hosting providers while we determine next steps…
Here’s the rub for me:
- Digital Ocean’s only job is to host websites and related things. They failed miserably in this core responsibility.
- It took Digital Ocean 8 days to realize the shutdown was a false positive and communicate that fact to me. It was another 3 days before they allowed my server to be turned back on (an 11-day outage!).
- I have no idea how many other sites were affected. Thankfully, only one of my sites was affected.
- Digital Ocean’s communication on this matter was about as bad as can be.
My belief is that bad stuff happens, and you need to own it. Digital Ocean definitely did not own it in this case. I wonder how the original founders of Digital Ocean would have reacted had they been in charge…
For example, wouldn’t it be nice if an actual human had sent a short email that said something like, “Hi, I’m Judy, the CEO of Digital Ocean, and I really screwed up. I hired a 3rd-party company to blah blah blah, and they totally screwed up blah blah blah. I’m sorry it took 11 days to turn your droplet back on – that is not acceptable. How can I rebuild your trust? -Judy”

But instead, I received an email signed by “Digital Ocean” that read: “DigitalOcean is confident in our commitment to excellence and that the future measures we’ve identified will drive better and better outcomes for our users.” Ummm…