Amazon today blamed human error for the major AWS outage that knocked a number of large internet sites offline for several hours on Tuesday afternoon.
In a blog post, the company said that one of its employees was debugging an issue with the billing system and accidentally took more servers offline than intended. That error set off a domino effect that took down two other server subsystems and, in turn, the services that depended on them.
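To make the failure mode concrete, here is a minimal, hypothetical sketch (none of these names come from AWS's actual tooling) of how a single mistyped argument to a capacity-removal command can cut far deeper than intended, and the kind of minimum-capacity guardrail that could catch it:

```python
# Hypothetical sketch of the failure mode described above; the tool,
# names, and numbers are invented for illustration, not AWS's systems.

MIN_CAPACITY = 100  # assumed minimum servers a subsystem needs to stay up

def remove_servers(fleet: list[str], count: int) -> list[str]:
    """Take `count` servers out of the fleet for maintenance."""
    # Guardrail: refuse any removal that would drop the fleet below
    # its minimum required capacity, no matter what was typed.
    if len(fleet) - count < MIN_CAPACITY:
        raise ValueError(
            f"refusing to remove {count} servers: fleet would fall "
            f"below minimum capacity of {MIN_CAPACITY}"
        )
    return fleet[count:]

fleet = [f"server-{i}" for i in range(500)]

# Intended command: take a handful of servers offline.
fleet = remove_servers(fleet, 4)

# Mistyped command: 450 instead of 45, a single extra digit.
# Without the guardrail above, this one typo would gut the fleet.
try:
    fleet = remove_servers(fleet, 450)
except ValueError as err:
    print(err)
```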
“Removing a significant portion of the capacity caused each of these systems to require a full restart,” the post read. “While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.”
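The dependency chain AWS describes, where services that store data in S3 fail whenever S3's APIs do, can be pictured as a walk over a dependency graph. The sketch below is illustrative only: the service names come from the quote above, but the graph structure and the code are simplifications, not AWS's architecture.

```python
# Illustrative only: a toy dependency graph for the services named in
# AWS's post, showing how one unavailable dependency cascades upward.
DEPENDS_ON = {
    "S3 console": ["S3"],
    "EC2 new instance launches": ["S3"],
    "EBS volumes (from S3 snapshots)": ["S3"],
    "AWS Lambda": ["S3"],
}

def is_available(service: str, down: set[str]) -> bool:
    """A service is up only if it and all of its dependencies are up."""
    if service in down:
        return False
    return all(is_available(dep, down) for dep in DEPENDS_ON.get(service, []))

down = {"S3"}  # the subsystem restarts left S3 unable to serve requests
for service in DEPENDS_ON:
    print(service, "->", "up" if is_available(service, down) else "down")
```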