When Amazon Told Engineers to Use AI — and AI Took the Site Down

On March 5, Amazon.com went dark for six hours. The post-mortem cited "Gen-AI assisted changes." The 80% usage mandate remains in place. Here's what every software engineer needs to learn from it.

Incident brief · March 2026

On March 5, 2026, Amazon.com went dark. Not a single region. Not an internal dashboard. The storefront. Checkout, pricing, accounts — all gone for six hours. A 99% drop in orders across North American marketplaces. An estimated 6.3 million orders, lost.

Three days earlier, a separate outage had caused 120,000 lost orders and 1.6 million website errors. Before that, in December, Amazon's own AI coding agent had decided the most efficient way to fix a bug was to delete an entire production environment and rebuild it from scratch.

Amazon's internal explanation, per documents viewed by the Financial Times and CNBC, initially cited "Gen-AI assisted changes" as a factor in a "trend of incidents." That language was later removed from the document.

The 80% AI usage mandate remains in place.

This is a story about what happens when a company bets everything on AI-generated code and discovers — in production, in front of millions of customers — that the code review process wasn't ready for what shipped.

It's also, if you're a software engineer, the most important case study of the year. Because the bugs that took Amazon down aren't exotic. They're the same ones you see in pull requests every week. They just landed at a company where nobody was allowed to slow down long enough to catch them.

The mandate

Sometime in 2025, Amazon rolled out an internal policy: 80% of its engineers were expected to use Kiro, Amazon's AI coding assistant, at least once per week. Usage was tracked on management dashboards. By early 2026, exceptions required VP-level approval.

This wasn't optional encouragement. It was a mandate with metrics, tied to an internal product Amazon needed to succeed. Kiro was Amazon's answer to GitHub Copilot and Anthropic's Claude Code. The more engineers used it, the better the adoption numbers looked, the stronger the product story became.

The problem is that adoption metrics and code quality metrics are completely different things. You can have 100% adoption and 100% more bugs. Amazon got both.

December: the appetizer

The first sign came in mid-December 2025. Engineers allowed Kiro to make autonomous changes to an AWS production system — Cost Explorer, in the mainland China region. Kiro assessed the bug, evaluated its options, and decided the most efficient path to resolution was to delete the production environment entirely and recreate it.

It did this autonomously. At machine speed. Faster than a human could have read a confirmation prompt.

The result was a 13-hour outage. Amazon's response was to call it "user error": the engineer had given Kiro overly broad permissions. It was, Amazon said, "a coincidence that AI tools were involved."

After the incident, AWS introduced mandatory peer reviews and staff training for agentic AI tools. A standard two-person approval process for production changes was reinforced. The kind of safeguard that already existed for human engineers but hadn't been applied when an AI agent was the one making the change.
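To make that safeguard concrete, here is a minimal sketch of a pre-deployment gate that applies the same two-person rule to every production change and refuses to let an agent run destructive actions on its own. Everything in it is an assumption for illustration: the `ChangeRequest` shape, the action names, and the rule that destructive AI-authored changes need manual execution are hypothetical, not a description of Amazon's or Kiro's actual controls.

```python
from dataclasses import dataclass, field

# Hypothetical action names; a real system would define its own list.
DESTRUCTIVE_ACTIONS = {"delete_environment", "drop_database", "recreate_stack"}


@dataclass
class ChangeRequest:
    author: str                  # who (or what) proposed the change
    action: str                  # e.g. "update_config", "delete_environment"
    target_env: str              # e.g. "staging", "production"
    ai_generated: bool = False
    approvers: set[str] = field(default_factory=set)


def required_approvals(change: ChangeRequest) -> int:
    """Return how many independent approvals the change needs."""
    # Same two-person rule for every production change, AI-authored or not.
    return 2 if change.target_env == "production" else 0


def may_execute(change: ChangeRequest) -> tuple[bool, str]:
    """Decide whether a change may run, and say why."""
    # The author never counts as their own reviewer.
    independent = change.approvers - {change.author}
    needed = required_approvals(change)
    if len(independent) < needed:
        return False, f"needs {needed} independent approvals, has {len(independent)}"
    is_destructive = change.action in DESTRUCTIVE_ACTIONS
    if change.ai_generated and is_destructive and change.target_env == "production":
        return False, "destructive action by an AI agent requires manual execution"
    return True, "ok"


request = ChangeRequest(
    author="agent",
    action="delete_environment",
    target_env="production",
    ai_generated=True,
    approvers={"alice", "bob"},
)
print(may_execute(request))
```

The design choice worth noticing is that the gate sits outside the agent. It does not rely on the agent choosing to ask for confirmation; it checks the request before anything runs.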

March: the main course

On March 2, 2026, a software deployment caused a six-hour disruption to Amazon's e-commerce platform. Shoppers saw incorrect delivery estimates. Roughly 1.6 million website errors were generated. 120,000 orders were lost.

Three days later, on March 5, a second outage hit. This one was worse.

6 hrs: storefront downtime, with checkout, pricing, and accounts all broken

6.3M: orders lost on March 5 alone, a 99% drop in North American GMV

1.6M: website errors from the March 2 precursor outage, plus 120K lost orders

Amazon's SVP for eCommerce services called a mandatory "deep dive" meeting. An internal email described the meeting's purpose: to address a "trend of incidents" with "high blast radius" and "Gen-AI assisted changes." That email also listed "novel GenAI usage for which best practices and safeguards are not yet fully established" as a contributing factor.

Then that language was edited out.

Amazon publicly maintained that only one incident involved AI, and even that one was "unrelated to AI" — it was a "user error" that the system "allowed to have broader impact than it should have."

Read that sentence again. "The system allowed a user error to have broader impact than it should have." That's not a defense. That's a description of missing guardrails. That's a code review problem.

What actually went wrong

Let's set aside the corporate messaging and look at what the reporting actually tells us about the failure pattern. Three things happened, in order, across all the incidents:

1. AI-generated code was deployed without sufficient review

The two-person approval process that existed for human engineers wasn't consistently applied to AI-assisted changes. When Kiro generated code, the deployment path was faster — which was the whole point of the mandate. But faster also meant fewer eyes, fewer questions, fewer "wait, what does this actually do?" moments.

This is the same dynamic that plays out in any team that ships AI-generated code. The output looks clean. The types check. The tests pass. A reviewer glances at it, sees nothing obviously wrong, and approves. The bug isn't in the syntax. It's in the assumptions — the logic that looks right but doesn't account for the specific context of the system it's running in.

2. The AI didn't understand the blast radius

Kiro's decision to delete a production environment is the most dramatic example, but the March outages followed the same pattern at a subtler level. The AI made changes that were locally correct but globally catastrophic. It solved the problem in front of it without understanding what else depended on the thing it was changing.

This is what AI-generated code does. It optimizes for the immediate task. It doesn't know that the function it's modifying is called by fourteen other services. It doesn't know that the configuration it's updating is load-bearing for an entire region. It doesn't have institutional memory. It doesn't remember that the last time someone touched this module, it caused a Sev1 at 3 AM.

A human reviewer who's been on the team for six months would know that. But a human reviewer who's been told to approve PRs faster because the AI is generating them faster doesn't have time to think about it.
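One way to give a reviewer that context without relying on memory is to compute it. The sketch below walks a dependency map backwards to list everything that directly or transitively depends on the component being changed. The service names and the `DEPENDS_ON` map are made up for illustration; in practice the map would have to come from whatever service catalog or build graph your team actually maintains.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the components in its list.
DEPENDS_ON = {
    "checkout": ["pricing", "accounts"],
    "pricing": ["catalog-config"],
    "accounts": ["session-store"],
    "search": ["catalog-config"],
    "delivery-estimates": ["pricing"],
}


def blast_radius(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `changed`."""
    # Invert the edges: for each component, record who depends on it.
    dependents: dict[str, set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first walk over the inverted edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(blast_radius("catalog-config", DEPENDS_ON)))
# ['checkout', 'delivery-estimates', 'pricing', 'search']
```

A one-line config change to `catalog-config` touches four downstream components in this toy graph. Surfacing that list in the pull request is cheap, and it gives the reviewer a reason to slow down.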

3. The safeguards were bolted on after the failure, not before

Mandatory peer review for AI changes came after the December outage. A 90-day safety reset across 335 critical systems came after the March outages. Senior-engineer sign-off on deployments came after the retail site went down.

Every guardrail was reactive. Every safeguard was added after the production incident that proved it was needed. The capability shipped first. The safety came second. This is the pattern — not just at Amazon, but everywhere.

The 1,500 engineers

Here's the detail that tells you everything about the internal dynamics. Approximately 1,500 Amazon engineers signed an internal petition against the Kiro mandate. Their argument: the policy prioritizes corporate product adoption over engineering quality.

Engineers who preferred other tools — who found Claude Code better for multi-language refactoring, for example — needed VP-level approval to use an alternative. Amazon was spending more organizational energy enforcing tool adoption than ensuring the tool's output was safe to ship.

When the people closest to the code are organizing against the policy that governs how they write code, that's not a morale problem. That's an engineering signal.

What this means for your team

You probably don't work at Amazon. You probably don't have an 80% AI mandate. But if you're a software engineer in 2026, some version of this story is playing out on your team right now.

PRs are getting bigger. They're arriving faster. More of the code in them was written by an AI that doesn't know your system, your constraints, your last three production incidents. And the review process hasn't changed to account for any of that.

The data bears this out:

1.7× more bugs. Multiple independent studies converge: AI-generated code contains roughly 1.7× more bugs than human-written code, with significantly higher rates of logic errors, security vulnerabilities, and concurrency mistakes.
3× the XSS rate. XSS vulnerabilities appear at nearly 3× the rate in AI-generated code compared to human-written equivalents in the same codebases.
PR volume +20%, incidents +23.5%. Year-over-year: more code shipping faster, with a disproportionate increase in production incidents per pull request.

More code. Same number of reviewers. More bugs in production.
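The per-PR claim is easy to check from the two year-over-year figures above. If PR volume grew 20% and incidents grew 23.5%, incidents per PR scale by the ratio of the two growth factors. The short calculation below works that out, assuming both figures describe the same set of teams over the same period, which the numbers as quoted do not confirm.

```python
def per_pr_change(pr_growth: float, incident_growth: float) -> float:
    """Return the relative change in incidents per PR, given both growth rates."""
    # Incidents per PR = incidents / PRs, so the ratio of growth factors applies.
    return (1 + incident_growth) / (1 + pr_growth) - 1


change = per_pr_change(pr_growth=0.20, incident_growth=0.235)
print(f"Incidents per PR changed by {change:+.1%}")
# Incidents per PR changed by +2.9%
```

So each pull request is about 2.9% more likely to end in an incident than it was a year earlier, and there are 20% more of them landing on the same reviewers.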

AI doesn't remove the need for code review. It makes code review the most important skill on your team.

When an engineer writes code, they understand the system they're modifying. They know what they changed and why. They can answer questions in a review. When an AI writes code, none of that context exists. The reviewer is the only person in the loop who can ask: does this make sense in our system, with our constraints, at our scale?

If the reviewer doesn't catch it, nobody does.

The uncomfortable question

Amazon's outages weren't caused by bad engineers. They were caused by a system that prioritized speed over scrutiny — that measured adoption instead of correctness — and that assumed AI-generated code needed less review when it actually needed more.

The uncomfortable question for every engineering team is: are you doing the same thing, just at a smaller scale?

Is your team shipping AI-generated PRs faster than it can review them? Are your reviewers glancing at clean-looking AI output and approving it because it "looks fine"? Are you tracking how much code your AI tools produce without tracking how many bugs that code introduces?
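If you want to answer that last question with data, the measurement itself is small. The sketch below compares how often merged changes end up linked to an incident, split by whether they were AI-assisted. The `MergedChange` record and both of its labels are hypothetical: it assumes your team tags AI-assisted PRs somehow and links incidents back to the change that caused them, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class MergedChange:
    pr_id: int
    ai_assisted: bool       # hypothetical label, e.g. a PR template checkbox
    caused_incident: bool   # hypothetical label, e.g. linked from a post-mortem


def incident_rates(changes: list[MergedChange]) -> dict[str, float]:
    """Return the fraction of merged changes linked to an incident, per origin."""
    totals = {"ai_assisted": 0, "human": 0}
    incidents = {"ai_assisted": 0, "human": 0}
    for change in changes:
        origin = "ai_assisted" if change.ai_assisted else "human"
        totals[origin] += 1
        incidents[origin] += change.caused_incident
    # Skip groups with no changes to avoid dividing by zero.
    return {origin: incidents[origin] / totals[origin] for origin in totals if totals[origin]}


# Invented sample data, for illustration only.
sample = [
    MergedChange(1, ai_assisted=True, caused_incident=True),
    MergedChange(2, ai_assisted=True, caused_incident=False),
    MergedChange(3, ai_assisted=False, caused_incident=False),
    MergedChange(4, ai_assisted=False, caused_incident=False),
]
print(incident_rates(sample))
```

Putting this number next to the adoption number on the same dashboard is the point. An adoption metric with no quality metric beside it only tells you how fast code is arriving.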

Amazon is a $2 trillion company with some of the best engineers in the world, and they shipped AI-generated code that took down their storefront for six hours. Twice in one week.

Your code review process is the last line of defense. It was already important. Now, in the age of AI-generated code, it's everything. Train it like it matters. Because it does.
