An AI Agent Deleted a Production Database in Nine Seconds. This Is What AI Agent Failure in Production Looks Like.
The model wasn't the problem. The architecture was.
In April 2026, a coding agent powered by a large language model autonomously deleted a company's entire production database and all its backups in nine seconds. The incident generated over 35,000 reactions online and became the most-cited example of why enterprise teams do not trust AI agents in production environments.
Two months earlier, a similar agent had destroyed 2.5 years of production records in a separate incident. The pattern was clear: AI agents were being granted destructive permissions with no verification layer between the agent's decision and the irreversible action.
The root cause of agent failures was not the model
In both incidents, the underlying language model performed the task it was asked to perform. The problem was the architecture around it. There was no mechanism to classify the agent's intended action as destructive before execution. There was no pause point where a human could review the command. There was no confidence score indicating whether the agent was certain this was the right action. And there was no audit log that would have surfaced the risk before the command ran.
NIST's AI Risk Management Framework identifies this as a failure of AI system governance: the absence of controls that constrain an AI system's ability to take irreversible actions without human oversight.
What exception routing would have changed
Exception routing is a protocol that classifies AI outputs by risk level and confidence before they execute. In a system with exception routing, the agent’s database deletion command would have triggered a review before it ran. This kind of control is easiest to prove in one specific, high-value use case before a team tries to automate more. Most failures happen when teams automate too broadly instead of constraining the workflow to what delivers reliable business value.
The protocol works in three steps. First, the system scores the confidence of the proposed action. Is the agent certain this is the correct operation? Has it been verified against the original instruction and checked for prompt drift or data drift against the intended business logic? Second, the system classifies the action by consequence severity. Deleting a production database is an irreversible, high-severity action. Third, any action that falls below the confidence threshold or above the severity threshold is routed to a human reviewer with the agent’s reasoning attached.
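The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not ActionAI's implementation: the `ProposedAction` type, the `Severity` levels, the threshold values, and the `route` function are all hypothetical names chosen for the example, and the confidence score is assumed to have been computed upstream.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical consequence classes for step 2."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3  # irreversible, e.g. deleting a production database


@dataclass(frozen=True)
class ProposedAction:
    name: str
    confidence: float   # assumed score in [0, 1] produced by step 1
    severity: Severity  # consequence class assigned in step 2
    reasoning: str      # agent's explanation, attached for the reviewer


# Illustrative thresholds only; real values would be tuned per workflow.
CONFIDENCE_THRESHOLD = 0.90
SEVERITY_THRESHOLD = Severity.MEDIUM


def route(action: ProposedAction) -> str:
    """Step 3: decide whether an action runs or goes to a human reviewer."""
    if action.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"  # the agent is not certain enough
    if action.severity > SEVERITY_THRESHOLD:
        return "human_review"  # too consequential to run unreviewed
    return "execute"


drop_db = ProposedAction("drop_database", 0.99, Severity.HIGH,
                         "Cleanup of what looked like unused data")
read_row = ProposedAction("select_row", 0.97, Severity.LOW,
                          "Lookup requested by the user")
unsure = ProposedAction("update_row", 0.55, Severity.LOW,
                        "Instruction was ambiguous")
```

In this sketch, `route(drop_db)` returns `"human_review"` even though the agent is 99% confident, because severity alone is enough to require a reviewer. Only `read_row` executes automatically.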
ActionAI’s Explainable Exceptions (ExEx) protocol implements this pattern at every node of the workflow. When the system is not confident, it does not execute. It stops, explains why, and hands the decision to a person with context. The roughly 5% of actions that need human judgment get human judgment before anything irreversible happens.
Why agent frameworks alone do not solve this
The response to these incidents from the AI developer community focused on better guardrails within agent frameworks: sandboxing, permission scoping, and tool restrictions. These are useful but insufficient. They address what the agent is allowed to do, not whether the agent should do it in a specific instance. AI agents also present a broader attack surface than traditional software: inference is less predictable, and agents act autonomously and at speed, which creates more attack opportunities and amplifies the impact of a successful compromise. Prompt injection is a key failure mode. Attackers can use adversarial inputs to bypass safety guardrails and push the agent toward unintended actions, including data leaks or phishing. The broader threat landscape also includes tool and API manipulation, poisoning of training data, and remote code execution attacks.
The difference matters. A sandboxed agent that is allowed to delete databases will still delete the wrong database if it misinterprets the instruction. Permission scoping reduces the blast radius but does not prevent the mistake. What prevents the mistake is a verification step between the agent’s decision and the execution, scored by confidence and gated by human review for high-risk actions. Limited transparency in many AI models compounds the problem, because it makes root cause analysis and incident response planning harder for cybersecurity teams.
According to Gartner, fewer than 15% of enterprise AI agent deployments in 2025 included systematic exception routing for high-risk actions. The incidents in early 2026 are the predictable result.
What reliable multi-agent system architecture looks like
Reliable agent architecture treats every proposed action as an output that needs verification before it executes. The agent generates an intent, and the system scores that intent’s confidence. High-confidence, low-risk actions execute automatically. Low-confidence or high-risk actions route to a human reviewer with the agent’s reasoning, the input data, and the proposed action all visible in one view. This architecture supports real-world use beyond demos, provided that observability covers the agent’s reasoning and not just its final outputs.
Production monitoring adds a second layer. Confidence scores are tracked across all running agent workflows, alongside latency and error rates, because AI agents need observability that goes beyond traditional application monitoring. If the average confidence of an agent’s actions drops below the baseline, the system surfaces the underlying drift, whether data drift from changing real-world inputs or concept or model drift from shifting user behavior or APIs, before errors compound across workflows into a production incident. This matters most in multi-agent systems, where one bad output or rogue action can trigger cascading failures across interconnected APIs and other agents.
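One way to picture the baseline check described above is a rolling average over recent confidence scores. The sketch below is an assumption-laden illustration: the `ConfidenceMonitor` class, its window size, and its `tolerance` margin are hypothetical, and a production system would persist scores and alert through its own observability stack.

```python
from collections import deque


class ConfidenceMonitor:
    """Hypothetical rolling-average monitor for agent confidence scores."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline    # expected average confidence
        self.tolerance = tolerance  # assumed margin before flagging drift
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores

    def record(self, confidence: float) -> bool:
        """Store a score; return True when the rolling mean signals drift."""
        self.scores.append(confidence)
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance


monitor = ConfidenceMonitor(baseline=0.90, window=3, tolerance=0.05)
flags = [monitor.record(score) for score in (0.95, 0.90, 0.60)]
```

With these illustrative numbers, the first two scores keep the rolling mean above 0.85, and the third pulls it to roughly 0.82, so only the last call flags drift.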
This is what ActionAI builds for enterprise clients deploying AI agents in mission-critical workflows. The platform’s reliability architecture operates at every node: before the agent processes the instruction, during the action classification, and after the action reaches execution. For organizations building agents, this depends on having the supporting infrastructure in place before deployments scale. The technology is still evolving and requires realistic up-front investment, and skipping that investment is what turns expansion into expensive failures.
Production rollouts also carry real cost. Agents can create cost shock when reasoning loops burn excess tokens, when they enter infinite loops, or when they repeatedly fail to make progress on their assigned tasks. Systems built on machine learning adapt to patterns in data yet still break on rare edge cases, rising demand, latency, or the complexity of customer-facing workflows. If you are deploying AI agents in workflows where a wrong action has real consequences, contact ActionAI to discuss building exception routing into your agent architecture.
Frequently asked questions
How does exception routing differ from guardrails?
Guardrails define what an AI agent is allowed to do, including its tool access and permission boundaries. Giving an agent too many tools can create confusion and increase risk. Exception routing evaluates whether the agent should take an allowed action in a specific instance, based on confidence scoring and consequence severity. Both are necessary. Guardrails set the boundary. Exception routing operates within that boundary to catch uncertain or high-risk actions. For example, a prompt injection attack can still manipulate behavior or expose sensitive system data unless exception routing and access controls catch it, and a confused customer service agent should fall back to a human reviewer or customer support path for sensitive cases.
Can exception routing be added to existing AI agent deployments?
Yes. Exception routing can be layered onto an existing agent architecture, and a hybrid approach often works best: add exception routing on top of current workflows while keeping legacy connections and external services in place. The most effective implementations build exception routing into the workflow design from the start, because success in a demo often does not survive real deployment, where teams hit predictable cost shock from rollout expenses. Retrofitting requires mapping every action type to a risk classification and defining confidence thresholds per action category. After defining those thresholds, add staging checks that feed corrupted tool outputs into the workflow to verify resilience. When prompts or the workflow change, rerun historical failure cases in the same environment as a regression evaluation. Keep tools idempotent so the same action can run twice without creating duplicate side effects, and watch for infinite or overly complex reasoning loops, which can exhaust cloud resources and raise costs.
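The idempotency point can be illustrated with an idempotency key: a retried call with the same key returns the earlier result instead of repeating the side effect. This is a minimal in-memory sketch. The `IdempotentTool` wrapper, the `send_refund` function, and the key format are hypothetical, and a real deployment would need a durable, shared store rather than a Python dictionary.

```python
class IdempotentTool:
    """Hypothetical wrapper that runs a tool at most once per idempotency key."""

    def __init__(self, func):
        self._func = func
        self._results = {}  # idempotency key -> result of the first call

    def call(self, key: str, *args, **kwargs):
        # Only invoke the underlying tool the first time a key is seen.
        if key not in self._results:
            self._results[key] = self._func(*args, **kwargs)
        return self._results[key]


sent = []  # stands in for an external side effect, such as a payment API


def send_refund(order_id: str, amount: float) -> str:
    sent.append((order_id, amount))
    return f"refunded {order_id}"


refund_tool = IdempotentTool(send_refund)
first = refund_tool.call("refund-A17", "A17", 25.0)
retry = refund_tool.call("refund-A17", "A17", 25.0)  # agent retries the same action
```

After both calls, `sent` holds a single refund, and `retry` is the same result as `first`.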
What percentage of AI agent actions typically needs human review?
In ActionAI’s production deployments, roughly 5% of outputs are routed for human review. The other 95% execute automatically because the system’s confidence score exceeds the threshold. For context, published research on large language models reports agent failure rates ranging from 70% to 95%, depending on task complexity, how success is measured, and the tasks being evaluated. Roughly 88% of enterprise agents that work in demos fail in real workflows, creating wasted compute and manual cleanup costs. On the WebArena benchmark, the best GPT-4-based agent achieved just a 14.41% end-to-end task success rate, versus 78.24% for humans on comparable complex tasks.
The specific percentage varies by workflow complexity, risk tolerance, and the maturity of the automation. Agent failures often come from common-sense reasoning gaps, tool and interface errors, context engineering gaps, and hallucination incidents where an agent claims something happened when it has not. Long multi-step interactions can also exceed the context window, slow decision-making, and consume engineering time and other resources.
This content is for informational purposes only. Results described reflect specific deployments and may vary by use case. Contact ActionAI for a consultation tailored to your enterprise requirements.

