Anthropic published the prompt injection failure rates that enterprise security teams have been asking every vendor for

In a constrained coding environment, prompt injection attacks against Claude Opus 4.6 succeeded 0% of the time across 200 attempts, even without safeguards. The same attacks transferred to a GUI-based surface with extended thinking told a very different story: with no safeguards in place, the breach rate reached 78.6% by the 200th attempt, and even with safeguards it was still 57.1%.
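To see why attempt count matters so much, one can back out the per-attempt success probability implied by the reported cumulative breach rates. This is an illustrative sketch only: it assumes attempts are independent and equally likely to succeed, which the system card does not claim.

```python
def per_attempt_rate(cumulative: float, attempts: int) -> float:
    """Implied per-attempt success probability p, assuming independent
    attempts: cumulative = 1 - (1 - p) ** attempts."""
    return 1 - (1 - cumulative) ** (1 / attempts)

def cumulative_breach(p: float, attempts: int) -> float:
    """Breach probability after `attempts` tries at per-attempt rate p."""
    return 1 - (1 - p) ** attempts

# System-card figures for the GUI surface: 78.6% breach by attempt 200
# without safeguards, 57.1% with safeguards.
p_unsafe = per_attempt_rate(0.786, 200)  # roughly 0.008, under 1% per attempt
p_safe = per_attempt_rate(0.571, 200)    # roughly 0.004

print(f"per-attempt, no safeguards: {p_unsafe:.4f}")
print(f"per-attempt, safeguards:    {p_safe:.4f}")
# Even a sub-1% per-attempt rate compounds quickly under persistence:
print(f"breach probability by attempt 50: {cumulative_breach(p_unsafe, 50):.3f}")
```

The takeaway: a per-attempt success rate that looks negligible in a single-shot benchmark compounds to a majority breach probability once an attacker is allowed to persist, which is why Anthropic's per-attempt-count disclosure is more decision-relevant than a single headline score.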

The 212-page system card for the latest models, released on February 5, breaks attack success rates down by surface, attempt count, and safeguard configuration.

Understanding Enterprise Risk Based on Surface-Level Differences

Prompt injection has long been an unquantified risk, often overlooked by security teams and AI developers alike. Anthropic’s system card now gives security leaders concrete attack success rates across different agent surfaces on which to base decisions.

While OpenAI’s GPT-5.2 system card includes prompt injection benchmark results, it lacks detailed information on how attack success rates vary by surface or across repeated attempts. Similarly, Google’s Gemini 3 model card emphasizes relative safety improvements without publishing absolute attack success rates by surface or persistence scaling data.

Comparison of Developer Disclosures

| Disclosure Category | Anthropic (Opus 4.6) | OpenAI (GPT-5.2) | Google (Gemini 3) |
|---|---|---|---|
| Per-surface attack success rates | Published (0% to 78.6%) | Benchmark scores only | Relative improvements only |
| Attack persistence scaling | Published (1 to 200 attempts) | Not published | Not published |
| Safeguard on/off comparison | Published | Not published | Not published |
| Agent monitoring evasion data | Published (SHADE-Arena) | Not published | Not published |
| Zero-day discovery counts | 500+ with projects named | Not published | Not published |
| Third-party red teaming | Gray Swan, UK AISI, Apollo | 400+ external testers | UK AISI, Apollo, Vaultis, Dreadnode |

Third-party testing underscores why detailed vendor disclosures matter, particularly for prompt injection risk. Promptfoo’s independent evaluation of GPT-5.2 surfaced significant vulnerabilities, reinforcing the need for security assessments that go beyond vendor-reported benchmarks.

Implications of Agent Evasion and Vulnerability Discovery

Anthropic’s SH…