Anthropic published the prompt injection failure rates that enterprise security teams have been asking every vendor for

In a constrained coding environment, prompt injection attacks against Claude Opus 4.6 succeeded 0% of the time across 200 attempts, even without safeguards. The same attacks transferred to a GUI-based surface with extended thinking told a very different story: with no safeguards in place, the breach rate reached 78.6% by the 200th attempt, and even with safeguards it was still 57.1%.
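To see why attempt count matters so much, one can back out the per-attempt success probability implied by the reported cumulative breach rates. This is an illustrative sketch only: it assumes attempts are independent and equally likely to succeed, which the system card does not claim.

```python
def per_attempt_rate(cumulative: float, attempts: int) -> float:
    """Implied per-attempt success probability p, assuming independent
    attempts: cumulative = 1 - (1 - p) ** attempts."""
    return 1 - (1 - cumulative) ** (1 / attempts)

def cumulative_breach(p: float, attempts: int) -> float:
    """Breach probability after `attempts` tries at per-attempt rate p."""
    return 1 - (1 - p) ** attempts

# System-card figures for the GUI surface: 78.6% breach by attempt 200
# without safeguards, 57.1% with safeguards.
p_unsafe = per_attempt_rate(0.786, 200)  # roughly 0.008, under 1% per attempt
p_safe = per_attempt_rate(0.571, 200)    # roughly 0.004

print(f"per-attempt, no safeguards: {p_unsafe:.4f}")
print(f"per-attempt, safeguards:    {p_safe:.4f}")
# Even a sub-1% per-attempt rate compounds quickly under persistence:
print(f"breach probability by attempt 50: {cumulative_breach(p_unsafe, 50):.3f}")
```

The takeaway: a per-attempt success rate that looks negligible in a single-shot benchmark compounds to a majority breach probability once an attacker is allowed to persist, which is why Anthropic's per-attempt-count disclosure is more decision-relevant than a single headline score.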

The 212-page system card for the latest models, released on February 5, breaks attack success rates down by surface, attempt count, and safeguard configuration.

Understanding Enterprise Risk Based on Surface-Level Differences

Prompt injection has long been an unquantified risk, often overlooked by security teams and AI developers alike. Anthropic’s system card now gives security leaders concrete attack success rates across different agent surfaces on which to base decisions.

While OpenAI’s GPT-5.2 system card includes prompt injection benchmark results, it lacks detailed information on how attack success rates vary by surface or across repeated attempts. Similarly, Google’s Gemini 3 model card emphasizes relative safety improvements without publishing absolute attack success rates by surface or persistence scaling data.

Comparison of Developer Disclosures

| Disclosure Category | Anthropic (Opus 4.6) | OpenAI (GPT-5.2) | Google (Gemini 3) |
|---|---|---|---|
| Per-surface attack success rates | Published (0% to 78.6%) | Benchmark scores only | Relative improvements only |
| Attack persistence scaling | Published (1 to 200 attempts) | Not published | Not published |
| Safeguard on/off comparison | Published | Not published | Not published |
| Agent monitoring evasion data | Published (SHADE-Arena) | Not published | Not published |
| Zero-day discovery counts | 500+ with projects named | Not published | Not published |
| Third-party red teaming | Gray Swan, UK AISI, Apollo | 400+ external testers | UK AISI, Apollo, Vaultis, Dreadnode |

Third-party testing underscores why detailed vendor disclosures matter, particularly for prompt injection risk. Promptfoo’s independent evaluation of GPT-5.2 surfaced significant vulnerabilities, reinforcing the need for security assessments that go beyond vendor-reported benchmarks.

Implications of Agent Evasion and Vulnerability Discovery

Anthropic’s SH…