Researchers broke every AI defense they tested. Here are 7 questions to ask vendors.

Many security teams are investing in AI defenses that are ineffective. Recent research from OpenAI, Anthropic, and Google DeepMind has revealed alarming findings that should give every CISO pause. Their paper, “The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections,” tested 12 AI defenses that claimed to have near-zero attack success rates. However, the research team was able to bypass these defenses with success rates above 90%. This raises a concerning implication for enterprises: most AI security products are not effective against real attackers.

The researchers tested various types of defenses, including prompting-based, training-based, and filtering-based defenses, under adaptive attack conditions. Unfortunately, all of these defenses failed under the test conditions. Prompting defenses had attack success rates of 95% to 99%, while training-based methods had bypass rates of 96% to 100%. The rigorous methodology used by the research team included 14 authors and a $20,000 prize pool for successful attacks.

Table of Contents

Why WAFs fail at the inference layer

Web application firewalls (WAFs) are stateless, while AI attacks are not. This fundamental difference explains why traditional security controls are ineffective against modern prompt injection techniques.

The researchers tested known jailbreak techniques against these defenses, such as Crescendo and Greedy Coordinate Gradient (GCG), which exploit conversational context and automated attacks, respectively. These attacks were successful because the defenses assumed static behavior and were unable to adapt to dynamic attack strategies.

According to Carter Rees, VP of AI at Reputation, “AI attacks operate at the semantic layer, which signature-based detection cannot parse, making them as devastating as a buffer overflow in traditional software.”

Why AI deployment is outpacing security

The failure of current defenses is particularly concerning given the rapid pace of AI deployment in enterprise applications. Gartner predicts that by the end of 2026, 40% of enterprise applications will integrate AI agents, up from less than 5% in 2025. This rapid deployment of AI technologies is outpacing the development of effective security measures.

Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, highlights the increasing speed of cyber threats, with adversaries using techniques that bypass traditional endpoint defenses. The CrowdStrike 2025 Global Threat Report found that the majority of detections were malware-free, indicating a shift in attacker tactics.

In a recent incident, Anthropic disrupted the first documented AI-orchestrated cyber operation, demonstrating the speed and efficiency of AI-powered attacks. Organizations need to address these evolving threats to prevent costly data breaches.

Meyers explains, “Threat actors have found ways to avoid detection by not using malware and instead exploiting vulnerabilities in AI systems. This shift in tactics poses a significant challenge to traditional security controls.”

Jerry Geisler, EVP and CISO of Walmart, emphasizes the new security threats introduced by agentic AI, which could disrupt operations and violate regulatory mandates.

Four attacker profiles exploiting AI defense gaps

The failures in AI defenses are not theoretical but are already being exploited by different attacker profiles. These attackers are able to adapt their strategies to bypass existing security measures, making traditional defenses ineffective.

According to the research, defense mechanisms eventually become part of training data, rendering security through obscurity ineffective. The research highlights the need for more robust defense mechanisms that can adapt to evolving threats.

Anthropic and OpenAI have reported on the vulnerabilities of current defenses, showing that existing security measures are not sufficient to protect against adaptive attacks. The research identifies four categories of attackers exploiting vulnerabilities in the inference layer.

External adversaries operationalize published attack research to bypass defenses. Malicious B2B clients exploit API access, compromised API consumers leverage trusted credentials, and negligent insiders remain a common vector for attacks. These attackers are able to exploit weaknesses in AI defenses to gain unauthorized access and exfiltrate sensitive data.

Why stateless detection fails against conversational attacks

The research emphasizes the need for specific architectural requirements to address the shortcomings of current AI defenses. Normalization before semantic analysis, context tracking across turns, and bi-directional filtering are essential for detecting and preventing multi-step attacks.

Jamie Norton, CISO at the Australian Securities and Investments Commission, underscores the importance of implementing governance measures to ensure data security while allowing for innovation.

Seven questions for AI security vendors

Security leaders should ask critical questions to evaluate the effectiveness of AI security solutions before procurement. These questions address key vulnerabilities identified in the research and can help organizations make informed decisions when selecting security vendors.

What is your bypass rate against adaptive attackers?
How does your solution detect multi-turn attacks?
How do you handle encoded payloads?
Does your solution filter outputs as well as inputs?
How do you track context across conversation turns?
How do you test against attackers who understand your defense mechanism?
What is your mean time to update defenses against novel attack patterns?

The bottom line

The research from OpenAI, Anthropic, and Google DeepMind highlights the shortcomings of current AI defenses against adaptive attacks. Enterprises must reassess their security measures to address the vulnerabilities identified in the research. With the rapid deployment of AI technologies, organizations need to prioritize security measures that can effectively protect against evolving threats. The gap between AI deployment and security readiness is where breaches are likely to occur.