Red teaming LLMs exposes a harsh truth about the AI security arms race

Relentless and persistent attacks on cutting-edge models can lead to their failure, with failure patterns differing depending on the model and developer. Red teaming reveals that it’s not the sophisticated, complex attacks that can bring a model down, but rather the attacker automating continuous, random attempts that will eventually cause the model to fail.

This harsh reality is something that AI applications and platform developers must account for when building each new release of their products. Relying on a frontier model that is prone to red team failures due to persistence alone is akin to building on unstable ground. Even with red teaming, frontier LLMs, even those with open weights, are falling behind adversarial and weaponized AI.

Table of Contents

The arms race is already underway

Cybercrime costs reached $9.5 trillion in 2024, with projections exceeding $10.5 trillion for 2025. LLM vulnerabilities are contributing to this trend. For example, a financial services firm that deployed a customer-facing LLM without adversarial testing experienced a leakage of internal FAQ content within weeks, resulting in remediation costs of $3 million and regulatory scrutiny. Similarly, an enterprise software company had its entire salary database leaked after using an LLM for financial modeling.

The UK AISI/Gray Swan challenge conducted 1.8 million attacks across 22 models, and every model was breached. None of the current frontier systems can withstand determined, well-resourced attacks.

Builders are faced with a decision: integrate security testing now or deal with breaches later. Tools such as PyRIT, DeepTeam, Garak, and OWASP frameworks are available, but what’s crucial is their implementation.

Organizations that view LLM security as merely a feature rather than a foundation will learn the difference the hard way. The arms race favors those who are proactive rather than reactive.

Red teaming highlights the immaturity of frontier models

The gap between offensive capabilities and defensive readiness has never been wider. Elia Zaitsev, CTO of CrowdStrike, emphasized the need for faster responses to attacks, as adversaries are evolving at a rapid pace.

The results of red teaming so far are paradoxical, especially for AI builders who rely on stable platforms. Red teaming demonstrates that every frontier model will fail under sustained pressure.

It’s essential for builders to review the system card of each new model release, as it reflects the red teaming, security, and reliability mindset of the provider. Understanding the differences in how companies approach red teaming is crucial to avoid wasted time and resources.

Attack surfaces are dynamic, posing challenges for red teams

Builders must recognize the fluid nature of attack surfaces that red teams need to cover, despite incomplete knowledge of the threats faced by their models.

OWASP’s 2025 Top 10 for LLM Applications serves as a cautionary guide for businesses developing AI apps and expanding existing LLMs. New vulnerability categories such as excessive agency, system prompt leakage, and misinformation highlight unique failure modes of generative AI systems.

Cybersecurity is evolving due to AI, introducing new risks that must be addressed. Adversaries are leveraging AI to accelerate attacks, necessitating a proactive approach from defenders.

Adaptive attackers pose challenges for defensive tools, as they are constantly refining their approaches to overcome defenses. Builders should not solely rely on claims from model providers and should conduct their own testing.

Steps for AI builders to take

Security must be a priority for AI builders, as the risks associated with AI are complex and ever-evolving. Guardrails must be in place outside the LLM to ensure safety and security.

Input and output validation, separating instructions from data, regular red teaming, controlling agent permissions, and scrutinizing the supply chain are essential steps that AI builders must take to enhance security.

By following these measures, AI builders can strengthen the security and resilience of their AI applications in the face of evolving threats and attacks.

The arms race is already underway

Red teaming highlights the immaturity of frontier models

Attack surfaces are dynamic, posing challenges for red teams

Steps for AI builders to take

Related Posts