Anthropic and OpenAI just exposed SAST's structural blind spot with free tools

OpenAI introduced Codex Security on March 6, entering the application security market that Anthropic had shaken up just 14 days earlier with Claude Code Security. Both scanners use LLM reasoning rather than traditional pattern matching, and both expose vulnerability classes that conventional static application security testing (SAST) tools structurally cannot detect. That leaves the enterprise security stack in an awkward position.

Anthropic and OpenAI arrived at reasoning-based vulnerability scanning independently, each surfacing bug classes that pattern-matching SAST tools could not detect. With a combined private-market valuation exceeding $1.1 trillion behind them, the competition between the two labs all but guarantees that detection quality will advance faster than any single security vendor could manage alone.

Neither Claude Code Security nor Codex Security aims to replace existing security stacks, but both permanently alter the procurement conversation: each is currently free for enterprise customers. Before your board asks which scanner you are piloting and why, work through the head-to-head comparison and the seven actions below.

How Anthropic and OpenAI Converged on the Same Conclusion through Different Approaches

Anthropic released its zero-day research on February 5 alongside the launch of Claude Opus 4.6. The model surfaced more than 500 previously unknown high-severity vulnerabilities in production open-source codebases, flaws that had survived extensive expert review and fuzzing. In the CGIF library, for instance, Claude found a heap buffer overflow by reasoning about the LZW compression algorithm, a flaw that even 100% code coverage under coverage-guided fuzzing had failed to uncover. Anthropic then launched Claude Code Security as a limited research preview on February 20 for Enterprise and Team customers, with expedited access for open-source maintainers. In an exclusive interview with VentureBeat, Gabby Curtis, Anthropic’s communications lead, framed the goal as making defensive capabilities more widely accessible.

OpenAI’s Codex Security, by contrast, grew out of Aardvark, an internal GPT-5-powered tool that entered private beta in 2025. During the beta, Codex Security scanned more than 1.2 million commits across external repositories, surfacing 792 critical and 10,561 high-severity findings in projects including OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium. Fourteen of those findings received CVE assignments, and OpenAI reports that false-positive rates and severity over-reporting dropped significantly over the course of the beta. Despite the two tools’ distinct architectures and scanning scopes, the competitive dynamic between Anthropic and OpenAI has raised the bar for detection across the security industry.

The claims deserve scrutiny. Checkmarx Zero researchers found cases where moderately complex vulnerabilities escaped Claude Code Security entirely. In one scan of a production-grade codebase, Claude reported eight vulnerabilities, of which only two were confirmed as true positives, a 25% precision rate. The researchers also noted that the scanner’s detection ceiling may be lower than advertised when vulnerable code is deliberately obfuscated. Neither Anthropic nor OpenAI has submitted its detection claims to independent third-party audit, so security leaders should treat the reported numbers as indicative rather than definitive.

Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, argued that the competitive scanner race is accelerating innovation across the industry. Her advice to security teams: prioritize patches by exploitability in your operational context rather than by CVSS score alone, compress the discovery-triage-patch cycle, and maintain visibility into your software bill of materials (SBOM) so vulnerable components can be identified quickly.
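Baer's prioritization advice can be made concrete. The sketch below is a minimal, hypothetical scoring function, not any vendor's product: the field names, weights, and CVE labels are illustrative assumptions, chosen only to show how context multipliers can outrank a raw CVSS base score.

```python
# Hypothetical sketch of exploitability-in-context prioritization.
# Weights and fields are illustrative assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    cvss: float            # CVSS base score, 0-10
    internet_facing: bool  # is the vulnerable service reachable externally?
    exploit_public: bool   # is a public exploit known?
    deployed: bool         # does the SBOM show the component in a shipped artifact?

def priority(f: Finding) -> float:
    """Exploitability-weighted score: context multipliers dominate raw CVSS."""
    score = f.cvss
    if f.internet_facing:
        score *= 2.0       # reachable attack surface
    if f.exploit_public:
        score *= 1.5       # weaponization already done for the attacker
    if not f.deployed:
        score *= 0.25      # component never ships: deprioritize
    return score

findings = [
    Finding("CVE-A", cvss=9.8, internet_facing=False, exploit_public=False, deployed=False),
    Finding("CVE-B", cvss=6.5, internet_facing=True,  exploit_public=True,  deployed=True),
]
ranked = sorted(findings, key=priority, reverse=True)
```

Under these assumed weights, the CVSS 6.5 finding on an internet-facing, actively exploited, deployed component outranks the CVSS 9.8 finding in an undeployed library, which is exactly the inversion Baer describes.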

Despite different methodologies and different target codebases, Anthropic and OpenAI reached the same conclusion: pattern-matching SAST has a ceiling, and LLM reasoning clears it. Because that same reasoning capability is equally available to attackers, organizations should assume the window between vulnerability discovery and exploitation will shrink, and plan remediation speed accordingly.

Vendor Responses: Insights into the Future of Application Security

Snyk, a developer security platform known for finding and fixing vulnerabilities in code and open-source dependencies, acknowledged the technical advances from Anthropic and OpenAI but argued that the hard problem is fixing vulnerabilities at scale, across many repositories, without disrupting delivery. Snyk also pointed to Veracode’s 2025 GenAI Code Security Report, which found AI-generated code is 2.74 times more likely to introduce security vulnerabilities than human-written code, as a reason to treat AI as one instrument in the security program rather than its centerpiece.

Ronen Slavin, CTO of Cycode, praised Claude Code Security’s technical advances in static analysis but cautioned that AI models are probabilistic, while security scanning demands consistent, reproducible, audit-grade results. In Slavin’s view, SAST is only one component of a comprehensive security strategy that also spans governance, pipeline integrity, and runtime behavior at enterprise scale.

Baer noted that when major AI labs hand reasoning-based scanners to enterprise customers at no cost, static code scanning starts to look like a commodity. Over the next year, she expects budgets to shift away from traditional SAST licenses and toward runtime protection, AI governance, and remediation automation: tools that shorten remediation cycles and improve overall security posture.

Seven Key Actions to Take Before Your Next Board Meeting

  1. Run comparative scans with both tools: On a representative subset of your codebase, compare Claude Code Security and Codex Security findings against your existing SAST output. The discrepancies among the three result sets will reveal blind spots on every side.
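     A minimal sketch of that comparison, assuming each tool's report has already been normalized to `(file, line, rule)` tuples (file names, line numbers, and rule IDs below are invented for illustration):

     ```python
     # Hypothetical normalized findings from each scanner; data is illustrative.
     claude_findings = {("auth.py", 42, "sqli"), ("upload.py", 7, "path-traversal")}
     codex_findings  = {("auth.py", 42, "sqli"), ("session.py", 88, "weak-random")}
     legacy_sast     = {("auth.py", 42, "sqli")}

     llm_union = claude_findings | codex_findings

     # Regression check: anything the legacy tool caught that both LLM scanners missed
     missed_by_llms = legacy_sast - llm_union

     # Legacy blind spots: findings only the reasoning-based scanners surfaced
     legacy_blind_spots = llm_union - legacy_sast

     # Cross-tool disagreement: findings reported by exactly one LLM scanner
     disagreement = claude_findings ^ codex_findings
     ```

     Each of the three derived sets answers a different procurement question: whether the new tools regress on known coverage, what the incumbent misses, and how much the two LLM scanners disagree with each other.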

  2. Establish governance before the pilot: Scanning with either tool will expose source code to a third-party model provider, so settle data privacy questions, data processing agreements, and data classification policies, and decide which repositories are in scope, before the first scan runs.

  3. Map coverage gaps: Neither Claude Code Security nor Codex Security covers software composition analysis, container scanning, DAST, or runtime detection and response. Document what each tool does and does not replace in your broader security stack.

  4. Quantify dual-use exposure: The same reasoning capability that finds zero-days for Anthropic and OpenAI is available to adversaries. Assume attackers can run equivalent scans against your public code, and shrink your patch windows to match.

  5. Prepare for board discussions: Expect questions about which scanner you chose and why. Bring a concrete comparison of Claude Code Security and Codex Security covering what each caught, what each missed, and what each cost to triage.

  6. Monitor the competitive landscape: Track releases from Anthropic and OpenAI as well as emerging players. Detection quality on both sides will keep improving, so re-evaluate regularly rather than treating any single assessment as final.

  7. Conduct a 30-day pilot: Run both Claude Code Security and Codex Security against real workloads and measure the results: findings volume, confirmed true positives, and triage time. Let that empirical data, not the launch announcements, drive your procurement decision.
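     The pilot metrics reduce to a small calculation. The sketch below is an illustrative helper, not a vendor report format; the example numbers mirror the Checkmarx Zero data point (eight findings, two confirmed), while the 30-minute median triage time is an assumed figure.

     ```python
     # Hypothetical pilot-summary helper; field names and example inputs
     # are illustrative assumptions.
     def pilot_summary(total_findings: int, confirmed_true: int,
                       median_triage_minutes: float) -> dict:
         """Boil pilot results down to the numbers a board will ask for."""
         precision = confirmed_true / total_findings if total_findings else 0.0
         return {
             "precision": round(precision, 2),                             # true-positive rate
             "triage_cost_hours": round(total_findings
                                        * median_triage_minutes / 60, 1),  # analyst time spent
             "confirmed": confirmed_true,                                  # real issues found
         }

     # Example mirroring the Checkmarx Zero scan: 8 findings, 2 confirmed
     summary = pilot_summary(total_findings=8, confirmed_true=2,
                             median_triage_minutes=30)
     ```

     Even this crude arithmetic makes the trade-off visible: a scanner that surfaces real bugs at 25% precision still bills analyst hours for the other 75%, and that triage cost belongs in the procurement comparison alongside the detection wins.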

With only a two-week gap between the two launches, the pace of this market is already set. Organizations that pilot now and measure rigorously will face their next procurement decision with evidence rather than guesswork.