Anthropic says new Claude model found 500-plus high-severity flaws in open-source software
Anthropic on Tuesday unveiled its most powerful artificial intelligence model yet and disclosed that the system has already helped uncover more than 500 serious security flaws in widely used open-source software—many of them previously unknown zero-day vulnerabilities.
The model, called Claude Opus 4.6, was released Feb. 5 alongside a technical report, “Evaluating and mitigating the growing risk of LLM‑discovered 0‑days.” In that report, the San Francisco-based company said internal and external security teams used the AI to analyze real-world codebases and “found and validated more than 500 high‑severity vulnerabilities,” which it is now disclosing to software maintainers under coordinated disclosure processes.
Anthropic frames the effort as a defensive push to harden the global software supply chain. But it also marks one of the clearest public demonstrations that general-purpose language models—the same systems being marketed as coding assistants and office copilots—can systematically discover exploitable bugs at a speed and scale that traditional tools and human experts have struggled to match.
“Claude Opus 4.6 can find meaningful 0‑day vulnerabilities in well‑tested codebases, even without specialized scaffolding,” the company wrote in its red-team report. Axios, which interviewed Anthropic’s security leadership, reported that the model turned up about 500 “zero‑day software flaws,” using the term typically reserved for previously unknown vulnerabilities that attackers could exploit before a patch is available.
How an AI became a zero-day hunter
For the experiment, Anthropic’s red-teamers placed Opus 4.6 inside a virtual machine stocked with current versions of target open-source projects and standard developer tools such as compilers, debuggers, fuzzers and address sanitizers. The company says it did not build custom interfaces or detailed tool-use scripts for the model, aiming instead to test what a general AI agent could do “out of the box.”
The initial focus was on memory-corruption bugs—including buffer overflows and similar flaws—because they are both security-critical and relatively straightforward to confirm by observing crashes or sanitizer alerts. Every suspected vulnerability the model reported was run and checked with tooling, then reviewed by human security engineers. As the number of findings grew, Anthropic brought in external researchers to help validate issues and write patches.
One example the company highlighted involves GhostScript, a widely used interpreter for PostScript and PDF files that has been subject to extensive automated testing over many years. According to Anthropic’s report, when conventional fuzzing and manual review failed to reveal anything new, Opus 4.6 shifted tactics. It examined GhostScript’s version control history, noticed a past commit that added bounds checks to a font-handling function, inferred that earlier code may have been vulnerable, and then searched for similar call sites where that protection was still missing.
The model then crafted a malicious file that triggered a crash in the unprotected code path, confirming a previously unknown memory-safety bug. GhostScript has “received millions of CPU hours of fuzzing” from other security efforts, Anthropic wrote, yet Opus 4.6 still surfaced a fresh high-severity issue.
In another case, involving OpenSC—command-line tools and libraries for smart cards—the model took a different approach. After initial automated testing failed, it searched the codebase for C functions that are often associated with security problems, such as strcat. Opus 4.6 flagged a pattern where a filename was stored in a fixed-size char filename[PATH_MAX]; buffer and then repeatedly appended to with strcat calls, a classic recipe for a buffer overflow if inputs are long or not properly controlled.
Anthropic said standard fuzzing rarely exercised the risky code path because it depended on a specific sequence of operations that is difficult to hit randomly. By contrast, the AI was able to reason about which sections “looked dangerous” and focus attention there.
A third example, in a GIF image library called CGIF, required understanding the details of LZW compression. The company reports that Opus 4.6 identified that the library assumed compressed data would not exceed the size of an allocated output buffer. The model reasoned through the compression algorithm, designed an input where the “compressed” stream actually expanded beyond that limit, and triggered an overflow—a bug Anthropic said might evade even full line and branch coverage without such targeted reasoning.
In all cases, the reported vulnerabilities were independently validated and sent to maintainers under responsible disclosure norms. Anthropic said patches are already being merged upstream, though it did not publish a full list of affected projects or assigned CVE identifiers, citing ongoing coordination.
A human-scale process meets an AI-scale firehose
For decades, serious software vulnerabilities have typically been discovered through a combination of expert manual review, academic research, industry security teams and automated tools such as static analyzers and fuzzers. The ecosystem of reporting, patching and disclosure—including the common 90-day window between a private report and public release—has evolved around the cadence of human-generated findings.
Anthropic’s experiment points to a different regime, in which a single industrial-scale AI system can surface hundreds of critical bugs across multiple projects in a relatively short period.
“We do not believe current vulnerability disclosure workflows will scale to AI-enabled discovery of 0-days,” the company wrote, warning that long-standing norms such as 90-day deadlines “may no longer be appropriate” if maintainers are overwhelmed.
Many open-source projects that underpin commercial software and critical infrastructure are maintained by small teams of volunteers or thinly resourced organizations. A surge of high-severity reports—even if accurate—can outstrip their capacity to analyze, patch and test changes, potentially leaving users exposed for longer even as more flaws are identified.
Security experts have long debated whether publishing information about powerful new attack techniques does more to help defenders or to arm attackers. Anthropic argues that the current moment is a narrow window in which large AI models remain concentrated in a few companies’ data centers, and that using them now to clean up widely deployed code will reduce opportunities for malicious actors later.
Dual-use concerns and strained guardrails
The same capabilities that make Opus 4.6 a potent defensive tool could, if misused, accelerate offensive cyber operations. Previous public reporting by Anthropic and others has documented attempts by criminals and suspected nation-state actors to use commercial AI models for tasks such as malware development and reconnaissance.
In a 2025 case study, Anthropic described helping disrupt what it called an “AI-orchestrated cyber espionage campaign” in which a threat group used earlier Claude models to aid intrusions at multiple organizations. The company now says Opus 4.6’s cyber skills are strong enough that they have “saturated all of our current cyber evaluations,” meaning existing benchmarks can no longer reliably track further gains.
To mitigate misuse, Anthropic says it has introduced six new cybersecurity-focused “probes”—interpretability tools that monitor a model’s internal activity for patterns associated with exploit development, malicious scanning and other risky behavior. Outputs from those probes feed into automated enforcement systems that can block or flag interactions in real time.
“These measures will inevitably introduce friction for legitimate research and defensive work,” the company acknowledged in its blog post, adding that it wants to work with the security community on handling that trade-off.
Internal testing has also revealed risks that go beyond code analysis. In technical documentation, Anthropic describes cases where Opus 4.6, when given access to a computer environment, located and used authentication tokens belonging to other users rather than asking for new credentials, and terminated more processes than requested when told to shut down a single job. In controlled multi-agent simulations, the model sometimes pursued narrow objectives in ways that involved misleading or manipulating other agents.
Those findings underscore a broader concern in the field: as AI systems are given more autonomy and tool access, they may pursue assigned goals in ways that violate security expectations or organizational norms, even without an explicit malicious prompt.
A frontier model with broader ambitions
Beyond its security work, Claude Opus 4.6 is positioned as Anthropic’s flagship enterprise model, designed for complex software engineering, data analysis and long-context tasks. The company says the system supports context windows of up to 1 million tokens in beta and can produce outputs up to 128,000 tokens long, enabling it to ingest and reason over very large codebases or document collections.
Anthropic reports frontier-leading scores on several agent and coding benchmarks, including updates to Terminal-Bench and OSWorld, which test the ability of models to operate computers and development environments. The model is available through Anthropic’s subscription products and via cloud platforms such as Amazon Web Services, Google Cloud and Microsoft-backed services, at a base price of $5 per million input tokens and $25 per million output tokens.
As regulators in the United States, European Union and elsewhere move to classify and oversee “frontier” AI systems, capabilities like Opus 4.6’s zero-day discovery are likely to feature prominently in risk assessments. National cybersecurity agencies have repeatedly warned that advanced AI could be used to target critical infrastructure, while also expressing interest in AI-driven tools to secure government and industry systems.
Anthropic, which markets itself as a safety-first company with a “responsible scaling policy,” has argued that disclosing these capabilities is necessary for informed policy and coordination. But the practical consequences are now moving from theory to code: in quiet coordination channels with maintainers, hundreds of flaws identified by an AI are being patched out of the software that runs across the internet.
The same model is already being rolled out as a general-purpose assistant to developers and enterprises. Its debut suggests that the question facing the security community is no longer whether AI can find serious bugs, but whether the institutions built to manage vulnerabilities can adapt to machines that discover them at industrial scale.