Anthropic raises concerns over human oversight of AI as self-improvement capabilities advance

by Teddy Cambosa - 2 months ago

United States – Anthropic has published a report outlining the progress its AI systems have made toward what the industry terms recursive self-improvement — the point at which an AI could autonomously design and train its own successor — and flagging the governance challenges that such a development would pose.

The Anthropic Institute, the company’s research and policy arm, draws on both external benchmark data and internal operational figures to document how AI systems are already taking on a growing share of AI development work.

While the report notes the potential benefits in science and healthcare, it also identifies risks that the company says existing institutions may not yet be equipped to manage.

What the data shows

The report’s concerns are grounded in internal figures that Anthropic describes as previously unreported. As of May 2026, more than 80 percent of code merged into the company’s production systems was authored by Claude. Engineers are shipping eight times as much code per quarter as in the 2021–2024 period.

On fully open-ended engineering problems — those where the engineer has not specified what a solution should look like — Claude’s success rate reached 76 percent in May 2026, up 50 percentage points over six months.

In research, the company describes a demonstration from April 2026 in which Claude-powered agents were given an open-ended AI safety question and left to work through it independently. Two human researchers, working for about a week, recovered roughly 23 percent of the measurable performance gap on the problem; the agents recovered 97 percent over 800 cumulative hours.

Anthropic notes that humans still chose the problem and set the scoring criteria, and that the result did not transfer cleanly to production-scale models, but it observes that the agents designed every experiment themselves.

Three scenarios, two of them pressing

Anthropic frames its analysis around three possible futures, stating it considers the first the least likely and is more concerned about the remaining two.

1. Capability growth plateaus – Progress slows due to architectural limits or compute constraints. Today’s AI remains transformative, but institutions have more time to adapt. Anthropic considers this the least likely outcome based on current trends.

2. Compounding efficiency gains continue – AI development becomes substantially automated while humans retain strategic direction. Organisations become significantly more efficient, but the same capabilities could also be applied to surveillance or large-scale influence operations.

3. Full recursive self-improvement – AI systems begin designing their own successors. Development pace becomes determined by available compute. Humans shift primarily to oversight and validation. The report describes this as the scenario it is “least certain about” in terms of alignment outcomes.

The coordination problem

Anthropic’s report does not advocate for unilateral action by any single company. It argues that if slowing frontier AI development were practically achievable, doing so would likely be beneficial — but that a pause by one lab would shift who leads rather than create the broader deliberative process the situation calls for.

A credible slowdown, the report says, would require multiple well-resourced laboratories across multiple countries to agree to stop, with each able to verify that others had genuinely done so. The company describes this verification problem as significantly harder than comparable arms-control challenges: AI training runs are far easier to conceal than missile silos, the inputs are general-purpose, and the incentive to continue quietly is substantial.

Anthropic says it would participate in a verified, multi-party slowdown if other frontier developers agreed to the same conditions, and intends to convene policymakers, researchers, civil society organisations, and other AI companies in the coming months to discuss what workable coordination mechanisms might look like. It has committed to publishing the outcomes of those discussions.