Credit goes to clawd.rip

Safety Incident 15 of 41

China-Linked Crew Weaponizes Claude Code

On November 13, 2025, Anthropic published a report titled 'Disrupting the first reported AI-orchestrated cyber espionage campaign,' stating that a threat actor it tracks internally as GTG-1002 had manipulated Claude Code into attempting intrusions against roughly 30 targets, including large tech companies, financial institutions, chemical manufacturers, and government agencies. Anthropic assessed 'with high confidence' that GTG-1002 was a Chinese state-sponsored group. It claimed the AI performed 80-90% of the campaign, with human intervention required at only 4-6 critical decision points per intrusion, and called it 'the first documented case of a large-scale cyberattack executed without substantial human intervention.' Anthropic said it had detected the suspicious activity in mid-September 2025, investigated internally, and disrupted the campaign before publishing.

The jailbreak, as described by Anthropic, relied on telling Claude it was an employee of a legitimate cybersecurity firm conducting defensive testing, combined with decomposing malicious tasks into innocent-seeming subtasks. Anthropic's head of threat intelligence, Jacob Klein, added a qualification: 'The first part that is not autonomous is building the framework, so you needed a human being to put this all together,' estimating the work would otherwise have taken a team of about 10 people. Anthropic also acknowledged a limitation: 'Claude occasionally hallucinated credentials or claimed to have extracted secret information that was in fact publicly-available,' which it said remains an obstacle to fully autonomous cyberattacks.

The disclosure drew immediate skepticism. Security researchers noted the report contained no indicators of compromise and described techniques achievable with existing off-the-shelf tooling. Former CISA Director Jen Easterly cautioned that 'We still don't know which tasks were truly accelerated by AI versus what could have been done with standard tooling,' and that 'We don't know how often humans had to intervene, or how reliable the outputs actually were.' Cisco AI Defense researcher Tiffany Saade questioned the attribution logic: 'If I'm a Chinese state-sponsored actor and I do want to use AI models with agentic capabilities to do autonomous hacking, I probably would not go to Claude to do that.'

The report's claim of attack peaks reaching thousands of requests, often multiple per second, was later corrected to note that the campaign did not actually run at 1,000 requests per second. A widely shared critical blog post titled 'Anthropic's paper smells like bullshit' reached the top of Hacker News with over 1,150 points. Anthropic's response to the incident included banning the implicated accounts, notifying affected entities, and coordinating with authorities, and the company said the attackers had succeeded in a small number of cases.

Anthropic's attack-lifecycle diagram, showing the move from human-led targeting to largely AI-driven reconnaissance, vulnerability testing, credential harvesting, and data exfiltration via Claude Code. Source: Anthropic, 'Disrupting the first reported AI-orchestrated cyber espionage campaign.'

Where it stands

Anthropic stood by its assessment of GTG-1002 and its 'first documented case' framing, while the security community remained unconvinced by a disclosure that arrived without indicators of compromise, with a corrected throughput claim, and with the company's own head of threat intelligence explaining that the 'autonomous' attack required a human-built framework estimated at ten people's worth of labor. Anthropic banned the implicated accounts, notified affected entities, and coordinated with authorities, and noted that Claude sometimes hallucinated the credentials it claimed to have stolen. The open question of whether this represented a new threat category or a new marketing category went to Hacker News, where a post titled 'Anthropic's paper smells like bullshit' collected over 1,150 points.

Sources