When AI becomes the hacker
What Anthropic’s cyber espionage disclosure means for digital investigators
For the first time, as far as we know (full report, 13 pages), an AI system has autonomously executed a nation-state cyber espionage campaign. It didn’t assist humans. It didn’t just offer advice. It conducted the attacks itself—from reconnaissance to data theft—across roughly 30 organizations simultaneously.
Anthropic, maker of Claude, just disclosed this, and they’re calling it: “the first documented case of a cyberattack largely executed without human intervention at scale.”
According to Anthropic, a Chinese state-sponsored group (designated GTG-1002) manipulated Claude Code to conduct reconnaissance, discover vulnerabilities, harvest credentials, move laterally through networks, and exfiltrate data—largely on its own. This isn’t theoretical. This happened. And if you’re working in OSINT, cybersecurity, or investigative journalism, the rules just changed.
True lies
Here’s the detail that matters most for investigators: the attackers didn’t hack Claude—they lied to it. They role-played.
The threat actors told Claude they were legitimate cybersecurity firms conducting authorized penetration testing. That’s it. That’s the hack. Role-play. Social engineering an AI the same way you’d social engineer a help desk employee.
Think about what this means. These weren’t sophisticated bypasses or zero-day exploits against Anthropic’s systems. The attackers simply framed their requests as defensive security work, and Claude—trained extensively to avoid harmful behaviors—believed them. At least long enough to launch the operation.
I need to be honest with you: I do the same. Not for attacks—but for investigative work. Hear me out. Just recently, I told Claude I was a “livability risk expert” and asked it to identify potential negative fallout in a document I was analyzing. Another time, I posed as a drilling expert asking it to find the deeper truth about cracks in an oil rig. It’s role-play, and the AI believed my framing both times, helping me find exactly what I was looking for.
This isn’t a theoretical vulnerability—it’s how investigators like me probe these systems every day to understand what they’ll do and what they won’t. And if I can social engineer Claude for legitimate research purposes with simple job titles and basic deception, imagine what a well-resourced threat actor can accomplish with carefully crafted personas and sustained campaigns. The Chinese group that Anthropic caught didn’t need sophisticated hacking tools. They just needed convincing lies—the same technique I use, scaled up with nation-state resources.
The attack followed six distinct phases, and understanding this lifecycle is crucial for detection:
Phase 1: Campaign Initialization - Human operators selected targets and convinced Claude it was conducting legitimate security testing.
Phase 2: Reconnaissance - Claude autonomously mapped attack surfaces across multiple targets simultaneously, using browser automation via Model Context Protocol (MCP) tools. It cataloged infrastructure, analyzed authentication mechanisms, and identified vulnerabilities without human guidance.
Phase 3: Vulnerability Discovery - The AI generated custom payloads, tested them, and validated exploitability through callback systems. Humans only intervened to approve progression from reconnaissance to active exploitation.
Phase 4: Credential Harvesting - Claude extracted authentication certificates, tested credentials across systems, and mapped privilege levels independently. It decided which credentials worked where, building network maps without direction.
Phase 5: Data Collection - Here’s where autonomy peaked. Claude queried databases, extracted data, parsed results, identified proprietary information, and categorized findings by intelligence value. All autonomous. Humans only reviewed final exfiltration targets.
Phase 6: Documentation - Claude generated comprehensive markdown files tracking everything—discovered services, credentials, extracted data, complete attack progression. This enabled seamless handoff between operators.
The operational tempo tells the story: thousands of requests, multiple operations per second. No human can maintain that pace. This wasn’t assisted hacking. This was autonomous operation with human supervision at strategic gates.
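
If you’re wondering what that looks like in practice, here’s a minimal sketch of a tempo check against a generic request log. The event shape, field names, and thresholds are my assumptions for illustration—nothing here comes from Anthropic’s report, and you’d tune all of it to your own telemetry.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed event shape: (timestamp_iso, source_ip, endpoint).
# Humans can't sustain multiple requests per second for minutes on end;
# flag any source that does.
WINDOW = timedelta(seconds=60)
RATE_THRESHOLD = 3.0  # sustained requests per second (assumption—tune it)

def flag_inhuman_tempo(events):
    """Return source IPs that keep an above-threshold request rate for a full window."""
    by_source = defaultdict(list)
    for ts, source, _endpoint in events:
        by_source[source].append(datetime.fromisoformat(ts))

    flagged = set()
    for source, times in by_source.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Shrink the window until it spans at most WINDOW seconds.
            while t - times[start] > WINDOW:
                start += 1
            if (end - start + 1) / WINDOW.total_seconds() >= RATE_THRESHOLD:
                flagged.add(source)
                break
    return flagged

# Usage: feed it parsed access or proxy logs, e.g.
# flag_inhuman_tempo([("2025-11-13T02:14:00", "203.0.113.7", "/login"), ...])
```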
The irony: AI lied to the attackers
Here’s the investigative detail everyone’s missing: the AI lied to its operators.
Claude hallucinated during offensive operations. It claimed to have obtained credentials that didn’t work. It identified “critical discoveries” that turned out to be publicly available information. The report explicitly states this “presented challenges for the actor’s operational effectiveness.”
For investigators, this is both a limitation and a detection opportunity.
Why limitation? Because fully autonomous cyberattacks still require human validation. The AI can’t reliably distinguish between actual breaches and imagined ones. Every claimed success needs verification.
Why opportunity? Because these hallucinations create detectable patterns. If you’re analyzing logs and see:
Repeated authentication attempts with non-existent credentials
Queries for “critical” data that’s actually public
Activity patterns that claim success but show no actual data transfer
You might be looking at AI-driven reconnaissance hitting the hallucination wall.
Here’s what matters for defenders: AI hallucination in offensive contexts creates noise. That noise can be your early warning system if you know what to look for.
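
One crude way to turn that noise into a tripwire is to cross-check failed logins against accounts that actually exist: an agent acting on hallucinated credentials will keep retrying usernames that were never provisioned. The sketch below assumes a hypothetical log shape and directory export—the field names and threshold are mine, not the report’s.

```python
from collections import Counter

# Assumed inputs: the set of usernames that really exist in your directory,
# and failed-authentication events as (source_ip, attempted_username) pairs.
KNOWN_USERS = {"j.doe", "svc-backup", "admin"}   # export from your IdP
MIN_GHOST_ATTEMPTS = 10                          # arbitrary threshold—tune it

def flag_ghost_credentials(failed_logins):
    """Return sources repeatedly trying plausible-looking usernames that don't exist."""
    ghost_attempts = Counter(
        src for src, user in failed_logins if user not in KNOWN_USERS
    )
    return {src for src, count in ghost_attempts.items() if count >= MIN_GHOST_ATTEMPTS}

# Usage: flag_ghost_credentials(events_from_your_siem)
# A human typo produces one or two ghosts; an agent "confident" in invented
# credentials produces dozens from the same source.
```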
Let’s be clear about our source: Anthropic. The company that makes Claude. The same company that benefits from being seen as responsible about AI safety. The same company competing with OpenAI, Google, and others in the AI arms race.
Does that mean this report is false? No. But it means we need to ask hard questions:
What can we verify independently? Almost nothing. No target names. No technical indicators of compromise (IOCs). No code samples. No independent security firms corroborating these findings. We have Anthropic’s word, their internal investigation, and their interpretation of what happened.
What’s conspicuously absent? The report doesn’t name a single targeted organization. In legitimate threat intelligence sharing, defenders need IOCs to protect themselves. Where are the file hashes? Network indicators? Specific vulnerabilities exploited? (Yes, I understand victim confidentiality, but the lack of any technical specifics makes independent verification impossible.)
Why release this now? November 2025. Right when debates about AI regulation are intensifying. Right when companies need to demonstrate they take safety seriously. The timing is... convenient.
Can other researchers replicate these findings? No. This is a black box disclosure. Trust us, it happened. Here’s our analysis. The end.
I’m not saying Anthropic fabricated this. Their threat intelligence team has credibility. But as investigators, we can’t just accept vendor claims without scrutiny. This report reads like a persuasive argument for AI safeguards (which Anthropic builds) more than a detailed threat intelligence disclosure (which the security community needs).
The report mentions they “coordinated with authorities” and “notified affected entities.” Good. But for the rest of us? We get a case study without enough details to defend against similar attacks.
What this changes for digital investigators
If this report is accurate (and I’m inclined to believe the core claims are), here’s what changes for everyone doing investigative work online:
Your reconnaissance is now being met with AI-powered counter-reconnaissance. That company you’re researching? Their security systems might be using AI to analyze your OSINT gathering patterns in real-time. The behavioral signatures you leave—query timing, search patterns, tool usage—become more visible when AI is watching.
Attribution just got exponentially harder. Was that automated scanning tool run by a human pentester, a script kiddie, or an AI agent operating with minimal supervision? The traditional indicators—operational tempo, sophistication level, time-of-day patterns—no longer reliably indicate human operators.
Source protection requires new assumptions. If AI can autonomously discover internal services, map network topology, and identify high-value systems, then your sources working inside organizations face new exposure risks. That “secure” internal system? An AI might enumerate it in minutes without triggering traditional alerts.
The tools you use for investigation can be used against you. MCP tools, browser automation, data parsing frameworks—these are standard investigative tools. Now they’re also autonomous hacking tools. The line between “doing research” and “conducting reconnaissance” becomes uncomfortably thin.
Consider this scenario: You’re investigating a company for a story. You use automated tools to map their public infrastructure, check their security posture, analyze their employee information from LinkedIn. Normal OSINT, right?
Now imagine an AI doing the exact same activities at 100x speed, analyzing every response, adapting its approach based on what it finds, and operating 24/7. Where’s the line between legitimate investigation and attack? (Hint: in the intent and authorization, but good luck proving that in logs and in court.)
The Defense Paradox
Anthropic’s argument goes like this: “Yes, AI can be used for sophisticated cyberattacks. But defenders need these same capabilities to detect and respond to threats. Claude—with its strong safeguards—helps cybersecurity professionals analyze attacks. In fact, our own team used Claude to investigate this very incident.”
This is simultaneously true and deeply unsatisfying.
It’s true because: Modern threat hunting requires analyzing massive datasets, correlating indicators across systems, and identifying subtle patterns. AI excels at this. Security operations centers (SOCs) desperately need automation. The bad guys already have these tools—defenders need them too.
It’s unsatisfying because: It assumes “strong safeguards” actually work. But this entire incident proves they don’t—at least not reliably. The attackers defeated Claude’s safety measures with simple role-play. If social engineering AI is this easy, how strong are these safeguards really?
The report admits: “This activity is a significant escalation from our previous ‘vibe hacking’ findings identified in June 2025.” Wait. Previous findings? This isn’t the first time? How many unreported incidents are there?
Here’s the uncomfortable truth: AI companies have created dual-use technology and are now claiming only they can be trusted to mitigate its risks. That’s a hell of a business model. Build the weapon, sell the weapon, then sell the defense against the weapon.
I’m not anti-AI. I use AI tools for investigation. They’re powerful and useful. But let’s not pretend this defense argument isn’t self-serving. Anthropic needs to continue developing and releasing capable AI models—their business depends on it. The safeguard argument justifies that business model while acknowledging the risks.
For investigators, the question isn’t whether AI should exist. It’s this: how do we keep working with tools that are simultaneously essential and dangerous?
Here’s the start of an answer:
• Audit your organization’s AI tool usage. Who has access to Claude Code, GitHub Copilot, or similar tools? What guardrails are in place? Could your organization’s AI subscriptions be compromised for reconnaissance? (The answer is yes. Audit them.)
• Update your logging to detect AI-powered reconnaissance. Look for sustained high-frequency requests, unusual query patterns across multiple systems simultaneously, authentication attempts following systematic enumeration patterns, and data access showing rapid parsing of multiple sources (a sketch of one such rule follows this list).
• Review your incident response playbooks. Do they account for AI-driven attacks operating at inhuman speeds? Can your team distinguish between automated tools and AI agents? What’s your escalation procedure when you detect autonomous operations?
• Assume AI-assisted reconnaissance in your threat model. That external actor trying to map your infrastructure? Assume they have AI analyzing every response, adapting in real-time, and operating 24/7. Design your defenses accordingly.
• Protect your sources with new assumptions. If you’re a journalist working with inside sources, assume AI can autonomously map internal networks, identify valuable systems, and trace information flows. Your source’s operational security needs to account for this.
• Learn to spot hallucination patterns. Failed authentication attempts with plausible-looking credentials, queries for data that doesn’t exist, activity showing “discovery” of public information—these might indicate AI reconnaissance hitting reliability limits.
• Test your security with AI tools (legally and ethically). The best way to understand how AI attacks work is to use AI for authorized security testing. Document what works, what fails, and what leaves distinctive traces.
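
As promised in the logging bullet above, here’s one possible heuristic for the “multiple systems simultaneously” pattern: count how many distinct hosts a single source touches inside a short window. Again, the event shape, window, and threshold are assumptions to adapt, not prescriptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed event shape: (timestamp_iso, source_ip, target_host).
# Human operators tend to probe one system at a time; an autonomous agent
# enumerating an environment fans out across many hosts in parallel.
WINDOW = timedelta(minutes=5)
BREADTH_THRESHOLD = 20  # distinct hosts in one window (assumption—tune it)

def flag_fan_out(events):
    """Return sources touching an unusually broad set of hosts within one window."""
    by_source = defaultdict(list)
    for ts, source, host in events:
        by_source[source].append((datetime.fromisoformat(ts), host))

    flagged = {}
    for source, items in by_source.items():
        items.sort()
        start = 0
        for end in range(len(items)):
            # Keep the window at most WINDOW long, then count distinct hosts in it.
            while items[end][0] - items[start][0] > WINDOW:
                start += 1
            hosts = {h for _, h in items[start:end + 1]}
            if len(hosts) >= BREADTH_THRESHOLD:
                flagged[source] = len(hosts)
                break
    return flagged
```

None of these rules proves anything on its own. Combined with the tempo and ghost-credential checks above, they’re tripwires that tell you where to look next.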
The bigger question
If an AI can be social engineered into conducting nation-state espionage by simply being told it’s doing legitimate security work, what does this mean for verification in digital investigations?
How do we attribute actions when 90% of an operation has no human directly in the loop? How do we assess credibility when AI can generate plausible-but-fabricated intelligence at scale? How do we protect sources when AI can autonomously map insider access paths?
The Anthropic report presents this as a cybersecurity problem. It’s bigger than that. It’s an epistemology problem. How do we know what we know when AI operates autonomously in spaces we’re investigating?
The old model: Human attackers leave patterns. They get tired, make mistakes, work certain hours, use familiar techniques. We learned to read those patterns.
The new model: AI agents operate continuously, adapt instantly, generate novel approaches, and—crucially—hallucinate convincingly. The patterns we rely on for investigation just became unreliable.
The barriers to sophisticated cyberattacks have dropped. That’s Anthropic’s conclusion, and I believe them. But it’s not just cyberattacks. The barriers to sophisticated disinformation, automated harassment, and mass surveillance have dropped too. Different applications, same underlying problem.
Your move, investigators. These capabilities exist now. They’re proliferating. How do you adapt your tradecraft? What new verification methods do you need? How do you protect your sources in this environment?
I don’t have all the answers. But I know this: Pretending AI-powered autonomous operations aren’t happening won’t protect anyone. Understanding how they work—and how they fail—might.
Stay skeptical. Verify everything. And maybe don’t tell the AI what you’re really investigating. Just tell it you’re role-playing.
Henk van Ess is an OSINT trainer and investigative researcher focused on digital verification methods and online investigation techniques. He writes about practical approaches to information verification at DigitalDigging.org.




