Skip to main content

AI Coding Tools Are the Next Big Attack Surface — And Nobody Is Ready

·2476 words·12 mins· loading · loading · ·
Utkarsh Deoli
Author
Utkarsh Deoli
Just a developer for fun
Table of Contents

AI Coding Tools Are the Next Big Attack Surface — And Nobody Is Ready
#

Your AI coding assistant can read every file on your machine, run shell commands, and connect to external services. That’s the point. It’s also a catastrophic security risk.


The trust problem no one talks about
#

We gave AI coding assistants the keys to our kingdoms, and then acted surprised when someone tried to steal them.

GitHub Copilot, Cursor, Claude Code, Windsurf: these tools now sit at the center of how software gets built. They read your code, edit files, run terminal commands, and integrate with your cloud environments. By design, they’re deeply privileged.

The problem? Every single one of them is vulnerable to supply chain attacks that can steal your credentials, exfiltrate your code, and execute arbitrary code on your machine, simply by you opening a malicious repository.

This isn’t theoretical. It’s happening now. And it’s worse than you think.


The incidents that should worry you
#

1. Claude Code: Three CVEs, full machine compromise
#

In February 2026, Check Point Research published details on three critical vulnerabilities in Anthropic’s Claude Code. All discovered, disclosed, and fixed. But all devastating in concept.

The attack vector was elegant in its simplicity: malicious configuration files in repositories.

Claude Code stores project-level settings in .claude/settings.json, a file that lives in your repository and controls how the tool behaves. Want consistent Claude Code settings across your team? Just commit the config file and everyone clones it automatically.

Here’s the catch: any contributor with commit access can modify it. And when you clone a repository and run claude, those settings execute automatically. No sandbox. No warning you could actually act on in time.

CVE #1 — RCE via hooks
#

Claude Code’s Hooks feature lets you run shell commands at specific points in its lifecycle, say, auto-formatting code after edits. Hooks are defined in .claude/settings.json.

Check Point researchers created a hook that opened a calculator app when someone ran Claude Code in the project directory. The result: the calculator opened without any additional user prompt or consent. The tool just ran it silently in the background while the trust dialog was still on screen.

From there, escalating to a reverse shell was trivial. An attacker could craft a hook to download and execute any payload, giving them full remote code execution on every developer’s machine that opens the project.

CVE #2 — MCP consent bypass#

Claude Code integrates with external tools via the Model Context Protocol (MCP). MCP servers are configured in .mcp.json, another repository-controlled file.

After the Hooks vulnerability was reported, Anthropic added a warning dialog requiring user approval before MCP servers initialize. So Check Point found a bypass: two configuration settings, enableAllProjectMcpServers and enabledMcpjsonServers, that auto-approve MCP servers before the user even reads the dialog.

The result was the same: arbitrary code execution, instantly, before the user can click “no.”

CVE #3 — API key exfiltration
#

Claude Code uses the ANTHROPIC_BASE_URL environment variable to determine where to send API requests. This variable can be set in .claude/settings.json.

An attacker sets it to point to their own server. When a victim clones the repository and runs Claude Code, every API request, including the one containing their full Anthropic API key in plaintext, gets routed through the attacker’s proxy before reaching Anthropic’s servers.

The key is sent before the user confirms the trust dialog. No interaction needed.

With a stolen API key, an attacker gains access to the victim’s Claude Workspace — shared storage where teams keep project files. Files uploaded by one developer can be downloaded by another using a regenerated copy. Billing fraud is the obvious exploit. Data theft is the serious one.

Timeline:

  • July 21, 2025: Hooks RCE reported
  • September 3, 2025: MCP bypass reported
  • October 28, 2025: API key exfiltration reported
  • January 21, 2026: CVE-2026-21852 published
  • February 25, 2026: Full public disclosure

2. IDEsaster: 30+ CVEs, 100% failure rate
#

If Claude Code’s vulnerabilities were concerning, “IDEsaster” — research by security analyst Ari Marzouk released in early 2026 — was alarming in scale.

Six months of testing. Every major AI IDE. The results: over 30 vulnerabilities, 24 CVEs assigned, and a 100% failure rate across all tested tools.

Affected: Cursor, GitHub Copilot, Windsurf, Zed.dev, Roo Code, Junie, Cline. None escaped.

The core attack pattern was indirect prompt injection: planting hidden instructions in files that AI assistants naturally read — README files, .cursorrules, filenames. The AI follows the attacker’s instructions because it can’t distinguish between your prompts and malicious content embedded in files it’s been told to analyze.

The highlights
#

CVE-2025-64671 (GitHub Copilot, JetBrains plugin): Open a malicious repository or review a crafted pull request, and an attacker executes arbitrary commands. Tokens, signing keys, and build pipeline credentials — all accessible.

CurXecute (Cursor): There’s an inconsistency in Cursor’s permission checks. Creating a new .cursor/mcp.json file doesn’t require approval, but editing an existing one does. Attackers trick Cursor into writing a malicious MCP configuration, triggering RCE with zero user interaction.

MCPoison (Cursor): Once you approve an MCP server, Cursor trusts it by name, not by contents. The configuration can be modified after approval, and malicious commands execute silently on every subsequent session. It’s a persistent backdoor disguised as a trusted tool.

JSON Schema Exfiltration (universal): The AI reads sensitive files — .env, API keys, credentials — then writes output to a JSON file with a remote schema hosted on an attacker’s domain. During schema validation, your secrets leak out. It’s clever because it looks completely benign in logs.

The supply chain scenario writes itself: you open a repository to review a pull request. A hidden prompt in the README hijacks your AI assistant. It reads your .env, exfiltrates your AWS credentials, and sends them to an attacker. You notice nothing. Your production infrastructure is now someone else’s.


3. RSAC 2026: 100% failure rate confirmed on stage
#

The RSA Conference 2026 in San Francisco put numbers to what researchers had been warning about all year.

The headline finding: 100% of tested AI coding environments are vulnerable to prompt injection attacks. Every tool. No exceptions.

The shared attack chain researchers demonstrated: Prompt Injection to Agent Tools to Base IDE Features. It works because modern AI coding tools operate with deep system access — read files, execute commands, manage git, call external APIs. The trust boundary between “AI assistant” and “privileged local process” is essentially nonexistent.

But the most alarming detail wasn’t the vulnerability. It was the persistence. Injected instructions survive across sessions. A prompt injected on Monday can be recalled and acted on Friday, long after the original attack vector is gone. Classic prompt injection is session-scoped. Agentic memory poisoning is not.

Researchers also surfaced a second, separate problem: Cursor and Windsurf are built on outdated Chromium/Electron versions, exposing approximately 1.8 million developers to 94+ known browser CVEs. CVE-2025-7656, a patched Chromium flaw, was successfully weaponized against current Cursor and Windsurf releases. This isn’t a model-level vulnerability. It’s supply chain negligence.

The conference also cited the Moltbook breach as the canonical example of what happens when AI-generated code ships without meaningful review: 1.5 million API keys, 35,000 emails, and an entire database exposed in under three minutes. The mean time to exfiltrate data, according to Unit 42 research presented at RSAC, has collapsed from nine days (2021) to two days (2023) to roughly 30 minutes by 2025. When your attacker moves that fast, the 20-minute cloud code review that runs after you merge is not a defense.


4. MCP security: the protocol that was supposed to be safe
#

Anthropic launched the Model Context Protocol (MCP) in November 2024, calling it “USB-C for AI.” Within months, over 13,875 MCP servers were indexed by researchers. Microsoft, GitHub, Salesforce, and hundreds of enterprise vendors had integrated it. And then the CVEs started arriving.

CVE-2025-6514: CVSS 9.6, mcp-remote OS command injection
#

Discovered by Or Peles at JFrog Security Research and published July 2025, this was the first documented case of full remote code execution triggered by connecting to an untrusted MCP server. The vector: mcp-remote, a widely used npm proxy for connecting local MCP clients to remote HTTP servers, had 437,000 weekly downloads at the time. The authorization_endpoint URL from the remote server was passed unsanitized to the OS browser call, enabling arbitrary command execution on Windows. No user interaction beyond opening Claude Desktop with a malicious config.

Three CVEs in Anthropic’s own reference implementation
#

This is the one that should keep you up at night. In January 2026, AI security firm Cyata disclosed three chained CVEs in Anthropic’s own official mcp-server-git — the canonical example of what a “good” MCP server should look like. Path traversal (CVE-2025-68145, CVSS 7.1), unrestricted git init on arbitrary paths (CVE-2025-68143, CVSS 8.8), and argument injection in git_diff/git_checkout (CVE-2025-68144, CVSS 8.1). Chained together with the Filesystem MCP server, they achieved full RCE, starting from a prompt injection in a README file.

As Yarden Porat from Cyata noted: “If Anthropic gets it wrong, in their official MCP reference implementation, then everyone can get MCP security wrong.”

The Postmark supply chain attack
#

A malicious npm package appeared, masquerading as a legitimate “Postmark MCP Server” for transactional email. Developers integrated it, granting it email-sending access. One line of injected code later, every outgoing email was silently BCC’d to an attacker-controlled address — internal memos, password reset links, invoices, contracts. All gone. The attack worked because MCP servers routinely run with high-privilege access by design, and the malicious package was functionally indistinguishable from the real one.

MCP tool poisoning: 53% of AI deployments vulnerable
#

ToxSec research published March 2026 found that 53% of AI deployments are vulnerable to tool poisoning attacks — three distinct exploit chains across 5,200 MCP servers. The chains target malicious tool descriptions, rendered markdown injection, and static credential theft. The attacker’s advantage: once an MCP server is approved, most clients trust it forever, even after silent updates that change its behavior.


5. AI detection evasion: when AI writes code that looks human
#

There’s a parallel problem nobody’s talking about enough. AI-generated code is getting better at hiding from AI detectors.

Research published in early 2026 shows this is an active arms race. StealthRL (UC San Diego, February 2026) demonstrated reinforcement learning-based paraphrase attacks that preserve code semantics while evading multiple detector systems simultaneously. MASH (January 2026) showed how style humanization techniques can bypass black-box AI-text detectors.

On the defense side, DualSentinel (March 2026) introduced dual entropy lull pattern detection — a framework for catching targeted attacks in black-box LLMs by monitoring for the statistical signatures of adversarial paraphrasing. PISmith (March 2026, arXiv) took a different angle, using RL-based red teaming to systematically probe prompt injection defenses.

What does this mean for security teams? The code your AI assistant writes might already be designed to look like a human wrote it, which means traditional authorship-based detection is useless. You can’t audit what you can’t identify.


6. RoguePilot: GitHub Copilot prompt-injected itself
#

Invariant Labs disclosed a prompt injection vulnerability in GitHub Copilot that let attackers control Copilot’s responses and exfiltrate Codespaces’ GITHUB_TOKEN secret, leading to repository takeover.

The attack was a form of passive prompt injection: malicious instructions embedded in data, content, or filenames that Copilot reads as context. Copilot couldn’t tell the difference between “here’s my code, complete this function” and “steal my secrets and send them here.”

The result: repository takeover via a compromised AI response.


Why this keeps happening
#

LLMs process all text as a single continuous prompt. There’s no technical mechanism to separate trusted instructions from untrusted input. A README file, a filename, a code comment — all look the same to the model.

We bolted AI agents onto existing tools without redesigning security assumptions. Traditional security models assume passive tools. AI assistants are active, autonomous, and vulnerable to manipulation. That changes everything.

NIST called prompt injection “generative AI’s greatest security flaw.” OWASP lists it as the #1 vulnerability in the LLM Applications Top 10. The industry recognizes the problem. The tools haven’t caught up.

The economic incentive is also backwards: AI coding assistants get marketed on capability, not security. “Shipping faster” is the pitch. “Here’s how we prevent supply chain attacks through your IDE” is not.


What you should do right now
#

For individual developers:

  1. Update everything. Claude Code, Cursor, Copilot, all of it. Patches exist for most of these CVEs.
  2. Audit your permissions. What files can your AI assistant actually access? Restrict sensitive directories like .env, credentials/, keys/.
  3. Disable auto-approve features. Make AI actions require manual review.
  4. Don’t trust repositories from unknown sources. Even seemingly legitimate projects can contain malicious configurations. Check .claude/, .cursorrules, .mcp.json, and similar files before opening a project.
  5. Review code before accepting AI suggestions. This should have always been true, but now it’s a security requirement too.
  6. Pin MCP package versions exactly. Never use unpinned version ranges. Use "mcp-remote": "0.1.16" not "^0.1.16".

For organizations:

  1. Treat AI-generated code as untrusted by default. Run static analysis tools (CodeQL, Bandit, Semgrep) on everything AI produces.
  2. Implement security-focused prompting. “Generate user login” is a liability. “Generate user login with input validation, bcrypt password hashing, and rate limiting” is what you’re actually asking for.
  3. Maintain audit trails. Track AI usage, prompts, generated code, and security reviews. Most organizations have no idea what their AI tools are actually doing.
  4. Evaluate AI tool security in your procurement process. Your AI IDE is reading every file in your repository. That should be part of your vendor assessment.
  5. Deploy runtime detection. Falco rules targeting shell spawning by agent processes, unexpected outbound connections, and sensitive file access catch exploitation that perimeter controls miss.
  6. Add MCP security gates to CI/CD. Block vulnerable package versions before they reach production.

The bottom line
#

We are early in a fundamental shift in how software gets built. AI coding assistants are incredibly powerful and genuinely useful, but they introduce attack surfaces we’ve never had to think about before.

Configuration files that were once passive metadata now control active execution paths. Repository cloning now means trusting someone’s AI tool settings. Opening the wrong project can now compromise every secret on your machine. The code your AI writes might even be designed to evade detection.

The tools aren’t going away. The vulnerabilities will be patched, and new ones will be found. The real question is whether we, as developers, as an industry, are going to treat AI security with the same seriousness we treat, say, SQL injection or XSS.

The answer, right now, is clearly no.

Update your tools. Audit your permissions. And for the love of everything: stop clicking “Yes” without reading what you’re agreeing to.


This post will be updated as new vulnerabilities and fixes emerge. If you found this useful, share it with your team — especially the people who haven’t updated Cursor since 2024.