Module 3.4 — Prompt Injection Against Enterprise Copilots: The EchoLeak Class

50-minute lecture · Day 3 afternoon

Learning objectives

By the end of this module, students can:

Walk the EchoLeak (CVE-2025-32711) case study end-to-end: the Aim Security disclosure, zero-click exfiltration from M365 Copilot via a crafted email, the CVSS 9.3 rating, and the fix shipped in the June 2025 Patch Tuesday
Recognize the LLM Scope Violation category that EchoLeak exemplifies — untrusted external content tricking a privileged enterprise LLM into accessing and revealing internal data
Apply the Codex-generated Python prompt-injection detector as a pre-LLM screening layer for inbound content
Identify and discuss the broader pattern of copilot prompt-injection vulnerabilities disclosed 2024-2026 (Bargury’s Living off Microsoft Copilot, ForcedLeak Salesforce, others — with appropriate verification caveats)

EchoLeak — the case study

CVE-2025-32711 (a.k.a. EchoLeak) is the first publicly documented zero-click prompt-injection exploit against a production LLM-based enterprise application. The combination of “zero-click,” “production LLM,” and “enterprise” makes it the canonical anchor for this module.

The disclosure

Discovered by: Aim Labs (the research team of Aim Security)
Reported to: Microsoft
Affected product: Microsoft 365 Copilot
CVSS score: 9.3 (Critical)
Disclosure date: June 11, 2025 (as part of June 2025 Patch Tuesday)
Microsoft fix: included in June 2025 Patch Tuesday; no customer action required after patch
Detailed write-up: arxiv.org/abs/2509.10540 — “EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System”

How it worked

The attack mechanism, simplified:

Attacker sends a crafted email to a target user’s Outlook inbox. The email contains text designed to look like ordinary content but with embedded prompt-injection instructions. The text never explicitly mentions Copilot or AI — that would trip M365’s XPIA (Cross-Prompt Injection Attack) classifier.
The email sits unread. No user interaction required.
Later, when the user invokes M365 Copilot for a different task (asking it to summarize emails, generate a document, search their workspace), Copilot’s retrieval system reads the malicious email along with legitimate content.
The injected instructions execute in Copilot’s context, with all of Copilot’s privileges. The instructions cause Copilot to access privileged internal data (Outlook emails, Teams chats, OneDrive documents, SharePoint files) and exfiltrate it via Markdown reference-style links and images that point to attacker-controlled URLs.
Exfiltration happens through a Teams proxy. The attacker-controlled URLs are loaded as image sources or click targets in Copilot’s response, sending the exfiltrated data to the attacker’s server.
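The exfiltration channel in the last two steps can be made concrete with a small defensive sketch. It assumes nothing about Copilot internals: it only pulls URLs out of inline and reference-style Markdown in a model response and flags those whose query string is long enough to carry smuggled data. The function names, the 40-character threshold, and the attacker.example host are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Inline links and images: [text](url) or ![alt](url)
INLINE_RE = re.compile(r"!?\[[^\]]*\]\(\s*<?([^)\s>]+)")
# Reference-style definitions on their own line: [label]: url
REF_DEF_RE = re.compile(
    r"^[ \t]{0,3}\[[^\]]+\]:[ \t]*<?(\S+?)>?[ \t]*$", re.MULTILINE
)


def extract_markdown_urls(text: str) -> list[str]:
    """Return URLs used by inline or reference-style Markdown links/images."""
    return INLINE_RE.findall(text) + REF_DEF_RE.findall(text)


def suspicious_urls(text: str, max_query_chars: int = 40) -> list[str]:
    """Flag URLs whose query string is long enough to smuggle data out.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    flagged = []
    for url in extract_markdown_urls(text):
        if len(urlparse(url).query) > max_query_chars:
            flagged.append(url)
    return flagged
```

A length check on the query string is crude. A production filter would more likely combine it with a host allowlist, since the data can also be encoded in the path or subdomain.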

The researchers describe this category of attack as LLM Scope Violation — untrusted external content (the email) tricks a privileged LLM (Copilot) into operating outside its intended scope (accessing privileged internal data, communicating externally).

This is precisely the lethal trifecta from Day 3 Module 3.6: the LLM has access to private data + exposure to untrusted content + ability to externally communicate.
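As a rough way to apply the trifecta during an architecture review, the three conditions can be written down as an explicit checklist. This is only a sketch: the AgentProfile type and its field names are hypothetical and do not correspond to any vendor configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical description of an LLM deployment for threat modeling."""
    name: str
    reads_private_data: bool
    ingests_untrusted_content: bool
    can_communicate_externally: bool


def present_legs(profile: AgentProfile) -> list[str]:
    """List which of the three trifecta conditions hold for this deployment."""
    legs = {
        "private data access": profile.reads_private_data,
        "untrusted content exposure": profile.ingests_untrusted_content,
        "external communication": profile.can_communicate_externally,
    }
    return [label for label, present in legs.items() if present]


def has_lethal_trifecta(profile: AgentProfile) -> bool:
    """True only when all three conditions hold at once."""
    return len(present_legs(profile)) == 3
```

Removing any one leg breaks the combination, which is why several of the mitigations later in this module focus on constraining external communication.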

Why “zero-click” matters

Conventional phishing requires the user to click a link, open an attachment, or interact in some way. EchoLeak required none of that. The user did not have to read the email, click anything, or even know the email existed. The attack triggered whenever Copilot’s retrieval system encountered the email during a later, unrelated user interaction.

For SOC threat modeling, this means:

Email security gateways are insufficient (the email is allowed through; nothing in it looks immediately malicious)
User training is insufficient (the user isn’t asked to do anything to enable the attack)
The trust boundary that matters is inside the Copilot system — between the inbound content layer and the LLM that processes it

Other copilot prompt-injection disclosures (2024-2026)

The EchoLeak class is not unique. Several other documented or alleged vulnerabilities are worth tracking. Instructors: verify each before delivery — adversary-AI research is fast-moving and some claims circulate before primary-source verification.

Bargury’s Living off Microsoft Copilot (Black Hat USA 2024)

Michael Bargury (Zenity research) presented “Living off Microsoft Copilot” at Black Hat USA 2024. The research demonstrated multiple attack techniques against M365 Copilot:

LOLCopilot — using Copilot itself as a post-exploitation tool against the org once the attacker has any foothold (e.g., compromised user account)
CopilotHunter — identifying thousands of exposed Copilot bots across organizations
AgentFlayer — stealing secrets from developer IDEs (Cursor, GitHub Copilot variants)
0-click agent hijacking — predecessor concept to the EchoLeak class

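Bargury’s research is the canonical academic/conference foundation for the enterprise-copilot attack-surface conversation. Source: Black Hat USA 2024 schedule and recorded talk. Instructor note: confirm which of the items above belong to the 2024 talk. AgentFlayer and the 0-click agent hijacking work appear to come from Zenity’s later Black Hat USA 2025 research, which would postdate EchoLeak rather than precede it.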

CVE-2024-38206 — Microsoft Copilot Studio SSRF

A server-side request forgery (SSRF) vulnerability in Microsoft Copilot Studio that allowed attackers to bypass protections and access Microsoft internal cloud infrastructure. Disclosed August 6, 2024. The mechanism differs from EchoLeak (SSRF, not prompt injection), but it is in the same product family.

Claimed: ForcedLeak (Salesforce Einstein / Agentforce, September 2025)

Reported indirect prompt injection in Salesforce Einstein / Agentforce via Web-to-Lead forms, exfiltrating CRM data to attacker-controlled expired domains whitelisted in Content Security Policy. Instructor note: verify the primary disclosure source before citing in delivery — the most credible reference is via Noma Security, but trace to the original advisory before quoting specifics.

Claimed: CVE-2025-53773 — GitHub Copilot RCE

Reported remote code execution in GitHub Copilot via malicious instructions embedded in Pull Request descriptions. Disclosure date November 12, 2025. Verify CVE status at NVD before delivery.

Claimed: CVE-2026-21521 (“Reprompt”) — Microsoft Copilot Personal/Edge

Reported one-click exploit using Parameter-to-Prompt injection via malicious URL to exfiltrate chat history and session data. Disclosure date January 14, 2026. Verify CVE status at NVD before delivery; some January 2026 CVE references in research summaries are forward-dated.

The detection engineer’s posture toward all of these: EchoLeak is the verified anchor; the others are likely real but should be cross-checked at delivery time. The pattern they share is what matters: enterprise copilots with privileged access to corporate data, accepting untrusted external content, becoming exfiltration vectors.

The Codex-generated prompt injection detector

To screen inbound content before it enters LLM-augmented workflows, deploy a deterministic prompt-injection detector. The Codex-generated implementation at .boss-pattern-work/day3/prompt_injection_detector.py (423 lines, 7 test fixtures, stdlib-only — runs in any SOC environment) implements:

Instruction-override patterns: ignore previous, disregard prior, you are now, override the system prompt
Role-confusion patterns: as an AI assistant, I am ChatGPT, system:
Base64-encoded content: decode and recursively scan
Zero-width characters: U+200B, U+200C, U+200D, U+FEFF — common in steganographic injection
Hidden HTML/CSS: white-on-white text and 0pt-font filler text designed to dilute the malicious signal seen by AI filters
Excessive imperative density: an unusually high concentration of imperative verbs relative to the length of the text
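To make a few of these indicator families concrete, here is a minimal stdlib-only sketch. It is not the Codex-generated detector's code. It covers only the override, role-confusion, zero-width, and Base64 checks, using the phrases listed above. The regexes, the scan function, and the recursion limit are assumptions made for illustration, and the indicator dictionary shape mirrors the example output shown later in this module.

```python
import base64
import binascii
import re

OVERRIDE_RE = re.compile(
    r"ignore\s+previous|disregard\s+prior|you\s+are\s+now"
    r"|override\s+the\s+system\s+prompt",
    re.IGNORECASE,
)
ROLE_RE = re.compile(
    r"as\s+an\s+ai\s+assistant|i\s+am\s+chatgpt|^[ \t]*system:",
    re.IGNORECASE | re.MULTILINE,
)
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
# Runs of 24+ Base64 alphabet characters, optionally padded.
B64_RE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def scan(text: str, depth: int = 0, max_depth: int = 2) -> list[dict]:
    """Return indicator dicts found in text, recursing into Base64 payloads."""
    hits = []
    for label, pattern in (("instruction_override", OVERRIDE_RE),
                           ("role_confusion", ROLE_RE)):
        for m in pattern.finditer(text):
            hits.append({"type": label, "snippet": m.group(0),
                         "position": m.start()})
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            hits.append({"type": "zero_width",
                         "snippet": f"U+{ord(ch):04X}", "position": i})
    if depth < max_depth:
        for m in B64_RE.finditer(text):
            try:
                decoded = base64.b64decode(m.group(0), validate=True).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError):
                continue  # not valid Base64 text; ignore the candidate
            for inner in scan(decoded, depth + 1, max_depth):
                # Report nested hits at the position of the encoded blob.
                hits.append({**inner, "type": "base64:" + inner["type"],
                             "position": m.start()})
    return hits
```

The depth limit keeps recursive decoding bounded, so a payload that is repeatedly re-encoded cannot force unbounded work.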

Pattern of use in an enterprise pipeline

Inbound email / document / RAG-corpus addition
        ↓
prompt_injection_detector.py classifies content
        ↓
    confidence > 0.7 → flag for review, do not feed to LLM
    confidence 0.4-0.7 → strip detected indicators, log, allow with caveats
    confidence < 0.4 → allow normally
        ↓
LLM-augmented workflow proceeds
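The routing step in the diagram above can be sketched as a single function. The thresholds are the ones shown in the diagram. The Action enum and the route function are hypothetical names, and treating exactly 0.7 and exactly 0.4 as part of the middle band is an assumption about boundary handling.

```python
from enum import Enum


class Action(Enum):
    FLAG_FOR_REVIEW = "flag_for_review"   # do not feed to the LLM
    STRIP_AND_LOG = "strip_and_log"       # remove indicators, allow with caveats
    ALLOW = "allow"


def route(confidence: float) -> Action:
    """Map a detector confidence score in [0, 1] to a pipeline action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence > 0.7:
        return Action.FLAG_FOR_REVIEW
    if confidence >= 0.4:
        return Action.STRIP_AND_LOG
    return Action.ALLOW
```

Rejecting out-of-range scores explicitly means a malformed detector result fails loudly instead of silently falling into the allow branch.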

The detector is intentionally heuristic and stdlib-only so it runs anywhere (air-gapped, locked-down SOC environments, embedded in email gateway scripts). It is not a replacement for Llama Guard 3 or Azure Prompt Shields (Module 3.5) — it’s a complementary first-pass screen.

Example invocation

python3 prompt_injection_detector.py --input suspicious_email.txt
# Output:
# {
#   "detected": true,
#   "confidence": 0.82,
#   "indicators": [
#     {"type": "instruction_override", "snippet": "ignore previous instructions", "position": 247},
#     {"type": "zero_width", "snippet": "U+200B between word boundaries", "position": 891}
#   ],
#   "notes": "Two distinct indicator types; high confidence"
# }
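For pipeline use, a report of this shape can be parsed and condensed before it is logged or forwarded to a SIEM. The sketch below assumes the detector writes plain JSON (without the leading "# " comment prefixes used in the listing above) with the field names shown. The summarize_report helper is a hypothetical name.

```python
import json
from collections import Counter


def summarize_report(raw: str) -> dict:
    """Condense a detector JSON report into a few fields for logging."""
    report = json.loads(raw)
    indicators = report.get("indicators", [])
    return {
        "detected": bool(report.get("detected", False)),
        "confidence": float(report.get("confidence", 0.0)),
        # How many hits of each indicator type were reported.
        "indicator_counts": dict(Counter(ind["type"] for ind in indicators)),
        # Offset of the earliest hit, or None when nothing was found.
        "first_position": min(
            (ind["position"] for ind in indicators), default=None
        ),
    }
```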

Self-test mode

The detector includes built-in test fixtures (7 known injection samples + benign comparison) accessible via --selftest. Run on deployment to validate the detector’s signatures against expected behavior. Re-run on every detector update.

Defending an enterprise copilot deployment

Layered defenses for the EchoLeak class:

Layer 1: Pre-LLM screening (this module’s Codex detector)

Screen all inbound content (emails, documents, RAG corpus entries) for injection indicators before they reach the LLM’s context.

Layer 2: Guardrails on the LLM input/output (Module 3.5)

Llama Guard 3 or Azure Prompt Shields running on both the prompt going into the LLM and the response coming out.

Layer 3: Canary tokens in the RAG corpus

Seed unique, high-entropy strings into specific tenants’ RAG corpora. If a canary token appears in any output, you have evidence of cross-tenant leak or unauthorized access. This is one of the highest-fidelity detection signals available for enterprise copilots.
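A minimal sketch of the canary idea, assuming a simple in-memory registry that maps each token to the tenant it was seeded for. The CANARY- prefix, the make_canary and leaked_canaries names, and the registry structure are illustrative assumptions.

```python
import secrets


def make_canary(tenant_id: str) -> str:
    """Create a unique, high-entropy marker for one tenant's corpus."""
    # 16 random bytes (128 bits) rendered as 32 hex characters.
    return f"CANARY-{tenant_id}-{secrets.token_hex(16)}"


def leaked_canaries(output: str, registry: dict[str, str]) -> list[str]:
    """Return the tenants whose canary token appears in an LLM output.

    registry maps canary token -> tenant id.
    """
    return sorted({tenant for token, tenant in registry.items() if token in output})
```

Comparing the returned tenants against the tenant that made the request separates expected retrieval from a cross-tenant leak.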

Layer 4: Output filtering for sensitive markers

Scan outbound LLM responses for sensitive content patterns (PII, credentials, classified markings, customer data) before delivering to the user-facing application. The EchoLeak class would have been less impactful if stricter output filtering had been applied before Copilot was allowed to emit Markdown links and images.
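A sketch of a marker-based output filter follows. The three patterns (an AWS-style access key ID, a US SSN format, and a PEM private key header) are illustrative examples only. A real deployment would draw on the organization's own DLP rules, and the redact function is a hypothetical name.

```python
import re

# Illustrative patterns only; not a complete or tuned DLP rule set.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace sensitive matches with placeholders and report which rules fired."""
    fired = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            fired.append(label)
    return text, fired
```

Returning the list of fired rules alongside the redacted text lets the caller both deliver a safe response and raise an alert.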

Layer 5: Provenance tracking

Tag untrusted data sources (external emails, public-web-scraped content, third-party API responses) as “untrusted” in the LLM’s context. Monitor the “chain of custody” — if untrusted content reaches an output, alert.
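One crude way to approximate the chain-of-custody check is to test whether a run of consecutive words from any untrusted chunk reappears in the output. The sketch below assumes that approach. The Chunk type, the five-word window, and the tainted_sources name are assumptions, and verbatim overlap will miss paraphrased content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of retrieved context with a provenance label."""
    source: str
    text: str
    trusted: bool


def _ngrams(words: list[str], n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive words (empty if the text is too short)."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def tainted_sources(output: str, context: list[Chunk], window: int = 5) -> list[str]:
    """Return untrusted sources sharing a `window`-word run with the output."""
    out_grams = _ngrams(output.lower().split(), window)
    hits = []
    for chunk in context:
        if chunk.trusted:
            continue
        if _ngrams(chunk.text.lower().split(), window) & out_grams:
            hits.append(chunk.source)
    return hits
```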

Layer 6: Embedding-space anomaly detection

For RAG systems, calculate cosine similarity between user queries / retrieved chunks and a corpus of known adversarial prompts. Queries or chunks that score above a tuned similarity threshold are flagged for review.
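The similarity computation itself is small enough to sketch in stdlib Python. The vectors here stand in for embeddings produced by whatever model the deployment already uses. The 0.85 threshold and the function names are placeholders, not recommendations.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_anomalous(vec: Sequence[float],
                 adversarial: Sequence[Sequence[float]],
                 threshold: float = 0.85) -> bool:
    """Flag a vector that is close to any known adversarial embedding."""
    best = max((cosine_similarity(vec, adv) for adv in adversarial), default=0.0)
    return best >= threshold
```

At corpus scale this pairwise loop would normally be replaced by a vector index, but the decision rule stays the same.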

Layer 7: Workflow-gap / human-in-the-loop for action-taking copilots

Day 4 covers this in depth. Copilots with action permissions (book a meeting, send an email, modify a record) should require HITL approval for cross-domain or destructive actions regardless of model confidence.
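The approval rule can be sketched as a gate that deliberately ignores model confidence. The ProposedAction type, the domain fields, and the DESTRUCTIVE set are hypothetical. Which actions count as destructive is a policy decision for each deployment.

```python
from dataclasses import dataclass

# Assumed policy: actions that change or send data always need a human.
DESTRUCTIVE = frozenset({"send_email", "modify_record", "delete_record"})


@dataclass(frozen=True)
class ProposedAction:
    name: str
    source_domain: str      # where the triggering content came from
    target_domain: str      # where the action would take effect
    model_confidence: float


def requires_approval(action: ProposedAction) -> bool:
    """Require human approval for cross-domain or destructive actions.

    model_confidence is intentionally not consulted: a confident model
    that has been successfully injected is exactly the failure case.
    """
    cross_domain = action.source_domain != action.target_domain
    return cross_domain or action.name in DESTRUCTIVE
```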

Discussion questions (~10 min)

EchoLeak required no user interaction — the email sat unread in the inbox. Walk through which of the seven defensive layers above would have caught it. Which would have failed because the attack predates them?
Bargury’s research (Black Hat 2024) called out “thousands of exposed Copilot bots” via CopilotHunter. What does “exposed” mean in this context, and how would a SOC detect that its own Copilot deployment is in this exposed state?
The Codex-generated detector uses stdlib-only Python so it runs anywhere. What’s the trade-off versus deploying Llama Guard 3 (which has higher accuracy)? When is the heuristic-only approach preferable?

Common mistakes

Mistake	Better approach
Treating EchoLeak as “Microsoft’s problem, they patched it”	Pattern persists; every enterprise copilot vendor faces the same architectural challenge
Assuming user training catches zero-click attacks	Definitionally impossible — user is not asked to do anything
Building only pre-LLM screening (one layer)	Defense in depth: pre-LLM + guardrails + canaries + output filter + provenance
Citing CVE numbers without verifying at NVD	Adversary-AI research moves fast; verify CVE status before quoting specifics
Trusting “LLM Scope Violation” classifications without re-examining what content the LLM should and shouldn’t see	The class definition is useful but each enterprise deployment has different scope decisions

What’s next

Module 3.5 covers the guardrails stack — Llama Guard 3, Prompt Guard 2, NeMo Guardrails, Azure Prompt Shields — and the architectural pattern that wires them as SIEM telemetry sources instead of as silent middleware.