Day 4 — Agentic Adversaries + AI Supply-Chain Compromise
Course: SEC5xx, Detecting and Responding to AI-Generated Adversary Content
Day: 4 of 5 · ~6 hours instruction + 2.5-hour lab + breaks
Prerequisite: Days 1–3 (Detector stack + Phishing + Deepfake BEC + Copilot prompt injection)
What Day 4 builds
Day 3 covered enterprise copilots as the inside attack surface: the EchoLeak class of zero-click exfiltration, the lethal trifecta, and the guardrails stack as SIEM telemetry. Day 4 applies the same architectural thinking to two of the most consequential threat classes of 2025-2026:
- Agentic adversaries — adversaries running their own LLM-orchestrated attack agents (GTG-1002 / Anthropic Nov 2025 was the first publicly documented case; PROMPTSTEAL / APT28 was the first operational deployment). The adversary’s agent telemetry becomes the detection signal.
- AI supply-chain compromise — malicious packages and models in the ML toolchain (LiteLLM/Mercor Mar 2026; JFrog Hugging Face Feb 2024), backdoored fine-tunes (Anthropic Sleeper Agents), poisoned RAG corpora. The detection signal is provenance — and where you don’t have provenance, behavioral monitoring is the fallback.
By end of Day 4, students leave with:
- A working multi-agent SOC workflow on LangGraph with explicit HITL gates and audit logging — the defender’s reference architecture for safe agentic SOCs
- A Sigma + Suricata rule pack for detecting adversary AI-agent telemetry (the GTG-1002 / PROMPTSTEAL pattern)
- A model SBOM generator for inventorying ML artifacts in their org and flagging supply-chain risk
- The action-criticality matrix that governs which agent actions can be auto-executed vs HITL-gated vs dual-control
- The honest read on backdoored fine-tunes — you cannot fully clear a third-party fine-tune; provenance + behavioral monitoring is the durable control
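The action-criticality matrix in the list above can be sketched as a lookup table plus an approval check. This is a minimal illustration, not the course's Module 4.3 implementation: the action names, tier assignments, and the `gate_for` / `may_execute` helpers are all hypothetical. The one design choice worth noting is that unknown actions fall through to the strictest gate.

```python
from enum import Enum


class Gate(Enum):
    AUTO = "auto-execute"
    HITL = "human-in-the-loop approval"
    DUAL = "dual-control approval"


# Hypothetical actions and tier assignments, for illustration only.
ACTION_CRITICALITY = {
    "enrich_ioc": Gate.AUTO,
    "open_ticket": Gate.AUTO,
    "quarantine_email": Gate.HITL,
    "isolate_host": Gate.HITL,
    "disable_account": Gate.DUAL,
}

# Number of distinct human approvers each gate requires.
REQUIRED_APPROVALS = {Gate.AUTO: 0, Gate.HITL: 1, Gate.DUAL: 2}


def gate_for(action: str) -> Gate:
    """Return the gate for an action; unlisted actions fail closed to DUAL."""
    return ACTION_CRITICALITY.get(action, Gate.DUAL)


def may_execute(action: str, approvers: set[str]) -> bool:
    """True when enough distinct approvers have signed off on the action."""
    return len(approvers) >= REQUIRED_APPROVALS[gate_for(action)]
```

An agent runtime would call `may_execute("isolate_host", {"analyst_a"})` before acting and write both the decision and the approver set to the audit log.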
The six modules
| # | Module | Focus |
|---|---|---|
| 4.1 | The agentic adversary | GTG-1002 deep dive, Anthropic disrupting-misuse reports, MITRE ATLAS agentic tactics |
| 4.2 | Detection signatures for adversary agents | Network + endpoint patterns, JA3/JA4 fingerprints, Sigma/Suricata rule pack |
| 4.3 | Hardening your own agents | Anthropic patterns, LangGraph HITL, action-criticality matrix, audit-log schema |
| 4.4 | Supply-chain compromise of ML artifacts | LiteLLM/Mercor case study, JFrog HF, model SBOM discipline |
| 4.5 | Backdoored fine-tunes and sleeper-agent models | Anthropic Sleeper Agents research, behavioral evals, the hard truth |
| 4.6 | Poisoned RAG corpora | Public-corpus and internal-corpus poisoning, canary tokens, instruction-stripping |
Lab 4
The lab is a red-vs-blue exercise in a controlled environment:
- Red half: Students operate a containerized fork of an LLM-driven attack toolkit (no live SMTP, no real domains, no credential capture beyond lab DB), executing a multi-step agentic attack against a lab target
- Blue half: Students run the Days 1–3 detection stack, plus the Day 4 telemetry rule pack, plus the multi-agent SOC workflow they built in Module 4.3
- Roles swap mid-lab — every student experiences both perspectives
- Strict legal/ethical scope (no real-person likenesses, no CFAA-implicating targets, all artifacts destroyed at lab end)
Key references for Day 4
Verified incident reports (cross-checked May 2026):
- Anthropic, Disrupting the first reported AI-orchestrated cyber espionage campaign (Nov 2025) — GTG-1002
- Google Threat Intelligence Group, AI Threat Tracker — PROMPTSTEAL / APT28 (Jul 2025)
- LiteLLM PyPI supply-chain compromise affecting Mercor (Mar 2026) — TechCrunch / The Register / LiteLLM blog
- JFrog Hugging Face malicious models disclosure (Feb 2024)
- PyTorch torchtriton dependency-confusion (Dec 2022)
- Anthropic + Redwood Research, Sleeper Agents (2024)
Frameworks and standards:
- MITRE ATLAS — adversarial-AI tactics
- Anthropic, Building Effective Agents (Dec 2024)
- LangGraph human-in-the-loop primitives
- UK NCSC + CISA joint guidance on agentic AI abuse (2025)
Tools introduced (working code in Modules 4.2, 4.3, 4.4):
- Multi-agent SOC workflow on LangGraph (TriageAgent → EnrichmentAgent → ResponseAgent + HITL)
- Sigma + Suricata adversary-agent telemetry rule pack
- Model SBOM generator (stdlib-only, runs in air-gapped environments)
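As a rough idea of what a stdlib-only model SBOM pass might look like, the sketch below walks a directory, hashes each model artifact, and flags formats that are commonly pickle-based and can therefore execute code at load time. The suffix lists, the `pickle_risk` field, and the function names are assumptions made for illustration; they are not the course tool's actual schema.

```python
import hashlib
from pathlib import Path

# Assumed heuristic: these suffixes are commonly pickle-based serializations.
PICKLE_RISK_SUFFIXES = {".pkl", ".pickle", ".pt", ".pth", ".bin", ".ckpt", ".joblib"}
# Assumed, incomplete list of suffixes treated as model artifacts.
MODEL_SUFFIXES = PICKLE_RISK_SUFFIXES | {".safetensors", ".onnx", ".gguf", ".h5"}


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large weights do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_model_sbom(root: Path) -> list[dict]:
    """Inventory model artifacts under root, one dict per file."""
    entries = []
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in MODEL_SUFFIXES:
            entries.append({
                "path": str(path.relative_to(root)),
                "sha256": sha256_of(path),
                "size_bytes": path.stat().st_size,
                "pickle_risk": suffix in PICKLE_RISK_SUFFIXES,
            })
    return entries
```

Serializing the result with `json.dumps` gives a snapshot that can be diffed between scans; an unexpected hash change or a new `pickle_risk` entry is the kind of event worth reviewing.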
How Day 4 changes the detector’s mental model
Day 3 introduced “the LLM-touching application is itself an attack surface.” Day 4 extends this in two directions:
Direction 1: Adversaries are running their own LLM agents. The detector’s adversary signal is now agent telemetry — outbound LLM API calls from non-developer processes, polling-shaped traffic patterns, structured-JSON HTTP bodies, agent-loop process trees. Detection moves to the network and process-context layer.
Direction 2: The ML toolchain has become a supply-chain target. Every PyPI package, every Hugging Face model, every dataset is a potential injection vector. The defender’s discipline shifts to SBOM-for-models, provenance pinning, and behavioral monitoring of model artifacts at load time. Detection becomes preventive: tag what you don’t trust, then watch what it does.
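Two of the Direction 1 signals, LLM API calls from unexpected processes and polling-shaped timing, can be approximated with a few lines of Python. This is a sketch under stated assumptions: the host watchlist is illustrative and incomplete, the process allowlist is hypothetical, and the jitter threshold is a placeholder that would need tuning against real traffic. It is not the Sigma + Suricata rule pack itself.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Illustrative, incomplete watchlist of hosted-LLM API endpoints.
LLM_API_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}
# Hypothetical allowlist of processes expected to call LLM APIs.
EXPECTED_PROCESSES = {"code.exe", "pycharm64.exe"}


@dataclass(frozen=True)
class ConnEvent:
    process: str    # image name of the process that opened the connection
    dest_host: str  # destination hostname (for example, from TLS SNI or DNS)


def is_unexpected_llm_call(event: ConnEvent) -> bool:
    """Flag LLM API traffic that originates outside the process allowlist."""
    return (event.dest_host.lower() in LLM_API_HOSTS
            and event.process.lower() not in EXPECTED_PROCESSES)


def looks_like_polling(timestamps: list[float], max_cv: float = 0.1,
                       min_events: int = 5) -> bool:
    """Flag near-constant gaps between requests (low coefficient of variation)."""
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False
    return pstdev(gaps) / average_gap <= max_cv
```

Neither check is conclusive alone. A non-allowlisted process that also polls an LLM endpoint at machine-regular intervals is a stronger lead than either signal in isolation.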
The architectural insight running through Day 4: the threat moves up the stack. Day 1’s adversary is at the SOC’s inbox. Day 4’s adversary is at the SOC’s toolchain. The detection engineer’s controls must operate at every layer simultaneously.
What Day 5 builds on this
Day 5 is the capstone — Operation Hollow Mirror. The Verdancy Health scenario chains together threats from all four prior days:
- Stage 1 — AI-driven recon (Day 1 phishing detection)
- Stage 2 — Deepfake voice BEC (Day 2 workflow-gap detection)
- Stage 3 — Indirect prompt injection against NoraBot (Day 3 EchoLeak class)
- Stage 4 — Agentic exfil with AI SOC manipulation (Day 4 adversary-agent telemetry)
The defender’s stack from all four days is what survives the capstone. Day 4’s controls (agent telemetry detection, supply-chain hardening, and action-criticality HITL gates) are tested specifically in Stage 4, where the adversary’s agent attempts to manipulate the defender’s own AI triage layer.