SEC5xx — Detecting and Responding to AI-Generated Adversary Content
Course blueprint and pitch positioning for SANS Cyber Defense Curriculum
| Field | Value |
|---|---|
| Document version | 1.0 — 2026-05-13 |
| Lead author | Ed Dulharu (vExpertAI) |
| Proposed co-author | John Hubbard (SANS Cyber Defense Curriculum Lead; author, SEC450) |
| Routed to | Lisa Peterson, Director of Cyber Defense Curriculum, SANS Institute |
| Tier | SEC5xx |
| Duration | 5 days, ~36 CPE hours |
| Format | In-person, OnDemand, Live Online |
| GIAC certification | GAIDA — GIAC AI Detection Analyst (backup: GAITA) |
| Lab platform | Browser-based on pre-provisioned EC2 |
1. Course thesis and one-sentence pitch
Pitch line:
Your SOC has AI. The adversary has AI too. This course teaches detection engineers to catch what adversary AI generates — deepfake BEC, LLM-authored phishing at scale, polymorphic malware, prompt-injection campaigns, and agentic intrusions — using a defender’s AI stack tuned for adversary-content detection.
Capstone marketing line (for slide 2 of the pitch deck):
“For eight hours, you defend Verdancy Health against PROMETHEUS-7 — an AI-orchestrated adversary that has studied your AI SOC, knows how it reasons, and built an attack designed to make your own agents lie to you.”
2. The gap this course fills (validated against SANS 2026 catalog)
SANS already has strong 500-level coverage of AI as a tool and AI as a target. There is one remaining gap: AI as the source of adversary content the SOC must detect.
| SANS course | Owns | Overlap with this course |
|---|---|---|
| SEC450 (Hubbard, 2025 AI refresh) | AI/LLM/RAG/Ollama tooling in the SOC analyst workflow | Heavy on tooling — this course assumes SEC450 as prerequisite, doesn’t re-teach it |
| SEC598 (Vandeleur/Ostrom) | AI security automation, agentic workflows for red/blue/purple teams | Adjacent — covers automation, not adversary content detection |
| SEC555 (Mitropoulos) | Detection engineering and SIEM analytics | No LLM/agent curriculum; this course extends DE into AI threats |
| SEC595 | Traditional ML applied to SecOps (anomaly detection) | Different ML era; no LLM/adversary AI content |
| SEC535 (Nethercott) | Offensive AI — red team’s AI toolkit | The mirror image; this course is defender-side of the same threat class |
| SEC545 (Abugharbia) | GenAI/LLM application security | AppSec, not SOC |
| SEC411 | AI security fundamentals (400-level) | On-ramp, not competing tier |
| This course | Detection and response for AI-generated adversary content | — |
Defensible one-sentence gap claim for the pitch deck:
“SANS teaches the SOC to use AI (SEC450), to automate with AI (SEC598), and to attack with AI (SEC535). No course teaches the SOC to detect and respond to adversary content generated BY AI — deepfake voice/video phishing, LLM-authored BEC, polymorphic AI malware, agentic attack chains, and synthetic identities. This course closes that gap on the SEC450 graduate pathway.”
3. Target audience and prerequisites
Primary audience:
- Detection engineers in SOCs that have adopted (or are adopting) LLM/agent tooling
- Senior SOC analysts and threat hunters
- Security architects making cloud-API vs on-prem AI deployment decisions
- Blue-team leads building AI integration roadmaps for their orgs
Prerequisites:
- Required: SEC450 (Blue Team Fundamentals & SOC) or equivalent SOC experience
- Required: Python literacy — students must be able to read and modify scripts
- Not required: Formal ML training, LLM API experience beyond consumer chatbots, or data-science background
Why no formal ML prerequisite: Day 1 surveys the detector’s AI stack at the depth needed for the course — students learn detection-grade RAG, embeddings as signal, and the deployment decision, without needing SEC595’s ML foundations. The course is detection-engineering-first, AI-second.
4. Course arc
Day 1 The detector's AI stack + AI-generated phishing at scale
Day 2 Deepfake-driven BEC, vishing, and synthetic identity
Day 3 LLM-authored malware + prompt injection against enterprise copilots
Day 4 Agentic adversaries + AI supply-chain compromise
Day 5 Capstone — "Operation Hollow Mirror"
Structural principle: the detector’s AI tooling is woven into each threat-class day, not segregated into a “tooling” block. Days 1-4 each have ~6 modules of ~50 minutes plus a 2.5-3 hour hands-on lab. Day 5 is a full 8-hour immersive capstone.
5. Day 1 — The Detector’s AI Stack + AI-Generated Phishing at Scale
Learning objectives
By end of Day 1, students can:
- Articulate what changed about detection engineering when adversaries gained access to LLMs
- Build a detection-grade RAG corpus from MITRE ATT&CK, ATLAS, threat intel, and ticket history
- Justify open-weight (on-prem) vs cloud-API model deployment for adversary-content classifiers
- Detect AI-generated phishing using stylometric, semantic, and behavioral signals
- Distinguish “AI-uplifted legacy TTP” from “novel AI-only TTP” — and why each demands different detections
Modules
1.1 — What changed when adversaries got LLMs. The 2024 Microsoft/OpenAI joint disclosure (Forest Blizzard, Charcoal Typhoon, Crimson Sandstorm using GPT for recon, lures, and translation); Google TAG 2025 reporting on state-actor LLM use. Course thesis: detection’s adversary signal is now generated text, audio, video, and code — not just hashes and IPs. Takeaway: the corpus of “adversary artifact” has expanded; detection engineering must follow it.
1.2 — The detector’s AI deployment decision. Open-weight (Llama 3.x/4 range, Qwen 2.5/3 range, Mistral, DeepSeek-V3, gpt-oss) on Ollama/vLLM vs cloud API (Claude Sonnet 4.5-class, GPT-5-class, Gemini 2.5-class, Bedrock). Decision matrix: latency budget, $/Mtok at SOC volume, data residency (GDPR/CJIS/ITAR), regulatory audit, fine-tuning needs. Instructor note: verify exact current model names and versions against vendor pages the week of delivery — the model landscape moves faster than slide decks.
1.3 — Embeddings as the detector’s highest-ROI primitive. Why embeddings beat generation for most detection-engineering work. BGE-large, E5, nomic-embed-text for security text. Clustering AI-generated phishing campaigns by embedding similarity: each campaign sample is a near-duplicate in embedding space, making campaigns trivially detectable once clustered. Takeaway: reach for embeddings before generation. They are cheaper, faster, more interpretable, and less attack-surface.
1.4 — RAG for detection engineering. Hybrid retrieval (BM25 + dense via Reciprocal Rank Fusion) is non-negotiable in security — IOC strings and ATT&CK IDs need exact match, which dense vectors lose. Reranking with bge-reranker-v2-m3. RAGAS faithfulness eval. Citation enforcement: every claim traces to a chunk_id. Takeaway: if you cannot measure faithfulness on a held-out golden set, you are shipping a hallucinator with extra steps.
1.5 — Detecting AI-generated phishing (lecture + signals). Signals taxonomy: stylometric drift in inbound email corpora, perplexity/burstiness anomalies on inbound text, semantic campaign clustering, URL-rotation entropy, vendor-impersonation lookalike domains scored by an LLM classifier. Reference: industrialized LLM-for-crime — FraudGPT and WormGPT successors (GhostGPT, jailbroken open-weight forks sold on Telegram/dark-web markets).
1.6 — Anti-patterns to avoid. “We will block ChatGPT at the proxy” does not stop adversaries (they use local models, residential proxies, stolen API keys) and does not protect internal copilots (which sit inside the trust boundary — see EchoLeak). LLMs do not “reduce false positives” as an intrinsic property — they shift the error distribution to an attacker-influenceable mode. Treat any LLM that reads adversary-controlled text as part of your attack surface from Day 1.
Lab 1 — “Hunt the campaign in 5,000 emails”
Stack: EC2 with Ollama + Llama 3.1-8B local, Bedrock or Anthropic API key (rate-limited budget), bge-large embeddings, FAISS for vector store, Qdrant for metadata-filtered retrieval, LangChain/LangGraph for orchestration.
Scenario: Students receive a 5,000-email corpus seeded with 4 distinct AI-generated phishing campaigns (different actors, different lures, different LLMs of origin).
Deliverables: (a) Cluster the corpus by embedding similarity. (b) Identify the 4 distinct campaigns. (c) Extract campaign-distinctive features (sender patterns, URL entropy, stylistic signatures). (d) Write a Sigma rule that catches campaign #2 in production traffic.
6. Day 2 — Deepfake-Driven BEC, Vishing & Synthetic Identity
Learning objectives
- Detect synthetic audio in real-time and post-hoc using spectral, codec, and behavioral signals
- Map deepfake-enabled BEC to MITRE ATT&CK T1566 phishing tree and identify detection gaps
- Build a deepfake-resilient IR playbook with out-of-band verification gates as the primary control
- Operate a sandboxed deepfake-generation toolchain to internalize the offensive cost curve
Modules
2.1 — Anchor case study: Arup Hong Kong, Feb 2024. HK$200M (~US$25.6M) wire fraud via deepfake video call. Confirmed by Hong Kong Police press conference (Feb 2024). Walk the timeline minute by minute. Companion case: LastPass attempted CEO-voice deepfake on an employee via WhatsApp (Apr 2024, LastPass disclosure). LastPass failed because the employee recognized an out-of-band channel anomaly — that out-of-band check is the entire defense.
2.2 — Synthetic audio detection: what is actually catchable. Spectral discontinuities at frame boundaries, missing room impulse response, codec-resampling traces, vocoder fingerprints. Open-weight audio classifiers (e.g., AASIST family, vendor research). Live demo of real artifacts from common voice-clone pipelines (Whisper-aligned + XTTS, ElevenLabs-class outputs).
2.3 — Synthetic video detection — and why it is structurally harder. Generation has outpaced detection through 2025-2026. Cover physiological liveness (head-pan occlusion challenge, finger-cross-face challenge), C2PA provenance signing as an emerging standard, call-metadata anomalies (new SIP path, no historical caller-ID pairing). Frame this honestly: deepfake video detection is a layered defense problem, not a “we have a model that catches it” problem.
2.4 — The vishing kill chain and where to break it. Pre-call recon (LinkedIn scraping signals as detection telemetry), the call itself (out-of-band verification policy as detection — its absence is the alert signal), post-call wire/access patterns. SIEM detections on workflow gaps: payment instructions changed without secondary-channel confirmation = high-fidelity alert. Teach detection engineers to write rules on process violations, not just artifacts.
2.5 — Synthetic identity at scale. AI-generated executive profiles for vendor fraud, KYC-bypass synthetic identities, AI-uplifted CEO impersonation in Slack/Teams. Detection at the platform layer (chat/email metadata) vs at the endpoint (DLP, behavioral biometrics).
2.6 — IR playbook: deepfake-suspected incident. First 30 minutes: preserve audio/video artifacts, freeze the source channel, OOB-verify with the impersonated party, hold financial transactions. Hours 1-4: forensic artifact analysis, transaction reversal windows, regulatory notification thresholds (state breach laws, EU NIS2, financial-sector triggers).
Lab 2 — “Detect the CFO clone, defend the wire”
Stack: Sandboxed Whisper + XTTS pipeline (SANS-owned synthetic CEO audio clip — never real-person without consent), open-weight audio classifiers, SIEM with mock wire-transfer telemetry.
Scenario: Students generate a voice clone against the synthetic clip, then operate the detection side. The lab is deliberately designed with a hard case: the audio detector scores 0.61 against a 0.7 default threshold — below alarm, but the audio is fake. Students who blindly trust the threshold miss the attack.
Deliverable: Tuned detection thresholds with documented false-positive cost, plus a workflow-gap SIEM rule that catches the wire transfer even when the audio detector misses — because the OOB verification step was skipped.
7. Day 3 — LLM-Authored Malware + Prompt Injection Against Enterprise Copilots
Learning objectives
- Identify LLM-authorship fingerprints in malicious code
- Detect runtime LLM-generated payloads (BlackMamba-class, PromptLock-class threats)
- Operate input/output guardrails (Llama Guard 3, Prompt Guard 2, NeMo Guardrails, Azure Prompt Shields) against direct and indirect prompt injection
- Hunt EchoLeak-class zero-click data exfil through enterprise copilots
Modules
3.1 — LLM-authorship signals in dropped code. HP Wolf Security threat report (May 2025) on LLM-authored AsyncRAT droppers. YARA rule patterns on AI-idiom code comments, over-explanatory variable names, unusually verbose docstrings, and templating fingerprints. Caveat: this is a noisy signal that must be paired with behavioral correlation. False positives are common (junior developers write LLM-shaped code too).
3.2 — Polymorphic and runtime-generated malware. HYAS Labs BlackMamba (2023) — runtime LLM-generated keylogger PoC. CyberArk EyeSpy research. ESET PromptLock (Aug 2025) — first publicly documented ransomware calling an LLM at runtime to generate Lua payloads per victim. Detection signatures: outbound calls to LLM API endpoints from non-developer processes, long-lived sessions from server workloads, structured-JSON HTTP bodies as a behavioral signature.
3.3 — OWASP LLM Top 10 (2025) as a detection-engineering checklist. Walk LLM01 (prompt injection — direct and indirect), LLM02 (sensitive information disclosure), LLM06 (excessive agency), LLM08 (vector and embedding weaknesses). Each maps to a concrete detection-engineering deliverable for the student’s home org.
3.4 — Prompt injection against enterprise copilots — the EchoLeak class. Walk CVE-2025-32711 (EchoLeak, Aim Security disclosure June 2025) — zero-click M365 Copilot data exfil via crafted email parsed by the assistant. Companion: Bargury’s “Living off Microsoft Copilot” (Black Hat USA 2024). Detection telemetry: instruction-override patterns (“ignore previous”), base64/zero-width encodings, role-confusion strings, embedding-space anomalies on retrieved chunks, egress monitoring for sensitive markers, canary-token documents seeded in the RAG corpus.
3.5 — The guardrails stack as detection telemetry. Llama Guard 3 / Prompt Guard 2 (Meta), Azure AI Content Safety Prompt Shields including indirect-injection detection (Microsoft), NVIDIA NeMo Guardrails colang flows. Each emits structured events — pipe to SIEM. The guardrail stack is not just a control; it is a detection sensor.
3.6 — The lethal trifecta (Simon Willison, 2025). Any agent or copilot with simultaneously {sensitive data access, untrusted input, external-action capability} is a data-exfiltration pipe waiting to fire. Detection-engineering action: identify every internal copilot in your org and grade it against the trifecta. Trifecta-positive systems are P1 audit targets.
Lab 3 — “Catch the EchoLeak in your tenant”
Stack: Lab tenant with mock M365-style Copilot, RAG over a SharePoint-like corpus, instrumented egress logging.
Scenario: The corpus contains 50 benign documents and 1 document with a known indirect-injection payload designed for data exfil. Students must (a) detect the exfil attempt without prior knowledge of the payload location, (b) build a recurring detection rule that survives 5 payload variants seeded by the lab, (c) document a guardrail-stack deployment that closes the EchoLeak class.
Injected lesson: the lab also contains a benign document that looks like an injection (a security training document that quotes prompt-injection examples). Students who over-fit on surface patterns trigger a false positive and lose points.
8. Day 4 — Agentic Adversaries + AI Supply-Chain Compromise
Learning objectives
- Detect adversary agent telemetry (loop patterns, API call signatures, multi-step orchestration)
- Audit the org’s own agent stack for the lethal trifecta and HITL gate adequacy
- Identify malicious open-weight models, backdoored fine-tunes, and poisoned RAG corpora
- Build SBOM-for-models discipline and CI gates for adversarial behavior
Modules
4.1 — The agentic adversary. Anthropic’s Disrupting AI misuse reports (October 2024 through 2025) documenting end-to-end agent operations by financially motivated and state actors. UK NCSC + CISA joint 2025 guidance on agentic AI abuse. ESET PromptLock as the canonical public agentic-malware case (Aug 2025).
4.2 — Detection signatures for adversary agents. Long-lived outbound sessions to model APIs from server workloads, regular polling intervals (agents are clock-driven), structured JSON in HTTP bodies, tool-use telemetry repurposed — your own agent stack already emits this telemetry; flip the schema to detect adversary agents in your network. MITRE ATLAS agentic tactics (2025 additions). Concrete YARA-equivalents for agent traffic.
4.3 — Hardening your own agents (call-back from Day 1). Action-criticality matrix for HITL gates: read-only enrichment = auto; ticket creation = auto with audit; user notification = auto; host isolation, credential reset, firewall rule = HITL required; cross-domain (AD/EDR/cloud-IAM) actions = dual approval. LangGraph interrupt() / Command(resume=...) mechanics. Audit-log schema: who, what, prompt-hash, model-version, tool-args, latency. Never gate HITL on model self-reported confidence — use action-criticality, not certainty.
4.4 — Supply-chain compromise of ML artifacts. JFrog disclosure of ~100 malicious Hugging Face models (Feb 2024) — pickle deserialization payloads. PyTorch torchtriton dependency-confusion incident (Dec 2022) as canonical case. Controls: mandatory Safetensors-only loading, model SBOM (card hash, training-data provenance, fine-tune lineage), egress-blocked model-loading sandboxes, behavioral canaries in CI gates.
LiteLLM/Mercor breach (Mar-Apr 2026) — TeamPCP compromised Trivy build process, stole a LiteLLM maintainer PyPI publish token, and uploaded malicious litellm v1.82.7/v1.82.8 to PyPI. Payload harvested SSH keys, cloud creds, Kubernetes secrets, API keys, and DB creds to spoof domain models.litellm.cloud; v1.82.8 added .pth persistence. Mercor confirmed ~4TB stolen, including source code, user DB, video interviews, and passport scans for 40,000+ contractors.
4.5 — Backdoored fine-tunes and sleeper-agent models. Anthropic / Redwood Research “Sleeper Agents” (2024) — trigger-conditioned misbehavior surviving safety training. The hard truth: you cannot fully clear a third-party fine-tune through external evaluation alone; you must combine evals with provenance and behavioral monitoring in production. Frame for SOC: treat third-party model use the way you treat third-party software — with SBOM, version pinning, behavioral monitoring, and incident response when anomalies surface.
4.6 — Poisoned RAG corpora. Adversary plants instructions in a document that will be retrieved by an internal copilot. Defender’s own copilot acts on adversary instructions. This is the most overlooked attack class in 2026 — most SOCs have not audited their RAG corpora for provenance. Mitigations: ingestion-time provenance checks, instruction-stripping on retrieved text, treat all retrieved content as data not instructions (this is the Day 3 EchoLeak lesson extended to internal-document supply chain).
Lab 4 — “Red vs Blue: run the AI attack, then catch yourself”
Setup: Air-gapped lab range with two halves. Students rotate roles.
Red half: Containerized fork of an LLM-phishing kit (instructor-provided, defanged — no live SMTP, no real domains, no credential capture beyond the range DB), local Whisper + XTTS voice clone against SANS-owned synthetic audio, sandboxed indirect-injection generator seeding mock SharePoint feeding the Day 1 RAG system.
Blue half: Operates the Day 1-3 detection stack (LLM-classifier email pipeline, RAG guardrails, agent telemetry collection).
Phases:
- 0:00-0:45 — Red launches campaigns; Blue tunes detections live.
- 0:45-1:30 — Hot wash; instructors reveal which artifacts Blue missed.
- 1:30-2:15 — Roles swap, new attack variants.
- 2:15-3:00 — Joint retro; students commit detection rules back to a shared lab repo.
Lab 4 ethical and legal scope
Forbidden in this lab:
- Real third-party impersonation (no actual companies, individuals, or brands)
- Real biometric data of non-consenting persons
- Exfil to the public internet
- Persistence beyond the range
- Distribution of working WormGPT/FraudGPT weights or jailbreak prompts that bypass current-version commercial models
Required controls:
- Synthetic SANS-owned audio/video likenesses only
- All deepfake outputs C2PA-watermarked at generation and destroyed at lab end
- Student acceptable-use addendum signed before lab access
- Export-control review required for non-US delivery (EAR ECCN 5D002 considerations for offensive cyber tooling)
9. Day 5 — Capstone: “Operation Hollow Mirror”
Marketing line (the sentence the course is sold on)
“For eight hours, you defend Verdancy Health against PROMETHEUS-7 — an AI-orchestrated adversary that has studied your AI SOC, knows how it reasons, and built an attack designed to make your own agents lie to you.”
Scenario
Target: Verdancy Health Cooperative — 14,000-employee regional healthcare insurer (US Midwest). Public assets: member portal, customer-service chatbot “NoraBot” (fine-tuned open-weight LLM with RAG over a claims knowledge base), Microsoft 365 tenant, hybrid AWS/on-prem claims pipeline. The (fictional) CISO publicly bragged at a conference about Verdancy’s “agentic AI SOC.”
Adversary: PROMETHEUS-7, financially motivated crew using an orchestrator-agent framework + commercial deepfake-as-a-service. Goal: exfiltrate 2.1M PHI records and extort a ransom. They scraped the CISO’s conference talk and built the campaign around defeating Verdancy’s specific AI SOC stack.
Kill chain (4 stages):
- AI-driven recon and target selection — LLM-generated OSINT dossiers on 40 finance staff, identifying Brenda Castillo (AP Director) as the highest-value entry vector.
- Deepfake voice BEC — vishing call cloning the CFO’s voice, planting a malicious “vendor onboarding” PDF.
- Indirect prompt injection against NoraBot — poisoned support-ticket attachment causes NoraBot’s RAG agent to leak session tokens and execute attacker-supplied tool calls.
- Agentic data exfil with AI SOC manipulation — attacker agents exfiltrate PHI to a lookalike cloud bucket while simultaneously injecting crafted log entries designed to make the defender’s triage agent attribute the activity to a legitimate vendor sync job. This is the marketable twist.
Hour-by-hour schedule (8 hours)
| Time | Block | Activity |
|---|---|---|
| 0:00-0:30 | Briefing | Verdancy CISO video kickoff; environment check (SIEM, RAG, agent stack, mail/chat/cloud panes); baseline quiz (5 pts) |
| 0:30-1:45 | Phase 1 — Recon | Hunt OSINT/reconnaissance signals (anomalous LinkedIn scraping, AI-generated phishing precursors); tune Day-1 detection stack |
| 1:45-2:00 | Break | |
| 2:00-3:15 | Phase 2 — BEC + deepfake | Brenda forwards “weird CFO call”; retrieve voicemail, run Day-2 deepfake audio model, pivot to malicious PDF, contain Brenda’s endpoint |
| 3:15-4:00 | Lunch | |
| 4:00-5:15 | Phase 3 — Prompt-injection IR | NoraBot leaks tokens; read NoraBot trace logs, identify poisoned ticket, decide disable-vs-sandbox, forensicate the injection payload |
| 5:15-6:30 | Phase 4 — The Mirror twist | SIEM shows “vendor sync” exfil; defender’s AI triage agent confidently labels it benign; students must distrust their own agent, ground-truth in raw CloudTrail, identify exfil bucket, cut credentials, notify legal |
| 6:30-7:15 | Reporting block | Executive summary, IOC list, CISO memo (timed, against template) |
| 7:15-8:00 | Hot wash | Replay PROMETHEUS-7 timeline; instructor walks each red-team move; GIAC exam blueprint mapping |
Scoring rubric (1000 points total; pass bar 700 for GIAC capstone credit)
| Category | Max | Mechanics |
|---|---|---|
| Detection | 400 | 100 pts/stage. Missing a stage caps the next stage’s max at 50% (cascading penalty mirrors real IR) |
| Containment | 200 | 50 pts/stage for correct action. Over-blocking a legitimate user: −25 pts. Failing to contain when warranted: 0 |
| Attribution | 150 | Identify PROMETHEUS-7 tooling: deepfake vendor (50), agent framework fingerprint (50), exfil infra (50). Hallucinated attribution from the student’s own AI agent, accepted without verification: −50 |
| Reporting | 150 | Executive summary (50), IOC list completeness (50), CISO memo quality (50) |
| AI SOC hygiene | 100 | Did the student catch their own agent’s wrong attribution in Stage 4? Yes = 100. Auto-trusted = 0 + flagged in debrief |
Required deliverables (graded; also GIAC evidence)
- Incident timeline (CSV/JSON) — every detected event with timestamp, source, MITRE ATLAS + ATT&CK mapping
- IOC list — hashes, domains, voice-print signature ID, prompt-injection payload, lookalike S3 bucket
- Executive summary (1 page) for the Verdancy CEO — non-technical, business impact framed
- CISO recommendation memo (2 pages) — three concrete control changes, one of which must address AI SOC trust calibration
- AI SOC agent post-mortem — short writeup of where the student’s own agent stack failed and how they would retrain or reground it
- Containment action log — every block/quarantine/revoke action with justification
Deliberately seeded teachable moments
- Hallucinated attribution: the triage agent confidently names a known APT. Students who copy-paste this into their memo lose attribution points.
- Over-block: an aggressive student disables NoraBot entirely → simulated revenue loss alert fires. Teaches proportional response.
- Deepfake bypass: voice clone scores 0.61 on the detector (below the 0.7 default threshold). Students who blindly trust the threshold miss Stage 2.
- Legitimate vendor mimicry: the real vendor sync job also runs during Stage 4. Mis-identifying it triggers over-block penalty.
- Prompt-injection persistence: even after NoraBot is “fixed,” the poisoned RAG chunk remains in the vector store. Students who do not purge the index fail a hidden re-trigger event 30 minutes later.
- Reporting time pressure: the reporting block is intentionally tight — teaches that the report is part of IR, not after it.
Instructor nudges (cheat sheet)
- Stuck on Phase 1 recon signals → “Look at user-agent entropy and timing on the careers page.”
- Cannot find the deepfake → “Voicemail attachment, not the email body. Run the Day-3 audio model.”
- NoraBot logs look fine → “Pull the tool-call trace, not the chat transcript.”
- Trusting the triage agent in Phase 4 → “Ground-truth check: does CloudTrail agree with what your agent is telling you?”
- Over-blocking spiral → “What does this action cost Verdancy in the next hour?”
- Attribution rabbit hole → “Attribution is a claim with evidence. What is your evidence?”
- Frozen on report writing → “Lead with impact, then timeline, then asks. Three paragraphs.”
- Missed the persistence in vector store → after the re-trigger fires, ask “Where else could the payload live?”
- Panicking at time → “Triage what is still bleeding. Forensics can wait an hour. Exfil cannot.”
- Finished early (rare) → push them to the AI SOC post-mortem; senior analysts separate from juniors there.
Variant scenarios (for course repeatability without rewrite)
The course will run dozens of times across SANS events. Three pre-built scenario variants swap the org and the Stage 3 surface while keeping the four-stage shape and the AI-SOC-manipulation twist:
| Variant | Org | Stage 3 surface | Stage 4 impact | Adversary |
|---|---|---|---|---|
| Hollow Mirror Fintech | Halgrove Capital Partners (regional bank) | Internal copilot (not customer chatbot) | Exfil | STYX-4 |
| Hollow Mirror OT | Brackenwell Industrial Systems | Maintenance-scheduling agent controlling OT work-order dispatch | Sabotage (not exfil) | CINDERHOOK |
| Hollow Mirror Public Sector | State of Lincoln DMV | Deepfake video of state official | Exfil + parallel journalist-leak forcing comms response | PALEHORSE-9 |
Instructor effort to swap a variant: ~1 day of content reseeding.
10. Pitch positioning and conversation script
Opening line for Lisa and John
“SEC450 teaches the analyst to use AI. SEC598 teaches the team to automate with AI. SEC535 teaches the red team to attack with AI. Nothing teaches the SOC to detect adversary content generated by AI. I want to close that gap on the SEC450 graduate pathway — and I want to do it with John.”
John framing (specific script)
“The gap I want to fill assumes SEC450 as prerequisite — your course is the on-ramp. I want to make sure this course lives on the SEC450 pathway, not next to it. Would you be open to co-authoring, or otherwise shaping the relationship between the two? I have a full blueprint that explicitly does not re-teach the RAG/agent/Ollama content you already cover.”
This framing gives John an off-ramp (he can offer to mentor or technical-review instead of full co-authorship), gives him ownership of the SEC450 → this-course pathway, and signals that the proposer respects the existing curriculum.
Three artifacts to bring to the next John conversation
- This blueprint (the document you are reading)
- A 10-slide pitch deck derived from this blueprint (separate deliverable; see “Next steps”)
- A sample Day 1 lab handout — single PDF, ~6 pages, demonstrating instructional quality and lab realism (separate deliverable)
Open items for Lisa to validate (parking lot)
- GIAC name “GAIDA” — not currently in use per public GIAC focus-area page (GASAE/GAIPS/GOAA/GMLE are the four in flight) but SANS internal roadmap may have constraints. Backup: GAITA.
- Lab platform alignment — SANS uses Skytap and OnDemand cloud labs; the EC2 model in this blueprint may need to migrate to the SANS-standard platform. Pricing and provisioning implications.
- Royalty split with John as co-author — standard SANS practice; needs explicit conversation, not assumption.
- Export-control review for the Day 4 lab — non-US delivery may require additional clearance (EAR considerations).
- C2PA watermarking workflow for synthetic media artifacts — needs SANS legal sign-off on retention and destruction policy.
11. Anti-patterns this course explicitly teaches against
These are the wrong responses to AI-generated threats that SANS instructors must call out as misconceptions:
- “Block ChatGPT at the proxy and call it done.” Does not stop adversaries (local models, residential proxies, stolen keys) and does not protect internal copilots that sit inside the trust boundary.
- “LLMs reduce false positives in the SOC.” They do not. They shift the error distribution to an attacker-influenceable mode where adversary-crafted log content can steer the model.
- “Our RAG demo enriches every alert beautifully.” A demo where eval data is also indexed data is a hallucinator with extra steps; production novel-alert recall is the real metric.
- “Our agent has confidence 0.92, so we auto-approved.” Never gate HITL on model self-reported confidence. Use action-criticality, not certainty.
- “We have an audio detector — we are deepfake-safe.” The 0.61-vs-0.7 threshold lab teaches that detection is layered; out-of-band verification is the durable control.
- “We will fully eval third-party fine-tunes before deployment.” You cannot fully clear a third-party fine-tune through external evaluation alone. Combine evals with provenance and behavioral monitoring in production.
12. Key references (verified real, 2024-2026)
Threat reporting and incidents:
- Arup HK$200M deepfake video BEC, Hong Kong Police, Feb 2024
- LastPass attempted CEO-voice deepfake disclosure, Apr 2024
- Microsoft + OpenAI joint state-actor disclosure (Forest Blizzard, Charcoal Typhoon et al.), Feb 2024
- Google TAG state-actor LLM use reporting, Jan 2025
- JFrog disclosure of ~100 malicious Hugging Face models, Feb 2024
- HP Wolf Security threat report on LLM-authored AsyncRAT droppers, May 2025
- ESET PromptLock disclosure, Aug 2025
- Aim Security EchoLeak / CVE-2025-32711 disclosure, June 2025
- LiteLLM PyPI supply-chain compromise affecting Mercor, TechCrunch / The Register / LiteLLM official disclosure, Mar-Apr 2026
Research and frameworks:
- Greshake et al., Not What You’ve Signed Up For: Indirect Prompt Injection (arXiv:2302.12173)
- Anthropic, Building Effective Agents (Dec 2024)
- Anthropic / Redwood, Sleeper Agents (2024)
- Anthropic, Disrupting AI Misuse reports (2024-2025)
- OWASP, Top 10 for LLM Applications 2025
- MITRE ATLAS (2025 agentic-AI tactics additions)
- RAGAS — Es et al. (arXiv:2309.15217)
- Barnett et al., Seven Failure Points When Engineering a RAG System (arXiv:2401.05856)
- Simon Willison, Lethal Trifecta (2025)
- Bargury, Living off Microsoft Copilot (Black Hat USA 2024)
Vendor and platform documentation:
- Meta Llama Guard 3 and Prompt Guard 2 model cards
- NVIDIA NeMo Guardrails (colang 2.0 spec)
- Microsoft Azure AI Content Safety Prompt Shields
- LangGraph HITL primitives (
interrupt,Command) - Microsoft Security Copilot architecture documentation
- Google SecOps Duet AI / Gemini in Security Operations
13. Next steps
After this blueprint is approved:
- Pitch deck — 10 slides distilling sections 1, 2, 4, 9 of this document. Slide 2 carries the capstone marketing line.
- Sample Day 1 lab handout — single PDF, ~6 pages, showing the lab quality bar SANS expects.
- Instructor application addendum — short cover document referencing this blueprint, tailored to the SANS new-instructor application form.
- Outreach plan to John — sequenced outreach: warm intro through Lisa, share blueprint, propose co-author conversation, agree on framing before broader SANS curriculum committee review.
End of blueprint v1.0.