Module 1.5 — Detecting AI-Generated Phishing
50-minute lecture + lab brief · Day 1 afternoon
Learning objectives
By the end of this module, students can:
- Identify five distinct detection signals for AI-generated phishing (stylometric drift, perplexity/burstiness, embedding-similarity campaign clustering, URL-rotation entropy, lookalike-domain LLM scoring)
- Recognize the current LLM-for-crime marketplace: SpamGPT/KaliGPT (Sep 2025), jailbroken open-weight forks, and their pricing/capabilities
- Map AI-generated phishing TTPs to MITRE ATT&CK T1566 (Phishing) and identify which sub-techniques have shifted in 2025-2026
- Write a Sigma rule that flags likely AI-authored email content for elevated triage
The market in mid-2026
The economics of AI-generated phishing changed in 2025. Three years ago, a successful spear-phishing campaign required either a skilled native speaker on the operator’s team or significant manual translation effort. Today, a $5,000 commercial product (SpamGPT) sold on dark-web forums automates the entire workflow: scraping the target list, generating locale-correct personalized lures through an integrated AI assistant (KaliGPT), sending through built-in SMTP/IMAP, and monitoring click-through in real time.
This isn’t an underground research project. It’s a productized SaaS. The verifiable record from September 2025 onward:
- SpamGPT — disclosed by multiple security publications in September 2025. Phishing-as-a-Service platform priced at $5,000 on underground forums. Sold via Telegram, Jabber, and invite-only networks. Bundled with integrated AI assistant (KaliGPT) for lure generation. SMTP/IMAP/campaign-monitoring built in.
- KaliGPT — the AI assistant component of SpamGPT. Generates personalized subject lines, body content, and audience targeting from a customer list. Not a separate product — bundled with SpamGPT.
- WormGPT successors — including GhostGPT and jailbroken Llama/Mistral forks distributed via Telegram. Less polished than SpamGPT but cheaper, often free or under $100. Used by lower-tier actors for opportunistic mass phishing.
- Operation HookedWing — a 4-year multi-sector phishing campaign documented by SOCRadar and SecurityWeek, targeting 500+ organizations across aviation, government, energy, and financial sectors. Significant for the longevity of the campaign and the targeting pattern (specific air corridors). LLM authorship of lures has been observed in 2024-2025 phases of the campaign, though the original 2022 operation predates AI authorship.
For curriculum purposes: lead with SpamGPT as the canonical commercial example, reference HookedWing as the canonical long-running campaign case study, and acknowledge the proliferation of cheaper variants without over-citing specific products that may rotate names quickly.
The signal taxonomy
Detection engineers have five categories of signal to work with. Day 1’s lab applies all five against a synthetic 5,000-email corpus; this module establishes the conceptual frame.
Signal 1: Stylometric drift
Every legitimate sender has a writing style. Word-length distribution, sentence-length variance, function-word frequency, comma density, capitalization patterns — these are individual fingerprints. AI-generated email content has detectable stylometric properties that differ from human writing in aggregate: more uniform sentence length, lower function-word entropy, near-zero typo rate, characteristic preference for certain rhetorical structures (“Specifically,” “Moreover,” “It is important to note that”).
A detection engineer cannot reliably fingerprint a single email this way. But against a corpus of inbound mail from a single purported sender, drift from that sender’s historical style is a strong signal. The detection requires building a per-sender stylometric profile from validated historical mail, then scoring inbound against that profile.
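The profile-then-score idea can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the four features, the `FUNCTION_WORDS` list, the z-score aggregation, and all function names are assumptions chosen for brevity.

```python
import re
from statistics import mean, pstdev

# Hypothetical, deliberately tiny function-word list; a real profile would use more.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "for", "it")

def stylometric_features(text: str) -> dict:
    """Extract a few simple style features from one email body."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    n_words = max(len(words), 1)
    return {
        "mean_word_len": mean([len(w) for w in words]) if words else 0.0,
        "sentence_len_std": pstdev(sentence_lengths) if len(sentence_lengths) > 1 else 0.0,
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / n_words,
        "comma_rate": text.count(",") / n_words,
    }

def build_profile(history: list) -> dict:
    """Per-feature (mean, std) over a sender's validated historical mail (non-empty)."""
    rows = [stylometric_features(t) for t in history]
    return {k: (mean([r[k] for r in rows]), pstdev([r[k] for r in rows])) for k in rows[0]}

def drift_score(text: str, profile: dict) -> float:
    """Mean absolute z-score across features, squashed into [0, 1)."""
    feats = stylometric_features(text)
    # Floor the std so a feature that never varied historically cannot divide by zero.
    zs = [abs(feats[k] - mu) / max(sd, 1e-3) for k, (mu, sd) in profile.items()]
    mean_z = mean(zs)
    return mean_z / (1.0 + mean_z)
```

Scoring a new message against `build_profile(history)` yields a value that rises as the message departs from the sender's historical habits. The score says nothing about authorship on its own; it only measures departure from the profile.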
Signal 2: Perplexity and burstiness anomalies
Perplexity measures how “surprising” a text is to a language model: the lower the value, the more predictable the model finds each next token. AI-generated text tends toward low perplexity, because generation favors high-probability next tokens, and toward low burstiness, meaning the variance in sentence-level perplexity is suppressed compared to human writing.
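For reference, the two quantities are commonly written as below. The notation is introduced here for illustration and is an assumption, not course material: $p_\theta$ is the scoring model, $N$ the number of scored tokens in a text $x$, and $s_1,\dots,s_S$ the sentences of a document $d$. Burstiness is taken as the population variance of sentence-level perplexity, matching the definition used in this module.

```latex
\mathrm{PPL}(x) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)

\mathrm{Burstiness}(d) = \frac{1}{S}\sum_{j=1}^{S}\left(\mathrm{PPL}(s_j) - \overline{\mathrm{PPL}}\right)^2,
\qquad
\overline{\mathrm{PPL}} = \frac{1}{S}\sum_{j=1}^{S}\mathrm{PPL}(s_j)
```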
Tools like GPTZero pioneered this approach in 2023. The 2026 versions are more sophisticated but the underlying signal is the same. Detection engineers can compute perplexity locally with any open-weight model:
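import re
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# A small model is sufficient for perplexity scoring
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def text_perplexity(text: str) -> float:
    # Truncate to GPT-2's 1024-token context window
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing labels makes the model return mean next-token cross-entropy
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()

def sentence_tokenize(text: str) -> list[str]:
    # Naive split on sentence-ending punctuation; swap in nltk or spaCy for real mail
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Burstiness = variance of sentence-level perplexity
def text_burstiness(text: str) -> float:
    sentences = sentence_tokenize(text)
    # Skip one-token sentences: the shifted LM loss is undefined (NaN) for them
    scorable = [s for s in sentences if len(tokenizer(s)["input_ids"]) > 1]
    perplexities = [text_perplexity(s) for s in scorable]
    return float(np.var(perplexities)) if perplexities else 0.0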
Caveat: Perplexity-based detection is known to produce false positives on text by non-native English writers, whose writing also tends to score low on perplexity, and AI-authoring tools increasingly counter-tune by inserting variance. Treat it as one signal among many, never as the sole basis for action.
Signal 3: Embedding-similarity campaign clustering
Covered in detail in Module 1.3. AI-generated phishing campaigns produce dozens to hundreds of lure variants with the same semantic content and surface-level variation. Embedding-cluster every inbound phishing-flagged mail; tight clusters with high member counts and same-day arrival are campaign indicators. This is the highest-leverage AI-phishing detection a SOC can deploy, requires no reasoning LLM, and runs on commodity hardware.
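A minimal sketch of the clustering step follows, assuming the embeddings have already been produced by whatever model Module 1.3 uses. The greedy single-pass algorithm, the 0.9 cosine threshold, and the function names are illustrative assumptions; only the size-and-same-day criterion comes from the text above.

```python
from collections import defaultdict

import numpy as np

def cluster_embeddings(vectors: np.ndarray, threshold: float = 0.9) -> list:
    """Greedy single-pass clustering on cosine similarity.

    Each vector joins the most similar existing cluster if similarity to that
    cluster's centroid is >= threshold; otherwise it starts a new cluster.
    Assumes every row is non-zero.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centroid_sums = []  # running sum of unit vectors per cluster
    labels = []
    for v in unit:
        best, best_sim = -1, threshold
        for idx, c in enumerate(centroid_sums):
            sim = float(v @ c / np.linalg.norm(c))
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best == -1:
            centroid_sums.append(v.copy())
            best = len(centroid_sums) - 1
        else:
            centroid_sums[best] = centroid_sums[best] + v
        labels.append(best)
    return labels

def campaign_candidates(labels: list, arrival_days: list, min_size: int = 5) -> list:
    """Cluster ids with at least min_size members that all arrived on the same day."""
    days_by_cluster = defaultdict(list)
    for label, day in zip(labels, arrival_days):
        days_by_cluster[label].append(day)
    return [
        cid for cid, days in days_by_cluster.items()
        if len(days) >= min_size and len(set(days)) == 1
    ]
```

A single pass is order-dependent and will not scale to a full mail archive; a vector store with approximate nearest-neighbour search is the usual replacement once volumes grow.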
Signal 4: URL-rotation entropy
AI-generated phishing campaigns typically rotate URLs aggressively to evade blocklists. The entropy of URLs across the campaign is high — domain registrations days old, subdomain patterns rotating, redirect chains varying per recipient. Detection: compute URL diversity within a campaign cluster, flag clusters with high URL-entropy as elevated-confidence AI campaigns.
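One way to put a number on URL diversity is Shannon entropy over the hostnames seen in a cluster. The sketch below is an assumption about how the metric could be computed, not the lab's definition; the function names and the choice to normalize by the maximum possible entropy are illustrative.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

def shannon_entropy(items: list) -> float:
    """Shannon entropy in bits of the empirical distribution over items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return max(0.0, entropy)  # avoid returning -0.0 when all items are identical

def url_rotation_entropy(urls: list) -> dict:
    """Hostname entropy for one campaign cluster, raw and normalized to [0, 1]."""
    hosts = [(urlsplit(u).hostname or "") for u in urls]
    raw = shannon_entropy(hosts)
    # Maximum entropy occurs when every URL uses a different host.
    max_bits = math.log2(len(hosts)) if len(hosts) > 1 else 0.0
    return {
        "host_entropy_bits": raw,
        "normalized": raw / max_bits if max_bits else 0.0,
    }
```

A normalized value near 1.0 means nearly every message in the cluster points at a different host, which is the rotation pattern described above. Extending the same function to path or redirect-chain strings is straightforward.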
Signal 5: Lookalike-domain LLM scoring
The classic visual lookalike (m1crosoft.com, g00gle.com) is detectable by string-similarity edit-distance from known brands. AI-generated phishing extends this to semantic lookalikes — domains that aren’t visually similar but evoke the target brand (account-verification-microsoft-service.com, urgent-google-security-alert.net). String-distance metrics miss these.
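The gap is easy to demonstrate with a plain Levenshtein check. This is a sketch under simplifying assumptions: it compares only the first DNS label against each brand, and the `max_distance` of 2 and the function names are illustrative choices.

```python
from typing import Optional

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def visual_lookalike(domain: str, brands: list, max_distance: int = 2) -> Optional[str]:
    """Return the brand this domain visually imitates, or None.

    Naive on purpose: only the first label is compared, and an exact match
    (distance 0) is treated as the legitimate brand domain, not a lookalike.
    """
    label = domain.lower().split(".")[0]
    for brand in brands:
        distance = levenshtein(label, brand.lower())
        if 0 < distance <= max_distance:
            return brand
    return None
```

With brands `["microsoft", "google"]`, the two visual examples above are caught, while `account-verification-microsoft-service.com` returns `None` because its first label is dozens of edits away from any brand name.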
The detection pattern: prompt a small LLM with a domain string and a list of brands you care about, and ask whether the domain attempts to evoke any brand on the list. Implementation:
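PROMPT = """Does this domain attempt to evoke or impersonate any of these brands?
Brands: {brand_list}
Domain: {domain}
Answer ONLY 'yes' or 'no' followed by the evoked brand (if any).
"""

def parse(response: str) -> dict:
    # Expected reply shape: "yes <brand>" or "no"; anything else counts as no match
    verdict, _, brand = response.strip().partition(" ")
    is_match = verdict.lower().strip(".,:") == "yes"
    return {"lookalike": is_match, "brand": (brand.strip() or None) if is_match else None}

def score_lookalike(domain: str, brands: list[str]) -> dict:
    # small_llm_call is a placeholder for your model client: prompt string in, reply string out
    response = small_llm_call(PROMPT.format(brand_list=", ".join(brands), domain=domain))
    return parse(response)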
This is the one place a reasoning LLM is justified at scale — semantic similarity to brand names is exactly what LLMs are good at. Run on every inbound domain at email-gateway ingest.
The detection stack (what students should leave with)
A defensible AI-phishing detection stack in 2026 layers all five signals:
Inbound email
↓
[1] Per-sender stylometric drift score (offline-trained profile per sender)
[2] Perplexity + burstiness score (against generic English baseline)
[3] Embedding into vector store (campaign clustering at the corpus level)
[4] URL extraction + rotation-entropy (per-campaign metric)
[5] Lookalike-domain LLM scoring (per-domain at gateway)
↓
Composite score → routing decision
• >0.85 composite: immediate block + alert
• 0.55-0.85: quarantine + analyst review
• <0.55: deliver + log
Each signal’s failure rate is too high for it to be deployed alone. The composite score is the production-grade output. Day 1’s lab walks students through implementing exactly this on the provided synthetic corpus.
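The routing step can be sketched as a weighted average followed by the three thresholds above. The equal weights below are placeholders, not a recommendation, and the sketch assumes each signal has already been normalized to [0, 1] with higher meaning more suspicious; the class and function names are hypothetical.

```python
from dataclasses import asdict, dataclass

@dataclass
class Signals:
    """One email's five signal scores, each assumed normalized to [0, 1]."""
    stylometric_drift: float
    perplexity_burstiness: float
    cluster_match: float
    url_entropy: float
    lookalike_domain: float

# Placeholder weights (assumption): equal weighting is only a starting point.
WEIGHTS = {
    "stylometric_drift": 0.2,
    "perplexity_burstiness": 0.2,
    "cluster_match": 0.2,
    "url_entropy": 0.2,
    "lookalike_domain": 0.2,
}

def composite_score(signals: Signals) -> float:
    """Weighted average of the five signals, renormalized by the weight total."""
    values = asdict(signals)
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * values[name] for name in WEIGHTS) / total_weight

def route(score: float) -> str:
    """Apply the module's thresholds: >0.85 block, 0.55-0.85 quarantine, else deliver."""
    if score > 0.85:
        return "block_and_alert"
    if score >= 0.55:
        return "quarantine_for_review"
    return "deliver_and_log"
```

A linear average is the simplest defensible choice. It lets one very strong signal be diluted by four weak ones, which is one reason the weighting deserves debate.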
Mapping to MITRE ATT&CK
AI-generated phishing primarily lives under T1566 (Phishing) with sub-techniques:
- T1566.001 Spearphishing Attachment — AI-generated cover letters and attachment lures
- T1566.002 Spearphishing Link — AI-generated lure content with rotating URL infrastructure
- T1566.003 Spearphishing via Service — AI-generated content delivered via legitimate platforms (LinkedIn messaging, Slack abuse via webhooks)
- T1566.004 Spearphishing Voice — covered in Day 2 (vishing/deepfake voice)
The 2025-2026 TTP shift is in scale and personalization. Pre-2024 spear-phishing was hand-crafted for high-value targets. Post-2024 spear-phishing is automatically personalized for every recipient on a 10,000-name list with target-specific lure content. The defender’s detection cadence must catch up.
Sigma rule for elevated triage of likely-AI email
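The detection engineer’s tool for SIEM-level enforcement is a Sigma rule. The example below flags emails with stylometric and structural features common to AI-generated content and routes them to elevated triage, not auto-blocking, because false positives against non-native English writers are real. Sigma only matches on fields already present in the log event, so the rule assumes the gateway pipeline has computed the scores and written them into each event.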
title: Likely AI-Generated Phishing Email Content
id: f4c2d5a3-7b21-4e9c-8d31-2f8e7c4b9a01
status: experimental
description: |
  Flags inbound emails with structural features common to AI-authored content
  (low perplexity + low burstiness + high embedding-similarity to known campaign cluster).
  Composite score >= 0.75 routes to elevated triage queue.
references:
  - https://www.varonis.com/blog/spamgpt
  - https://socradar.io/blog/operation-hookedwing-4-year-phishing/
author: vExpertAI × SANS Course
date: 2026-05-14
logsource:
  product: email_gateway
  service: inbound_mail
detection:
  # Numeric comparisons use Sigma's lt / lte / gt / gte value modifiers
  selection:
    perplexity_score|lt: 25            # tunable per environment
    burstiness_score|lt: 0.20          # tunable; AI text has low variance
    embedding_cluster_match: true      # alert matches existing campaign cluster
    cluster_size|gte: 5                # at least 5 sibling alerts in cluster
    sender_stylometric_drift|gt: 0.4   # significant drift from historical profile
  condition: selection
fields:
  - sender_address
  - subject
  - cluster_id
  - composite_ai_score
falsepositives:
  - Non-native English speakers writing legitimately
  - Highly templated legitimate transactional mail
level: medium
tags:
  - attack.initial_access
  - attack.t1566.001
  - attack.t1566.002
Tunable thresholds are environment-specific. Calibrate against your own validated-legitimate and validated-phishing samples. The composite score formula (combining the five signals into one number) is itself a design decision — Day 1’s lab gives students one defensible approach and asks them to argue for or against it.
Discussion questions (~10 min)
- Your perplexity-based detector has a 12% false positive rate against legitimate emails from your offshore engineering team in Bangalore. The team writes in fluent but non-native English. What composite-scoring adjustment reduces the FP rate while preserving sensitivity to AI campaigns?
- SpamGPT advertises that it can generate lures personalized for individual recipients based on scraped LinkedIn data. Walk through the kill chain a defender can disrupt — which signals appear at which stage, and which are unique to the AI-generated variant versus the human-generated equivalent?
- Your CISO asks “if we just block ChatGPT at the corporate proxy, doesn’t that stop this class of attack?” Construct the argument for why this is wrong, citing at least three specific failure modes from this module and Module 1.1.
Common mistakes
| Mistake | Better approach |
|---|---|
| Deploying perplexity-only detection | Composite scoring across five signals |
| Treating one campaign cluster as one phishing event | Cluster size + URL entropy together quantify campaign scale |
| Auto-blocking on AI-content score alone | Auto-block requires very high composite score + at least one corroborating signal (known-bad URL, attachment, etc.) |
| Building per-sender stylometric profiles offline once | Refresh per-sender profiles weekly; senders’ styles drift legitimately |
| Citing GPTZero as the AI-detection ground truth | GPTZero is one signal among many; no single tool reliably detects AI authorship in adversarial settings |
What’s next
Module 1.6 closes Day 1 with anti-patterns — the wrong responses to AI threats that detection engineers must call out as misconceptions. Then Lab 1 puts the entire day’s content together on a synthetic 5,000-email corpus.