Module 4.4 — Supply-Chain Compromise of ML Artifacts
50-minute lecture · Day 4 afternoon · Hands-on Python in the lab
Learning objectives
By end of this module, students can:
- Walk the LiteLLM/Mercor incident (March-April 2026) in technical detail — TeamPCP attacker, Trivy build-process compromise, malicious litellm 1.82.7/1.82.8 on PyPI, 4TB Mercor exfil, Meta pause
- Cite the JFrog Hugging Face disclosure (Feb 2024) and PyTorch torchtriton (Dec 2022) as the foundational ML supply-chain cases
- Apply the model SBOM discipline to inventory ML artifacts in their org, using the Codex-generated model_sbom.py tool plus industry frameworks (CycloneDX MLBOM, Sigstore model-signing)
- Identify the scanning-tool ecosystem (picklescan, safetensors-scan, Llama Guard, Azure Prompt Shields) and where each fits in the ML supply-chain defense layer
The structural shift
For decades, software supply-chain security has been a recognized but under-invested discipline. The 2020 SolarWinds incident raised executive awareness; the Log4Shell, MOVEit, and 3CX incidents reinforced it. By 2024, every major org had some supply-chain security investment.
ML artifacts are a new supply-chain frontier. A model weight file looks like data, not code. A Hugging Face model isn’t on the same procurement pipeline as a software vendor. A PyPI package that wraps an LLM API isn’t differentiated from any other PyPI package by most software-composition-analysis tools.
The 2024-2026 supply-chain incidents documented in this module demonstrate that the ML supply chain is now an active attack target. The defender’s discipline needs to extend: every component of your LLM stack — models, datasets, packages, tooling — is a potential injection vector.
The LiteLLM/Mercor incident (March-April 2026)
The most consequential ML supply-chain incident documented to date in this course.
The chain of compromise
- ~March 20, 2026: Threat actor group “TeamPCP” compromised the build process of Trivy (a popular open-source vulnerability scanner widely used in CI/CD). The compromise gave TeamPCP access to credentials stored in Trivy’s build environment.
- Token exfiltration: Among the credentials, TeamPCP extracted a PyPI publish token belonging to a maintainer of LiteLLM (a popular Python library for unifying LLM API calls, installed with pip install litellm).
- March 24, 2026, 10:39 UTC: TeamPCP used the stolen token to publish malicious LiteLLM versions 1.82.7 and 1.82.8 directly to PyPI. The packages were uploaded outside the normal release process, bypassing the official CI/CD pipeline.
- Live window: ~40 minutes. PyPI’s automated security scanning flagged the packages, and they were quarantined within 40 minutes of publication. Anyone who ran an unpinned pip install litellm during that window received a compromised version.
- The malicious payload: A credential stealer designed to harvest:
  - SSH keys
  - Cloud credentials (AWS, GCP, Azure)
  - Kubernetes secrets
  - API keys
  - Database credentials
- Exfiltration destination: Credentials were sent to models.litellm.cloud, a spoof domain that resembles a legitimate LiteLLM endpoint but is controlled by the attacker, not by LiteLLM.
- Persistence (v1.82.8 only): Version 1.82.8 added a .pth file injection. A .pth file is a Python path configuration file that the site module processes at every interpreter startup, and any line in it that begins with import is executed. The malicious code therefore re-runs on every Python startup, even after the user “uninstalls” litellm.
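The .pth persistence step can be hunted for directly on a workstation or build host. The sketch below is a hypothetical helper (find_executable_pth_lines and default_site_dirs are illustrative names, not part of any tool named in this module). It relies on the documented CPython behaviour that the site module executes .pth lines beginning with import followed by a space or tab. Legitimate packages, such as some editable installs, use the same mechanism, so each hit is a lead to review rather than a verdict.

```python
import site
from pathlib import Path


def find_executable_pth_lines(directories):
    """Return (file, line_number, text) for every .pth line that site would execute."""
    findings = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth_file in sorted(root.glob("*.pth")):
            try:
                text = pth_file.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than abort the sweep
            for number, line in enumerate(text.splitlines(), start=1):
                # site.py executes lines that begin with "import " or "import\t"
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth_file), number, line.strip()))
    return findings


def default_site_dirs():
    """Best-effort list of site-packages directories for the running interpreter."""
    dirs = []
    getter = getattr(site, "getsitepackages", None)  # absent in some old virtualenvs
    if getter is not None:
        dirs.extend(getter())
    dirs.append(site.getusersitepackages())
    return dirs


if __name__ == "__main__":
    for path, number, line in find_executable_pth_lines(default_site_dirs()):
        print(f"{path}:{number}: {line[:120]}")
```

Run it with each interpreter and virtual environment on the host, since every environment has its own site-packages directory.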
The Mercor breach
Mercor is an AI hiring startup. They were a primary downstream victim:
- Date of breach disclosure: April 1, 2026 (Mercor’s public statement)
- Total exfiltrated: Approximately 4 terabytes of data
- Specific components:
  - ~939 GB of platform source code
  - ~211 GB of user database (account records, communications, hiring data)
  - ~3 TB of video interview recordings and identity-verification documents (including passport scans) for 40,000+ contractors on Mercor’s platform
The downstream impact
- Meta paused all Mercor contracts following the breach disclosure (Meta was a significant Mercor customer)
- A class action lawsuit was filed against Mercor on behalf of affected contractors
- The incident has been widely covered as the most consequential ML-supply-chain attack to date
Sources
- TechCrunch reporting on the incident (March 31, 2026)
- The Register, “Malicious LiteLLM packages on PyPI lead to massive Mercor data breach” (April 2, 2026)
- LiteLLM’s official security disclosure at docs.litellm.ai/blog/security-update-march-2026
- Hackread, CyberSecurityNews, and other security publications
Why this matters for detection engineering
Detection signatures that would have caught LiteLLM 1.82.7/1.82.8 at scale:
- Lockfile scanning: pip freeze output or requirements.lock files containing litellm==1.82.7 or litellm==1.82.8; alert in your CI/CD and developer workstation telemetry
- Outbound egress to models.litellm.cloud: a non-standard domain that should not appear in any legitimate traffic from your LLM proxy infrastructure
- CI/CD pipeline integrity: monitor the integrity of build-time dependencies; if Trivy’s own build process is compromised, downstream consumers of Trivy need a way to detect it
- PyPI package-publishing anomaly detection: a publication outside the maintainer’s typical cadence is itself a signal. PyPI is improving these controls, but the malicious packages were still live for about 40 minutes before quarantine
The detection engineer’s deliverable post-incident: a lockfile-scanning rule that catches the specific versions, plus a rule for the general pattern of out-of-cadence PyPI publications for security-sensitive packages.
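The lockfile-scanning rule can be sketched in a few lines of stdlib Python. The names here (KNOWN_BAD, scan_requirements_text, scan_file) are hypothetical, and the regular expression only handles simple name==version pins as produced by pip freeze or a typical requirements lockfile; it is a starting point under those assumptions, not a full requirements parser.

```python
import re
from pathlib import Path

# Versions named in this module as malicious; extend as advisories are published.
KNOWN_BAD = {"litellm": {"1.82.7", "1.82.8"}}

# Matches simple pins such as "litellm==1.82.7" or "LiteLLM[proxy]==1.82.8 ; marker"
PIN_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9._-]+)\s*(?:\[[^\]]*\])?\s*==\s*([A-Za-z0-9.!+*_-]+)"
)


def normalize(name):
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def scan_requirements_text(text, known_bad=KNOWN_BAD):
    """Return (line_number, package, version) for every known-bad pin in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = PIN_PATTERN.match(line)
        if not match:
            continue  # comments, options, and unpinned requirements are ignored
        name, version = normalize(match.group(1)), match.group(2)
        if version in known_bad.get(name, set()):
            hits.append((number, name, version))
    return hits


def scan_file(path):
    """Convenience wrapper for a requirements or lockfile on disk."""
    return scan_requirements_text(Path(path).read_text(encoding="utf-8"))
```

In CI, a non-empty result from scan_file would fail the build; on workstations, the same function can be fed captured pip freeze output.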
The foundational cases (2022-2024)
JFrog Hugging Face disclosure (February 2024)
JFrog security researchers published a disclosure that approximately 100 malicious models had been uploaded to Hugging Face. The models contained pickle deserialization payloads — arbitrary Python code execution triggered when the model was loaded.
Mechanism: The Python pickle module is used to serialize Python objects, including model weights for many ML frameworks. Loading a pickle file is equivalent to executing the Python code embedded within it — by design. Adversaries embedded malicious payloads that execute on model load.
Hugging Face response: Accelerated the push to Safetensors (a new, executable-code-free format) and integrated Picklescan to automatically audit uploaded models for suspicious pickle payloads.
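Lesson for defenders: Treat .pkl, .pt, .pth, and .h5 model files as untrusted code, not as data. Loading these formats can execute attacker-supplied Python (pickle-based formats by design; Keras .h5 files through code embedded in Lambda layers), so apply the same controls you would apply to a third-party script.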
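The pickle mechanism can be shown without ever loading an untrusted file. The sketch below uses the standard-library pickletools module to walk the opcode stream and report opcodes that import names or call objects during unpickling. The Demo class and the RISKY_OPCODES set are illustrative assumptions; the payload is a harmless print call standing in for what an attacker would place there.

```python
import pickle
import pickletools

# Opcodes that import names or call/construct objects during unpickling.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def risky_opcodes(data):
    """List (opcode_name, argument, byte_offset) for risky opcodes; never unpickles."""
    return [
        (opcode.name, arg, pos)
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in RISKY_OPCODES
    ]


class Demo:
    """Stand-in for a malicious object: __reduce__ names a callable to run on load."""

    def __reduce__(self):
        return (print, ("this would run during pickle.loads",))  # harmless placeholder


benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
suspicious = pickle.dumps(Demo())

print(risky_opcodes(benign))  # plain containers and numbers need no imports or calls
print([name for name, _, _ in risky_opcodes(suspicious)])
```

Two caveats apply. Legitimate model pickles also import and call framework functions, so production scanners such as Picklescan compare the imported names against allow and deny lists rather than flagging every such opcode. And modern PyTorch checkpoints are zip archives that contain the pickle stream, so the embedded pickle has to be extracted before a byte-level scan like this one.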
PyTorch torchtriton dependency confusion (December 2022)
A dependency-confusion attack against PyTorch’s nightly builds. A malicious package named torchtriton was uploaded to public PyPI with the same name as a dependency in PyTorch’s nightly build. Systems configured to pull from PyPI by default would download the malicious version.
Mechanism: Dependency confusion exploits the resolution order of package managers. Public registries (PyPI) often take precedence over private registries, so when an attacker registers a public package whose name matches a private one, consumers who request that name are served the attacker’s package.
Payload: The malicious package exfiltrated system information, environment variables, and files from the user’s home directory.
Lesson for defenders: Pin dependencies to specific versions and to specific package indices. Don’t rely on package-name uniqueness across registries — assume registry collision is a viable attack.
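A requirements file can be audited for the two weaknesses this lesson names. The sketch below is a hypothetical helper (audit_requirements is not an existing tool) that flags requirements without an exact == pin and any use of pip's --extra-index-url option, which lets pip choose a same-named package from either index. It assumes a simple requirements.txt layout and strips everything after a # as a comment, so direct-URL requirements with fragments would need extra handling.

```python
import re

# A requirement counts as pinned only if it uses an exact "==" version.
EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;\\]+")


def audit_requirements(text):
    """Return (line_number, message) findings for the text of a requirements file."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("--extra-index-url"):
            findings.append(
                (number, "extra index: pip may take the same name from either index")
            )
        elif line.startswith("-"):
            continue  # other options (--index-url, --hash lines, -r) are not audited here
        elif not EXACT_PIN.match(line):
            findings.append((number, f"not pinned to an exact version: {line}"))
    return findings
```

Exact pins alone do not stop a registry collision for the pinned version itself; pip's hash-checking mode (--require-hashes with --hash entries) is the stronger control, because a substituted file fails the hash comparison.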
The model SBOM discipline
The defender’s structural response to ML supply-chain attacks is a Software Bill of Materials (SBOM) for ML artifacts: a manifest of every model, dataset, and package in your stack, with provenance and signature data.
CycloneDX MLBOM
CycloneDX v1.5+ (the SBOM standard from OWASP) added MLBOM support: a standardized BOM format for ML models capturing training datasets, architecture, hyperparameters, and model card metadata.
Adoption is partial as of May 2026 but growing. Detection engineers should advocate for MLBOM emission from any LLM application running in their org.
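To make the MLBOM idea concrete, the sketch below builds a minimal CycloneDX-style document for one model as a plain Python dict. The function name minimal_mlbom and the sample values are hypothetical. The field names (the machine-learning-model component type, hashes, modelCard, modelParameters, datasets) follow one reading of the CycloneDX 1.5 JSON schema and are an assumption here; validate the output against the official schema before relying on it.

```python
import json


def minimal_mlbom(name, version, sha256_hex, dataset_names):
    """Build a minimal CycloneDX-style BOM dict describing a single model."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "name": name,
                "version": version,
                "hashes": [{"alg": "SHA-256", "content": sha256_hex}],
                # Assumed layout: training data referenced from the model card.
                "modelCard": {
                    "modelParameters": {
                        "datasets": [
                            {"type": "dataset", "name": dataset}
                            for dataset in dataset_names
                        ]
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    bom = minimal_mlbom("example-model", "1.0", "ab" * 32, ["example-dataset"])
    print(json.dumps(bom, indent=2))
```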
Sigstore model-signing
Sigstore model-signing is an OpenSSF library for keyless signing of model weights with in-toto attestations. It provides cryptographically verifiable provenance: you can prove which build, by which signer, produced which model weight hash.
Adoption: early. The Hugging Face ecosystem is gradually adopting Sigstore-based attestations; the broader ML tooling ecosystem follows.
CoSAI (Coalition for Secure AI) Framework
CoSAI is an industry coalition publishing recommendations for tamper-proof model cards and signed metadata records. The framework is recommendation-level, not standardization-level — but represents emerging best practices.
The Codex-generated model SBOM tool
The implementation at .boss-pattern-work/day4/model_sbom.py (478 lines, stdlib-only — runs in air-gapped environments) inventories ML model artifacts in a target directory:
Features
- Scans a directory for ML model artifacts: .safetensors, .pt, .pth, .pkl, .h5, .gguf, .onnx, model.json, config.json
- Computes for each artifact:
  - File path
  - SHA-256 hash
  - File size in bytes
  - Format detection (heuristic by extension + magic bytes)
  - Safety classification:
    - safetensors, gguf, onnx → safe_format
    - .pkl, .h5, .pt, .pth → unsafe_format (pickle deserialization risk)
    - other → unknown
- For HuggingFace-style directories (with config.json): extracts model name, tokenizer presence, license file presence
- Outputs a JSON manifest with timestamp, scan-host, all artifacts, and summary counts
- Warnings section flags:
  - unsafe_format files
  - Files without paired config.json or model card
  - Hashes matching a known-malicious list (provided via --known-malicious-hashes path)
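The core of the feature list above (hash, size, classification, warnings) can be sketched as follows. This is not the model_sbom.py source; it is a simplified illustration with hypothetical function names, it classifies by file extension only (no magic-byte check), and it inventories every file under the directory rather than only the listed artifact types.

```python
import hashlib
from pathlib import Path

SAFE = {".safetensors", ".gguf", ".onnx"}
UNSAFE = {".pkl", ".h5", ".pt", ".pth"}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte weights never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path):
    """Map a file extension to the safety classes used in the manifest."""
    suffix = Path(path).suffix.lower()
    if suffix in SAFE:
        return "safe_format"
    if suffix in UNSAFE:
        return "unsafe_format"
    return "unknown"


def inventory(directory, known_bad_hashes=frozenset()):
    """Walk a directory and return artifacts plus warnings as plain dicts."""
    artifacts, warnings = [], []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        entry = {
            "path": str(path),
            "sha256": sha256_of(path),
            "size_bytes": path.stat().st_size,
            "safety_class": classify(path),
        }
        artifacts.append(entry)
        if entry["safety_class"] == "unsafe_format":
            warnings.append({"path": entry["path"], "severity": "high",
                             "warning": "unsafe_format: pickle deserialization risk"})
        if entry["sha256"] in known_bad_hashes:
            # "critical" is an assumed severity label, not taken from the tool.
            warnings.append({"path": entry["path"], "severity": "critical",
                             "warning": "hash matches known-malicious list"})
    return {"artifacts": artifacts, "warnings": warnings}
```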
Example invocation
python3 model_sbom.py --dir /opt/models --output sbom.json --known-malicious-hashes ./known_bad.txt
Example output
{
  "scan_timestamp": "2026-05-14T10:30:00Z",
  "scan_host": "soc-workstation-42",
  "scan_dir": "/opt/models",
  "summary": {
    "total_files": 47,
    "safe_format_count": 35,
    "unsafe_format_count": 8,
    "unknown_count": 4
  },
  "artifacts": [
    {
      "path": "/opt/models/llama-3.1-8b/model.safetensors",
      "sha256": "a3b8c9...",
      "size_bytes": 16384091128,
      "format": "safetensors",
      "safety_class": "safe_format",
      "huggingface_metadata": {
        "model_name": "Llama-3.1-8B-Instruct",
        "has_tokenizer": true,
        "has_license": true
      }
    }
  ],
  "warnings": [
    {
      "path": "/opt/models/legacy-model/weights.pkl",
      "warning": "unsafe_format: pickle deserialization risk",
      "severity": "high"
    }
  ]
}
Deployment patterns
- Pre-deployment scan: before loading any new model in production, generate its SBOM entry and require approval through your existing change-management process
- Periodic inventory: weekly or monthly scan of all model storage locations; alert on new entries that haven’t gone through approval
- Incident response: when a malicious model is disclosed (like the JFrog 100), use the known-malicious-hashes feature to scan your existing inventory
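The pre-deployment pattern can be enforced mechanically by gating on the manifest. The sketch below is a hypothetical gate script (not part of model_sbom.py) that reads a manifest with the warnings structure shown in the example output and returns a non-zero exit code when blocking warnings are present. The severity value critical is an assumption; the example output only shows high.

```python
import json
import sys


def blocking_warnings(manifest, blocking=("high", "critical")):
    """Return the warnings whose severity should stop a deployment."""
    return [w for w in manifest.get("warnings", []) if w.get("severity") in blocking]


def main(manifest_path):
    """Print blocking warnings and return a process exit code (0 = pass, 1 = block)."""
    with open(manifest_path, encoding="utf-8") as handle:
        manifest = json.load(handle)
    problems = blocking_warnings(manifest)
    for warning in problems:
        print(f"BLOCK {warning['path']}: {warning['warning']}")
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python3 sbom_gate.py sbom.json
    sys.exit(main(sys.argv[1]))
```

Wiring the exit code into the change-management pipeline keeps the approval step from depending on someone remembering to read the manifest.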
Limitations of the heuristic approach
The Codex-generated tool is stdlib-only by design (runs in restricted environments). For more sophisticated scanning, layer it with:
- Picklescan — Hugging Face’s tool for scanning Python pickle files for malicious imports
- safetensors-scan — Hugging Face’s integrated scanner for verifying safetensors-format integrity
- Sigstore verification — verify model-signing attestations against your trusted-signer list
Other ML supply-chain incidents 2024-2026
Beyond the three major cases above, the detection engineer should track:
- EchoLeak (CVE-2025-32711) — covered Day 3 Module 3.4; supply-chain in the broad sense that the M365 Copilot infrastructure was vulnerable to crafted input
- HP Wolf AsyncRAT droppers (May 2025) — LLM-authored malware infiltrating standard malware distribution channels
- GGUF chat template metadata poisoning (Aug 2025) — adversaries embedding malicious instructions in GGUF model metadata (claim worth verifying at instructor’s discretion; the GGUF format itself is real and chat-template-metadata is a documented attack surface)
- nullifAI 7-Zip scanner evasion (Nov 2025) — adversaries using compression-format quirks to hide malicious payloads inside model files (verify specific incident at delivery)
The pattern: every new ML deployment surface generates a new supply-chain attack surface within 12-18 months of mainstream adoption.
Discussion questions (~10 min)
- The LiteLLM 1.82.7/1.82.8 packages were live on PyPI for 40 minutes. Walk through which controls would have caught a downstream victim during that 40-minute window. Could your org have caught it?
- Your org runs an internal Hugging Face mirror to allow developers to download models without direct internet access. Does this mirror help or hurt against the JFrog 100 incident? What additional control would close the gap?
- The Codex model_sbom.py is stdlib-only so it runs anywhere. What’s the trade-off vs deploying picklescan + safetensors-scan + Sigstore verification? When is each appropriate?
Common mistakes
| Mistake | Better approach |
|---|---|
| Treating .pkl model files like data files | Treat them as arbitrary code; same controls as third-party scripts |
| No version pinning in CI/CD requirements | Pin to specific versions + specific package indices; lockfile is canonical |
| Manual model-inventory tracking | Automated SBOM generation; treat ML artifacts like any other software-composition concern |
| Trusting model card metadata at face value | Model cards are user-controlled content; verify provenance via cryptographic signing where available |
| Assuming “we use Safetensors only” makes us safe | Safetensors solves pickle-RCE but not all model-poisoning concerns; Module 4.5 covers fine-tune backdoors |
What’s next
Module 4.5 covers backdoored fine-tunes and sleeper-agent models — Anthropic’s Sleeper Agents research, behavioral evals as a CI gate, the hard truth that you cannot fully clear a third-party fine-tune through external evaluation alone.