Module 4.4 — Supply-Chain Compromise of ML Artifacts

50-minute lecture · Day 4 afternoon · Hands-on Python in the lab

Learning objectives

By the end of this module, students can:

Walk the LiteLLM/Mercor incident (March-April 2026) in technical detail — TeamPCP attacker, Trivy build-process compromise, malicious litellm 1.82.7/1.82.8 on PyPI, 4TB Mercor exfil, Meta pause
Cite the JFrog Hugging Face disclosure (Feb 2024) and PyTorch torchtriton (Dec 2022) as the foundational ML supply-chain cases
Apply the model SBOM discipline to inventory ML artifacts in their org — using the Codex-generated model_sbom.py tool plus industry frameworks (CycloneDX MLBOM, Sigstore model-signing)
Identify the scanning-tool ecosystem (picklescan, safetensors-scan, Llama Guard, Azure Prompt Shields) and where each fits in the ML supply-chain defense layer

The structural shift

For decades, software supply-chain security has been a recognized but under-invested discipline. The 2020 SolarWinds incident raised executive awareness; the Log4Shell, MOVEit, and 3CX incidents reinforced it. By 2024, every major org had some supply-chain security investment.

ML artifacts are a new supply-chain frontier. A model weight file looks like data, not code. A Hugging Face model isn’t on the same procurement pipeline as a software vendor. A PyPI package that wraps an LLM API isn’t differentiated from any other PyPI package by most software-composition-analysis tools.

The 2024-2026 supply-chain incidents documented in this module demonstrate that the ML supply chain is now an active attack target. The defender’s discipline needs to extend: every component of your LLM stack — models, datasets, packages, tooling — is a potential injection vector.

The LiteLLM/Mercor incident (March-April 2026)

This is the most consequential ML supply-chain incident covered in this course to date.

The chain of compromise

~March 20, 2026: Threat actor group “TeamPCP” compromised the build process of Trivy (a popular open-source vulnerability scanner widely used in CI/CD). The compromise gave TeamPCP access to credentials stored in Trivy’s build environment.
Token exfiltration: Among the credentials, TeamPCP extracted a PyPI publish token belonging to a maintainer of LiteLLM (a popular Python library for unifying LLM API calls — pip install litellm).
March 24, 2026, 10:39 UTC: TeamPCP used the stolen token to publish malicious LiteLLM versions 1.82.7 and 1.82.8 directly to PyPI. The packages bypassed the official CI/CD pipeline; they were uploaded outside the normal release process.
Live window: ~40 minutes. PyPI’s automated security scanning flagged the packages; they were quarantined within 40 minutes of publication. Anyone who ran pip install litellm during that window received a compromised version.
The malicious payload: A credential stealer designed to harvest:
- SSH keys
- Cloud credentials (AWS, GCP, Azure)
- Kubernetes secrets
- API keys
- Database credentials
Exfiltration destination: Credentials were sent to models.litellm.cloud — a spoof domain that resembles a legitimate LiteLLM endpoint but is not controlled by LiteLLM. (Real LiteLLM uses official-domain endpoints; models.litellm.cloud is the attacker’s collection server.)
Persistence (v1.82.8 only): Version 1.82.8 added a .pth file injection. A .pth file is a Python path configuration file in site-packages, and any line in it that begins with import is executed on every interpreter startup, so the malicious code re-executes on every Python startup, even after the user “uninstalls” litellm.
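The .pth persistence step can be hunted for directly. The sketch below is a minimal lab example, not a reconstruction of the actual payload: it relies on the documented behavior of Python's site module (lines in a .pth file that start with import followed by a space or tab are executed at startup) and lists every such line. The function names are made up for illustration.

```python
import site
from pathlib import Path


def executable_pth_lines(pth_file: Path) -> list[str]:
    """Return the lines of a .pth file that the site module would execute."""
    flagged = []
    text = pth_file.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        # site executes lines that begin with "import" plus a space or tab;
        # other non-comment lines are only treated as paths for sys.path.
        if line.startswith(("import ", "import\t")):
            flagged.append(line.strip())
    return flagged


def scan_site_dirs(dirs: list[str]) -> dict[str, list[str]]:
    """Map each .pth file that contains executable lines to those lines."""
    findings = {}
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue
        for pth in sorted(base.glob("*.pth")):
            lines = executable_pth_lines(pth)
            if lines:
                findings[str(pth)] = lines
    return findings


def default_site_dirs() -> list[str]:
    """Site-packages directories of the running interpreter."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    return dirs


if __name__ == "__main__":
    for path, lines in scan_site_dirs(default_site_dirs()).items():
        print(path)
        for line in lines:
            print("   ", line[:120])
```

Expect some hits on a clean machine: legitimate tooling (for example editable installs) also uses executable .pth lines, so the output is a review list rather than a verdict.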

The Mercor breach

Mercor is an AI hiring startup. They were a primary downstream victim:

Date of breach disclosure: April 1, 2026 (Mercor’s public statement)
Total exfiltrated: Approximately 4 terabytes of data
Specific components:
- ~939 GB of platform source code
- ~211 GB of user database (account records, communications, hiring data)
- ~3 TB of video interview recordings and identity-verification documents (including passport scans) for 40,000+ contractors on Mercor’s platform

The downstream impact

Meta paused all Mercor contracts following the breach disclosure (Meta was a significant Mercor customer)
A class action lawsuit was filed against Mercor on behalf of affected contractors
The incident has been widely covered as the most consequential ML-supply-chain attack to date

Sources

TechCrunch reporting on the incident (March 31, 2026)
The Register, “Malicious LiteLLM packages on PyPI lead to massive Mercor data breach” (April 2, 2026)
LiteLLM’s official security disclosure at docs.litellm.ai/blog/security-update-march-2026
Hackread, CyberSecurityNews, and other security publications

Why this matters for detection engineering

Detection signatures that would have caught LiteLLM 1.82.7/1.82.8 at scale:

Lockfile scanning: pip freeze output or requirements.lock files containing litellm==1.82.7 or litellm==1.82.8 — alert in your CI/CD and developer workstation telemetry
Outbound egress to models.litellm.cloud — non-standard domain; should not appear in any legitimate traffic from your LLM proxy infrastructure
CI/CD pipeline integrity: monitoring the integrity of build-time dependencies — if Trivy’s own build process is compromised, downstream consumers of Trivy need a way to detect it
PyPI package-publishing anomaly detection: a release published outside the maintainer’s typical cadence, or outside the normal release pipeline, is itself a signal. PyPI is improving these controls, but in this incident the packages were still live for about 40 minutes before quarantine.

The detection engineer’s deliverable post-incident: a lockfile-scanning rule that catches the specific versions, plus a rule for the general pattern of out-of-cadence PyPI publications for security-sensitive packages.
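The first two signatures are simple enough to sketch. The example below is a minimal illustration, assuming pip-freeze-style "name==version" lines and a hostname field pulled from DNS or proxy logs; the function names and the regular expression are assumptions, not part of any existing tool.

```python
import re

# Versions and domain as named in the incident write-up above.
BAD_PINS = {"litellm": {"1.82.7", "1.82.8"}}
IOC_DOMAIN = "models.litellm.cloud"

# "name==version" with optional extras, e.g. litellm[proxy]==1.82.7
PIN_RE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;#\\]+)"
)


def normalize(name: str) -> str:
    """Normalize a project name the way package indexes compare names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_bad_pins(lockfile_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, package, version) for each known-bad pin."""
    hits = []
    for lineno, line in enumerate(lockfile_text.splitlines(), start=1):
        match = PIN_RE.match(line)
        if not match:
            continue
        name, version = normalize(match.group(1)), match.group(2)
        if version in BAD_PINS.get(name, ()):
            hits.append((lineno, name, version))
    return hits


def is_ioc_host(hostname: str) -> bool:
    """True if the hostname is the collection domain or a subdomain of it."""
    host = hostname.strip().rstrip(".").lower()
    return host == IOC_DOMAIN or host.endswith("." + IOC_DOMAIN)


if __name__ == "__main__":
    sample = "requests==2.32.3\nlitellm==1.82.8\n"
    print(find_bad_pins(sample))
    print(is_ioc_host("models.litellm.cloud."))
```

Keeping the bad-version table as data rather than code means the same rule can be reused for the next incident by adding an entry.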

The foundational cases (2022-2024)

JFrog Hugging Face disclosure (February 2024)

JFrog security researchers published a disclosure that approximately 100 malicious models had been uploaded to Hugging Face. The models contained pickle deserialization payloads — arbitrary Python code execution triggered when the model was loaded.

Mechanism: The Python pickle module is used to serialize Python objects, including model weights for many ML frameworks. Loading a pickle file is equivalent to executing the Python code embedded within it — by design. Adversaries embedded malicious payloads that execute on model load.
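A harmless way to see this in the lab is sketched below. The class name Demo is made up for illustration, and print stands in for whatever callable a real payload would name.

```python
import pickle


class Demo:
    """Hypothetical object whose pickled form means 'call print(...) on load'."""

    def __reduce__(self):
        # pickle records a callable plus its arguments and replays the call
        # during loading. A real payload would name something like os.system.
        return (print, ("this line ran inside pickle.loads()",))


blob = pickle.dumps(Demo())

# No Demo object comes back: loads() returns whatever the callable returned.
result = pickle.loads(blob)
print("loads() returned:", result)
```

The message is printed by pickle.loads() itself, before the caller ever touches the result. The loader has no way to know the file was "supposed" to contain model weights.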

Hugging Face response: Accelerated the push to Safetensors (a new, executable-code-free format) and integrated Picklescan to automatically audit uploaded models for suspicious pickle payloads.

Lesson for defenders: Treat .pkl, .pt, .pth, and .h5 model files as untrusted code, not as data. Loading these formats is equivalent to running an arbitrary Python script — apply the same controls.
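One way to apply that control without loading anything is to read the pickle opcode stream and list what it would import. The sketch below uses the standard library's pickletools.genops and is a simplified heuristic, not a replacement for a dedicated scanner: it tracks only string constants and the memo, and it does not unpack .pt archives (which are zip files with a pickle inside). The function and set names are assumptions.

```python
import pickle
import pickletools

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
              "STRING", "BINSTRING", "SHORT_BINSTRING"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def pickle_imports(data: bytes) -> set[tuple[str, str]]:
    """List (module, name) pairs a pickle would import, without unpickling."""
    imports = set()
    memo = {}      # memo index -> string constant (or None for anything else)
    top = None     # string most recently pushed, if the last push was a string
    recent = []    # string constants in push order, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in ("GLOBAL", "INST"):
            # Older protocols carry "module name" directly in the opcode.
            module, attr = arg.split(" ", 1)
            imports.add((module, attr))
            top = None
        elif name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as two strings first.
            if len(recent) >= 2:
                imports.add((recent[-2], recent[-1]))
            top = None
        elif name in STRING_OPS:
            top = arg
            recent.append(arg)
        elif name in GET_OPS:
            top = memo.get(arg)
            if isinstance(top, str):
                recent.append(top)
        elif name == "MEMOIZE":
            memo[len(memo)] = top
        elif name in PUT_OPS:
            memo[arg] = top
        else:
            top = None
    return imports


if __name__ == "__main__":
    print(pickle_imports(pickle.dumps({"weights": [1.0, 2.0]})))
```

A plain dictionary of floats imports nothing, so any (module, name) pair in the output of a supposed weights file deserves a look, and pairs from modules such as os, subprocess, or builtins deserve an immediate one.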

PyTorch torchtriton dependency confusion (December 2022)

A dependency-confusion attack against PyTorch’s nightly builds. A malicious package named torchtriton was uploaded to public PyPI with the same name as a dependency in PyTorch’s nightly build. Systems configured to pull from PyPI by default would download the malicious version.

Mechanism: Dependency confusion exploits the resolution order of package managers. When a public registry (PyPI) takes precedence over a private one, or the resolver simply picks the best-matching version across every configured index, an attacker who registers a public package whose name matches a private package gets the malicious package served whenever a consumer requests that name.

Payload: The malicious package exfiltrated system information, environment variables, and files from the user’s home directory.

Lesson for defenders: Pin dependencies to specific versions and to specific package indices. Don’t rely on package-name uniqueness across registries — assume registry collision is a viable attack.
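A requirements file can be checked for both halves of that lesson. The sketch below is a minimal audit, assuming pip-style requirements syntax. It flags entries that are not pinned with == and any use of --extra-index-url, which makes pip consider more than one index for every package. The function name and the internal index URL in the usage example are hypothetical.

```python
import re

# pip treats "#" at line start, or preceded by whitespace, as a comment.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")
PINNED_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;#\\]+"
)


def audit_requirements(text: str) -> list[str]:
    """Return human-readable findings for a requirements file."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = COMMENT_RE.sub("", raw).strip()
        if not line:
            continue
        if line.startswith("--extra-index-url"):
            findings.append(
                f"line {lineno}: --extra-index-url adds a second index; "
                "a same-named public package can win resolution"
            )
        elif line.startswith("-"):
            # Other options: --index-url, -r includes, --hash continuations.
            continue
        elif not PINNED_RE.match(line):
            findings.append(f"line {lineno}: not pinned with ==: {line}")
    return findings


if __name__ == "__main__":
    sample = (
        "--index-url https://pypi.internal.example/simple\n"  # hypothetical URL
        "--extra-index-url https://pypi.org/simple\n"
        "torch==2.3.1\n"
        "torchtriton\n"
    )
    for finding in audit_requirements(sample):
        print(finding)
```

Version pins alone do not say which index a package came from, so a stricter follow-up is to add per-package hashes and have pip enforce them with --require-hashes.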

The model SBOM discipline

The defender’s structural response to ML supply-chain attacks is a Software Bill of Materials (SBOM) for ML artifacts: a manifest of every model, dataset, and package in your stack, with provenance and signature data.

CycloneDX MLBOM

CycloneDX v1.5+ (the SBOM standard from OWASP) added MLBOM support: a standardized BOM format for ML models capturing training datasets, architecture, hyperparameters, and model card metadata.

Adoption is partial as of May 2026 but growing. Detection engineers should advocate for MLBOM emission from any LLM application running in their org.
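To make the format concrete, the sketch below builds a minimal CycloneDX-style document with one model component. It uses only the top-level fields and the machine-learning-model component type as I understand them from the 1.5 JSON schema; treat the field names as assumptions and validate real output against the official schema. The helper names and the demo model are hypothetical, and the model card section (datasets, hyperparameters) is deliberately left out.

```python
import hashlib
import json


def mlbom_component(name: str, version: str, weights: bytes) -> dict:
    """Describe one model as a CycloneDX component with a SHA-256 hash."""
    return {
        "type": "machine-learning-model",
        "name": name,
        "version": version,
        "hashes": [
            {"alg": "SHA-256", "content": hashlib.sha256(weights).hexdigest()}
        ],
    }


def mlbom(components: list[dict]) -> dict:
    """Wrap components in a minimal CycloneDX 1.5 document."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real weights file.
    bom = mlbom([mlbom_component("demo-model", "1.0", b"placeholder weights")])
    print(json.dumps(bom, indent=2))
```

For real weight files, hash the file in chunks rather than reading it into memory.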

Sigstore model-signing

Sigstore model-signing is an OpenSSF library for keyless signing of model weights with in-toto attestations. It provides cryptographically verifiable provenance: you can prove which build, by which signer, produced which model weight hash.

Adoption: early. The Hugging Face ecosystem is gradually adopting Sigstore-based attestations; the broader ML tooling ecosystem follows.

CoSAI (Coalition for Secure AI) Framework

CoSAI is an industry coalition publishing recommendations for tamper-proof model cards and signed metadata records. The framework is a set of recommendations rather than a formal standard, but it represents emerging best practice.

The Codex-generated model SBOM tool

The implementation at .boss-pattern-work/day4/model_sbom.py (478 lines, stdlib-only — runs in air-gapped environments) inventories ML model artifacts in a target directory:

Features

Scans a directory for ML model artifacts: .safetensors, .pt, .pth, .pkl, .h5, .gguf, .onnx, model.json, config.json
Computes for each artifact:
- File path
- SHA-256 hash
- File size in bytes
- Format detection (heuristic by extension + magic bytes)
- Safety classification:
  - safetensors, gguf, onnx → safe_format
  - .pkl, .h5, .pt, .pth → unsafe_format (pickle deserialization risk)
  - other → unknown
For Hugging Face-style directories (with config.json): extracts model name, tokenizer presence, license file presence
Outputs a JSON manifest with timestamp, scan-host, all artifacts, and summary counts
Warnings section flags:
- unsafe_format files
- Files without paired config.json or model card
- Hashes matching a known-malicious list (provided via --known-malicious-hashes path)
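The core of such an inventory fits in a few stdlib calls. The sketch below is not the model_sbom.py implementation; it is a simplified illustration of the hash-and-classify step, assuming classification by file extension only (no magic-byte checks) and scanning every file under the root. All names are assumptions.

```python
import hashlib
from pathlib import Path

SAFE_EXTENSIONS = {".safetensors", ".gguf", ".onnx"}
UNSAFE_EXTENSIONS = {".pkl", ".h5", ".pt", ".pth"}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path: Path) -> str:
    """Extension-only safety class, mirroring the three classes listed above."""
    ext = path.suffix.lower()
    if ext in SAFE_EXTENSIONS:
        return "safe_format"
    if ext in UNSAFE_EXTENSIONS:
        return "unsafe_format"
    return "unknown"


def inventory(root: Path, known_bad: frozenset = frozenset()) -> list[dict]:
    """One record per file under root, flagged against a known-bad hash set."""
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = sha256_file(path)
        records.append({
            "path": str(path),
            "sha256": digest,
            "size_bytes": path.stat().st_size,
            "safety_class": classify(path),
            "known_malicious": digest in known_bad,
        })
    return records
```

Extension checks are easy to fool by renaming a file, which is why the feature list above pairs them with magic-byte detection.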

Example invocation

python3 model_sbom.py --dir /opt/models --output sbom.json --known-malicious-hashes ./known_bad.txt

Example output

{
  "scan_timestamp": "2026-05-14T10:30:00Z",
  "scan_host": "soc-workstation-42",
  "scan_dir": "/opt/models",
  "summary": {
    "total_files": 47,
    "safe_format_count": 35,
    "unsafe_format_count": 8,
    "unknown_count": 4
  },
  "artifacts": [
    {
      "path": "/opt/models/llama-3.1-8b/model.safetensors",
      "sha256": "a3b8c9...",
      "size_bytes": 16384091128,
      "format": "safetensors",
      "safety_class": "safe_format",
      "huggingface_metadata": {
        "model_name": "Llama-3.1-8B-Instruct",
        "has_tokenizer": true,
        "has_license": true
      }
    }
  ],
  "warnings": [
    {
      "path": "/opt/models/legacy-model/weights.pkl",
      "warning": "unsafe_format: pickle deserialization risk",
      "severity": "high"
    }
  ]
}

Deployment patterns

Pre-deployment scan: before loading any new model in production, generate its SBOM entry and require approval through your existing change-management process
Periodic inventory: weekly or monthly scan of all model storage locations; alert on new entries that haven’t gone through approval
Incident response: when a malicious model is disclosed (like the JFrog 100), use the known-malicious-hashes feature to scan your existing inventory
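The first two patterns can be enforced mechanically from the manifest. The sketch below is one possible gate, assuming the JSON field names shown in the example output and an approved-hash list maintained by your change-management process; the function name and the idea of a plain set of approved hashes are assumptions.

```python
import json


def gate(manifest: dict, approved_hashes: set[str]) -> list[str]:
    """Return reasons to block; an empty list means the scan passes."""
    problems = []
    # Any high-severity warning blocks outright.
    for warning in manifest.get("warnings", []):
        if warning.get("severity") == "high":
            problems.append(f"{warning['path']}: {warning['warning']}")
    # Any artifact whose hash has not been approved also blocks.
    for artifact in manifest.get("artifacts", []):
        if artifact["sha256"] not in approved_hashes:
            problems.append(f"{artifact['path']}: sha256 not on the approved list")
    return problems


if __name__ == "__main__":
    # Tiny inline manifest in the same shape as the example output.
    manifest = json.loads("""
    {
      "artifacts": [{"path": "/opt/models/demo/model.safetensors",
                     "sha256": "0000"}],
      "warnings": []
    }
    """)
    reasons = gate(manifest, approved_hashes={"0000"})
    print("PASS" if not reasons else "\n".join(reasons))
```

In CI, exit non-zero when the list is non-empty. For the periodic inventory, send the same list to your alerting pipeline instead.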

Limitations of the heuristic approach

The Codex-generated tool is stdlib-only by design (runs in restricted environments). For more sophisticated scanning, layer it with:

Picklescan: the open-source scanner that Hugging Face uses to check Python pickle files for malicious imports
safetensors-scan: Hugging Face’s integrated scanner for verifying safetensors-format integrity
Sigstore verification: verify model-signing attestations against your trusted-signer list

Other ML supply-chain incidents 2024-2026

Beyond the three major cases above, the detection engineer should track:

EchoLeak (CVE-2025-32711) — covered Day 3 Module 3.4; supply-chain in the broad sense that the M365 Copilot infrastructure was vulnerable to crafted input
HP Wolf AsyncRAT droppers (May 2025) — LLM-authored malware infiltrating standard malware distribution channels
GGUF chat template metadata poisoning (Aug 2025) — adversaries embedding malicious instructions in GGUF model metadata (claim worth verifying at instructor’s discretion; the GGUF format itself is real and chat-template-metadata is a documented attack surface)
nullifAI 7-Zip scanner evasion (Nov 2025) — adversaries using compression-format quirks to hide malicious payloads inside model files (verify specific incident at delivery)

The pattern: every new ML deployment surface generates a new supply-chain attack surface within 12-18 months of mainstream adoption.

Discussion questions (~10 min)

The LiteLLM 1.82.7/1.82.8 packages were live on PyPI for 40 minutes. Walk through which controls would have caught a downstream victim during that 40-minute window. Could your org have caught it?
Your org runs an internal Hugging Face mirror to allow developers to download models without direct internet access. Does this mirror help or hurt against the JFrog 100 incident? What additional control would close the gap?
The Codex model_sbom.py is stdlib-only so it runs anywhere. What’s the trade-off vs deploying picklescan + safetensors-scan + Sigstore verification? When is each appropriate?

Common mistakes

Mistake	Better approach
Treating `.pkl` model files like data files	Treat them as arbitrary code; same controls as third-party scripts
No version pinning in CI/CD requirements	Pin to specific versions + specific package indices; lockfile is canonical
Manual model-inventory tracking	Automated SBOM generation; treat ML artifacts like any other software-composition concern
Trusting model card metadata at face value	Model cards are user-controlled content; verify provenance via cryptographic signing where available
Assuming “we use Safetensors only” makes us safe	Safetensors solves pickle-RCE but not all model-poisoning concerns; Module 4.5 covers fine-tune backdoors

What’s next

Module 4.5 covers backdoored fine-tunes and sleeper-agent models — Anthropic’s Sleeper Agents research, behavioral evals as a CI gate, the hard truth that you cannot fully clear a third-party fine-tune through external evaluation alone.