Module 4.4 — Supply-Chain Compromise of ML Artifacts

50-minute lecture · Day 4 afternoon · Hands-on Python in the lab

Learning objectives

By the end of this module, students can:

  1. Walk the LiteLLM/Mercor incident (March-April 2026) in technical detail — TeamPCP attacker, Trivy build-process compromise, malicious litellm 1.82.7/1.82.8 on PyPI, 4TB Mercor exfil, Meta pause
  2. Cite the JFrog Hugging Face disclosure (Feb 2024) and PyTorch torchtriton (Dec 2022) as the foundational ML supply-chain cases
  3. Apply the model SBOM discipline to inventory ML artifacts in their org — using the Codex-generated model_sbom.py tool plus industry frameworks (CycloneDX MLBOM, Sigstore model-signing)
  4. Identify the scanning-tool ecosystem (picklescan, safetensors-scan, Llama Guard, Azure Prompt Shields) and where each fits in the ML supply-chain defense layer

The structural shift

For decades, software supply-chain security has been a recognized but under-invested discipline. The 2020 SolarWinds incident raised executive awareness; the Log4Shell, MOVEit, and 3CX incidents reinforced it. By 2024, every major org had some supply-chain security investment.

ML artifacts are a new supply-chain frontier. A model weight file looks like data, not code. A Hugging Face model isn’t on the same procurement pipeline as a software vendor. A PyPI package that wraps an LLM API isn’t differentiated from any other PyPI package by most software-composition-analysis tools.

The 2024-2026 supply-chain incidents documented in this module demonstrate that the ML supply chain is now an active attack target. The defender’s discipline needs to extend: every component of your LLM stack — models, datasets, packages, tooling — is a potential injection vector.


The LiteLLM/Mercor incident (March-April 2026)

The most consequential ML supply-chain incident documented to date in this course.

The chain of compromise

  1. ~March 20, 2026: Threat actor group “TeamPCP” compromised the build process of Trivy (a popular open-source vulnerability scanner widely used in CI/CD). The compromise gave TeamPCP access to credentials stored in Trivy’s build environment.

  2. Token exfiltration: Among the credentials, TeamPCP extracted a PyPI publish token belonging to a maintainer of LiteLLM (a popular Python library for unifying LLM API calls — pip install litellm).

  3. March 24, 2026, 10:39 UTC: TeamPCP used the stolen token to publish malicious LiteLLM versions 1.82.7 and 1.82.8 directly to PyPI. The packages bypassed the official CI/CD pipeline; they were uploaded outside the normal release process.

  4. Live window: ~40 minutes. PyPI’s automated security scanning flagged the packages; they were quarantined within 40 minutes of publication. Anyone who ran pip install litellm during that window received a compromised version.

  5. The malicious payload: A credential stealer designed to harvest:

    • SSH keys
    • Cloud credentials (AWS, GCP, Azure)
    • Kubernetes secrets
    • API keys
    • Database credentials
  6. Exfiltration destination: Credentials were sent to models.litellm.cloud — a spoof domain that resembles a legitimate LiteLLM endpoint but is not controlled by LiteLLM. (Real LiteLLM uses official-domain endpoints; models.litellm.cloud is the attacker’s collection server.)

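  7. Persistence (v1.82.8 only): Version 1.82.8 added a .pth file injection. A .pth file is a path configuration file that Python’s site module processes at every interpreter startup, and any line in it that begins with import is executed as code. The planted file ensures the malicious code re-executes on every Python startup, even after the user “uninstalls” litellm.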
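
The .pth persistence in step 7 can be hunted for with a stdlib-only sweep. The following is a minimal sketch, not a complete detector: it lists every .pth line that begins with import in the running interpreter’s site-packages directories. The function names are hypothetical. Legitimate packages (editable installs, some setuptools helpers) also use executable .pth lines, so findings need triage rather than automatic blocking.

```python
import site
from pathlib import Path


def find_executable_pth_lines(directories):
    """Return (file, line_number, text) for each .pth line that begins with 'import'.

    The site module executes such lines at startup; every other non-comment
    line is treated as a directory to append to sys.path.
    """
    findings = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth_file in sorted(root.glob("*.pth")):
            try:
                text = pth_file.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip it rather than abort the sweep
            for number, line in enumerate(text.splitlines(), start=1):
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth_file), number, line.strip()))
    return findings


def default_site_directories():
    """Collect the site-packages directories of the running interpreter."""
    directories = []
    if hasattr(site, "getsitepackages"):
        directories.extend(site.getsitepackages())
    if hasattr(site, "getusersitepackages"):
        directories.append(site.getusersitepackages())
    return directories


if __name__ == "__main__":
    for path, number, line in find_executable_pth_lines(default_site_directories()):
        print(f"{path}:{number}: {line[:120]}")
```

Run it with each interpreter and virtual environment on the host, since every environment has its own site-packages directory.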

The Mercor breach

Mercor is an AI hiring startup. They were a primary downstream victim:

The downstream impact

Sources

Why this matters for detection engineering

Detection signatures that would have caught LiteLLM 1.82.7/1.82.8 at scale:

  1. Lockfile scanning: pip freeze output or requirements.lock files containing litellm==1.82.7 or litellm==1.82.8 — alert in your CI/CD and developer workstation telemetry
  2. Outbound egress to models.litellm.cloud — non-standard domain; should not appear in any legitimate traffic from your LLM proxy infrastructure
  3. CI/CD pipeline integrity: monitoring the integrity of build-time dependencies — if Trivy’s own build process is compromised, downstream consumers of Trivy need a way to detect it
  4. PyPI package-publishing anomaly detection: publishing pattern outside the maintainer’s typical cadence is itself a signal — PyPI is improving these controls but the attacker beat them by 40 minutes

The detection engineer’s deliverable post-incident: a lockfile-scanning rule that catches the specific versions, plus a rule for the general pattern of out-of-cadence PyPI publications for security-sensitive packages.
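
The lockfile-scanning signature can be sketched in a few lines of stdlib Python. This is an illustrative sketch, assuming requirements-style text such as pip freeze output; the names COMPROMISED and scan_requirements_text are hypothetical, and the version set contains only the two versions named in this module.

```python
import re

# Package versions named in this module as malicious.
COMPROMISED = {"litellm": {"1.82.7", "1.82.8"}}

# Matches pinned lines such as "litellm==1.82.7" or "LiteLLM[proxy]==1.82.8 ; marker".
PIN_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;#\\]+)"
)


def normalize(name):
    """Normalize a project name (case-insensitive, runs of -_. collapsed)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def scan_requirements_text(text):
    """Return (line_number, package, version) for every compromised pin found."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = PIN_PATTERN.match(line)
        if not match:
            continue  # comments, options, and unpinned lines are ignored here
        name, version = normalize(match.group(1)), match.group(2)
        if version in COMPROMISED.get(name, set()):
            hits.append((number, name, version))
    return hits


if __name__ == "__main__":
    sample = "requests==2.31.0\nLiteLLM[proxy]==1.82.8\n"
    for number, name, version in scan_requirements_text(sample):
        print(f"line {number}: {name}=={version} is a known-compromised release")
```

The same function can be fed the output of pip freeze collected from developer workstations, so one rule covers both CI lockfiles and endpoint telemetry.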


The foundational cases (2022-2024)

JFrog Hugging Face disclosure (February 2024)

JFrog security researchers published a disclosure that approximately 100 malicious models had been uploaded to Hugging Face. The models contained pickle deserialization payloads — arbitrary Python code execution triggered when the model was loaded.

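Mechanism: The Python pickle module is used to serialize Python objects, including model weights for many ML frameworks. A pickle stream can name arbitrary importable callables and instruct the unpickler to call them (for example through an object’s __reduce__ method), so loading an untrusted pickle file is equivalent to executing the Python code embedded within it. This is by design, not a bug. Adversaries embedded malicious payloads that execute on model load.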
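
To make the mechanism concrete, the sketch below inspects a pickle stream statically with the standard-library pickletools module and lists the callables it would import, without ever loading it. It is a simplified heuristic in the spirit of dedicated scanners, not a replacement for them: the module list is an assumption chosen for illustration, the STACK_GLOBAL handling is approximate, and PyTorch .pt archives (zip files containing a pickle) would need to be unpacked first.

```python
import os
import pickle
import pickletools

# Modules whose presence in a model pickle deserves a closer look (illustrative list).
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket", "runpy"}

STRING_OPCODES = {
    "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
    "STRING", "BINSTRING", "SHORT_BINSTRING",
}


def list_pickle_imports(data):
    """Return (module, name) pairs the pickle would import, without unpickling."""
    imports = []
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Older protocols carry "module name" as a single argument.
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ pushes module and name as the two preceding strings.
            imports.append((recent_strings[-2], recent_strings[-1]))
    return imports


def flag_suspicious(imports):
    """Keep only imports whose top-level module is on the watch list."""
    return [(m, n) for m, n in imports if m.split(".")[0] in SUSPICIOUS_MODULES]


if __name__ == "__main__":
    class Demo:
        def __reduce__(self):
            # On load, the unpickler would call os.system("echo demo").
            return (os.system, ("echo demo",))

    payload = pickle.dumps(Demo())  # serializing is harmless; never pickle.loads this
    print(flag_suspicious(list_pickle_imports(payload)))
```

A pickle of plain data (dicts, lists, floats) produces no imports at all, which is why any import in a weights file is worth reviewing.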

Hugging Face response: Accelerated the push to Safetensors (a newer format that stores only a JSON header and raw tensor bytes, with no executable code) and integrated Picklescan to automatically audit uploaded models for suspicious pickle payloads.
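
The reason a Safetensors file can be inspected safely is visible in its layout: an 8-byte little-endian length, a JSON header of that length, then raw tensor bytes. The sketch below reads only the header with the standard library. The function name and the size cap are assumptions made for this example, and it performs no validation of the tensor data itself.

```python
import json
import struct

# Sanity cap on header size; the value is an arbitrary choice for this sketch.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def read_safetensors_header(path):
    """Parse and return the JSON header of a Safetensors file without touching tensor data."""
    with open(path, "rb") as handle:
        prefix = handle.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to contain a header length")
        (header_length,) = struct.unpack("<Q", prefix)  # unsigned 64-bit, little-endian
        if header_length > MAX_HEADER_BYTES:
            raise ValueError(f"implausible header length: {header_length}")
        raw = handle.read(header_length)
        if len(raw) != header_length:
            raise ValueError("file truncated inside the header")
    return json.loads(raw.decode("utf-8"))


def summarize_tensors(header):
    """Return {tensor_name: (dtype, shape)} and skip the optional metadata entry."""
    return {
        name: (entry["dtype"], entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"
    }
```

Parsing involves only struct and json, so nothing in the file is ever interpreted as code. That is the property pickle-based formats lack.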

Lesson for defenders: Treat .pkl, .pt, .pth, and .h5 model files as untrusted code, not as data. Loading these formats is equivalent to running an arbitrary Python script — apply the same controls.

PyTorch torchtriton dependency confusion (December 2022)

A dependency-confusion attack against PyTorch’s nightly builds. A malicious package named torchtriton was uploaded to public PyPI with the same name as a dependency in PyTorch’s nightly build. Systems configured to pull from PyPI by default would download the malicious version.

Mechanism: Dependency confusion exploits the resolution order of package managers. Public registries (PyPI) often take precedence over private registries, so when an attacker registers a name on the public registry that matches a private package name, the attacker’s package is what gets served when the consumer requests that name.

Payload: The malicious package exfiltrated system information, environment variables, and files from the user’s home directory.

Lesson for defenders: Pin dependencies to specific versions and to specific package indices. Don’t rely on package-name uniqueness across registries — assume registry collision is a viable attack.
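
The pinning lesson can be enforced mechanically. The sketch below audits requirements-style text for two conditions: entries that are not pinned with ==, and use of pip’s --extra-index-url option, which adds a second index whose packages compete with the primary one. The function name and the finding messages are hypothetical, and the parser is deliberately simplified (it does not follow -r includes or evaluate environment markers).

```python
import re

# A pinned requirement: name, optional extras, then an exact "==" version.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;#]+")


def audit_requirements(text):
    """Return (line_number, message) findings for risky requirement lines."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments: a "#" at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        if line.startswith("--extra-index-url"):
            findings.append((number, "extra index configured: name collisions across indexes are possible"))
        elif line.startswith("-"):
            continue  # other options and --hash continuation lines
        elif not PINNED.match(line):
            findings.append((number, f"not pinned to an exact version: {line}"))
    return findings


if __name__ == "__main__":
    sample = "\n".join([
        "--index-url https://pypi.internal.example/simple",  # hypothetical internal index
        "--extra-index-url https://pypi.org/simple",
        "examplepkg==1.0.0",
        "torchtriton",
    ])
    for number, message in audit_requirements(sample):
        print(f"line {number}: {message}")
```

For stronger guarantees, pip’s hash-checking mode (--require-hashes with per-requirement --hash entries) binds each pin to specific file digests, so a same-named package from another index fails installation rather than being silently accepted.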


The model SBOM discipline

The defender’s structural response to ML supply-chain attacks is a Software Bill of Materials (SBOM) for ML artifacts: a manifest of every model, dataset, and package in your stack, with provenance and signature data.

CycloneDX MLBOM

CycloneDX v1.5+ (the SBOM standard from OWASP) added MLBOM support: a standardized BOM format for ML models capturing training datasets, architecture, hyperparameters, and model card metadata.
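
As a rough illustration of what such a document looks like, the sketch below assembles a minimal CycloneDX-style BOM as JSON. It assumes the 1.5 component types machine-learning-model and data and the SHA-256 hash entry shape; it has not been validated against the official schema, the helper name is hypothetical, and the richer model card fields (datasets, architecture, hyperparameters) are deliberately left out rather than guessed at.

```python
import json


def build_minimal_mlbom(model_name, model_version, weights_sha256, dataset_names):
    """Assemble a minimal CycloneDX-style BOM dict for one model and its datasets."""
    components = [
        {
            "type": "machine-learning-model",
            "name": model_name,
            "version": model_version,
            "hashes": [{"alg": "SHA-256", "content": weights_sha256}],
        }
    ]
    for dataset in dataset_names:
        # Datasets are recorded as separate components of type "data".
        components.append({"type": "data", "name": dataset})
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    bom = build_minimal_mlbom(
        model_name="example-model",          # placeholder values for illustration
        model_version="1.0",
        weights_sha256="0" * 64,
        dataset_names=["example-training-set"],
    )
    print(json.dumps(bom, indent=2))
```

Before relying on output like this, validate it with an official CycloneDX validator; the point here is only that an MLBOM is ordinary structured data a build step can emit.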

Adoption is partial as of May 2026 but growing. Detection engineers should advocate for MLBOM emission from any LLM application running in their org.

Sigstore model-signing

Sigstore model-signing is an OpenSSF library for keyless signing of model weights with in-toto attestations. It provides cryptographically verifiable provenance: you can prove which build, by which signer, produced which model weight hash.

Adoption: early. The Hugging Face ecosystem is gradually adopting Sigstore-based attestations; the broader ML tooling ecosystem is expected to follow.

CoSAI (Coalition for Secure AI) Framework

CoSAI is an industry coalition publishing recommendations for tamper-proof model cards and signed metadata records. The framework is recommendation-level, not standardization-level — but represents emerging best practices.


The Codex-generated model SBOM tool

The implementation at .boss-pattern-work/day4/model_sbom.py (478 lines, stdlib-only — runs in air-gapped environments) inventories ML model artifacts in a target directory:

Features

Example invocation

python3 model_sbom.py --dir /opt/models --output sbom.json --known-malicious-hashes ./known_bad.txt

Example output

{
  "scan_timestamp": "2026-05-14T10:30:00Z",
  "scan_host": "soc-workstation-42",
  "scan_dir": "/opt/models",
  "summary": {
    "total_files": 47,
    "safe_format_count": 35,
    "unsafe_format_count": 8,
    "unknown_count": 4
  },
  "artifacts": [
    {
      "path": "/opt/models/llama-3.1-8b/model.safetensors",
      "sha256": "a3b8c9...",
      "size_bytes": 16384091128,
      "format": "safetensors",
      "safety_class": "safe_format",
      "huggingface_metadata": {
        "model_name": "Llama-3.1-8B-Instruct",
        "has_tokenizer": true,
        "has_license": true
      }
    }
  ],
  "warnings": [
    {
      "path": "/opt/models/legacy-model/weights.pkl",
      "warning": "unsafe_format: pickle deserialization risk",
      "severity": "high"
    }
  ]
}
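
The following sketch shows how output of this shape could be produced with the standard library alone. It is not the code of model_sbom.py: the function names, the extension-to-format table, the warning wording, and the "critical" severity for known-bad hashes are all assumptions for illustration. The unsafe extensions are the ones this module tells defenders to treat as untrusted code.

```python
import hashlib
from pathlib import Path

# Extension -> (format, safety_class); anything else is classified "unknown".
FORMAT_BY_SUFFIX = {
    ".safetensors": ("safetensors", "safe_format"),
    ".pkl": ("pickle", "unsafe_format"),
    ".pt": ("pytorch", "unsafe_format"),
    ".pth": ("pytorch", "unsafe_format"),
    ".h5": ("hdf5", "unsafe_format"),
}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte weights do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(directory, known_bad_hashes=frozenset()):
    """Walk a directory and return a dict with summary, artifacts, and warnings."""
    artifacts, warnings = [], []
    counts = {"safe_format": 0, "unsafe_format": 0, "unknown": 0}
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        fmt, safety = FORMAT_BY_SUFFIX.get(path.suffix.lower(), ("unknown", "unknown"))
        file_hash = sha256_of(path)
        counts[safety] += 1
        artifacts.append({
            "path": str(path),
            "sha256": file_hash,
            "size_bytes": path.stat().st_size,
            "format": fmt,
            "safety_class": safety,
        })
        if safety == "unsafe_format":
            warnings.append({"path": str(path),
                             "warning": f"unsafe_format: {fmt} can carry executable content",
                             "severity": "high"})
        if file_hash in known_bad_hashes:
            warnings.append({"path": str(path),
                             "warning": "hash matches known-malicious list",
                             "severity": "critical"})
    summary = {
        "total_files": len(artifacts),
        "safe_format_count": counts["safe_format"],
        "unsafe_format_count": counts["unsafe_format"],
        "unknown_count": counts["unknown"],
    }
    return {"summary": summary, "artifacts": artifacts, "warnings": warnings}
```

Classification by extension is only a first pass, since a renamed file defeats it; hashing every file is what makes the inventory comparable against a known-bad list.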

Deployment patterns

Limitations of the heuristic approach

The Codex-generated tool is stdlib-only by design (runs in restricted environments). For more sophisticated scanning, layer it with:


Other ML supply-chain incidents 2024-2026

Beyond the three major cases above, the detection engineer should track:

The pattern: every new ML deployment surface generates a new supply-chain attack surface within 12-18 months of mainstream adoption.


Discussion questions (~10 min)

  1. The LiteLLM 1.82.7/1.82.8 packages were live on PyPI for 40 minutes. Walk through which controls would have caught a downstream victim during that 40-minute window. Could your org have caught it?
  2. Your org runs an internal Hugging Face mirror to allow developers to download models without direct internet access. Does this mirror help or hurt against the JFrog 100 incident? What additional control would close the gap?
  3. The Codex model_sbom.py is stdlib-only so it runs anywhere. What’s the trade-off vs deploying picklescan + safetensors-scan + Sigstore verification? When is each appropriate?

Common mistakes

Mistake | Better approach
Treating .pkl model files like data files | Treat them as arbitrary code; same controls as third-party scripts
No version pinning in CI/CD requirements | Pin to specific versions + specific package indices; lockfile is canonical
Manual model-inventory tracking | Automated SBOM generation; treat ML artifacts like any other software-composition concern
Trusting model card metadata at face value | Model cards are user-controlled content; verify provenance via cryptographic signing where available
Assuming “we use Safetensors only” makes us safe | Safetensors solves pickle-RCE but not all model-poisoning concerns; Module 4.5 covers fine-tune backdoors

What’s next

Module 4.5 covers backdoored fine-tunes and sleeper-agent models — Anthropic’s Sleeper Agents research, behavioral evals as a CI gate, the hard truth that you cannot fully clear a third-party fine-tune through external evaluation alone.