TL;DR
Ollama has shipped three remote-code-execution vulnerabilities in three years, all at the model-loading boundary. CVE-2024-37032 ("Probllama", Wiz Research, 2024) was a path-traversal-to-RCE. The 0.1.37 ZipSlip bug in server/model.go was a classic untrusted-archive parsing failure. The most recent class — an Out-Of-Bounds Write in MLLAMA model metadata parsing affecting all versions before 0.7.0 — is the same primitive in C-style buffer territory. The pattern is structural: Ollama treats model files as data, but parses them as code. If you self-host an LLM runtime, this is the surface that matters.
The Bug Class: Model Files Are Code
An LLM model file is a packaged tensor format with metadata (architecture, vocab, tokenizer, layer shapes) plus weights. The metadata is parsed by the runtime to allocate buffers, set up tokenization, and route tensor data into compute graphs. Parsing is the operative word.
Every parser is a potential attack surface. When the runtime trusts the model file's metadata enough to allocate memory based on it, an attacker who controls the model file can:
- Specify oversized vocab arrays → out-of-bounds writes in the runtime's vocab buffer.
- Specify path-shaped strings in tokenizer or template fields → path traversal at open() calls.
- Embed compressed payloads with directory traversal → ZipSlip-class file overwrites.
- Forge tensor shape mismatches → integer overflows in buffer-size math.
The defensive posture is the same as parsing any untrusted binary format: validate every length, every offset, every path. Ollama, like most fast-moving AI infrastructure, has had to learn this in production.
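As an illustration of the "validate every length" rule applied to the tensor-shape case, here is a minimal Python sketch. The function name `checked_tensor_nbytes` and its parameters are hypothetical and are not part of Ollama. It only shows the shape of the check a parser needs before it sizes a buffer from file-supplied dimensions.

```python
import math

UINT64_MAX = 2**64 - 1


def checked_tensor_nbytes(shape, elem_size, bytes_available):
    """Return the byte size of a tensor, or raise ValueError if the
    declared shape cannot be trusted."""
    if elem_size <= 0:
        raise ValueError("element size must be positive")
    if not shape:
        raise ValueError("tensor has no dimensions")
    if any(dim <= 0 for dim in shape):
        raise ValueError(f"non-positive dimension in shape {shape}")

    # Python integers never wrap, so this is the true product.
    nbytes = math.prod(shape) * elem_size

    # With fixed-width unsigned integers (as in C or Go) the same product
    # would wrap silently, so a huge shape could yield a tiny allocation.
    if nbytes > UINT64_MAX:
        raise ValueError("tensor size overflows 64-bit arithmetic")
    # A tensor cannot be larger than the data actually present in the file.
    if nbytes > bytes_available:
        raise ValueError("tensor size exceeds remaining file data")
    return nbytes
```

Comparing against the bytes actually remaining in the file is the stronger of the two checks: it rejects any shape the file cannot back with data, whatever the integer width.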
Case 1: Out-Of-Bounds Write in MLLAMA Parsing (Pre-0.7.0)
The most recent Ollama RCE class affects all versions before 0.7.0 and lives in the parser for MLLAMA-format model files. The runtime reads model metadata — specifically array-typed fields whose declared length should drive a bounded allocation — without consistently validating the declared length against the buffer it's about to write into.
Concretely: a crafted MLLAMA file declares a metadata array of length N. The runtime allocates a buffer sized from a different header field, one that is related to N but not guaranteed to equal it. The parser then writes N entries into the smaller buffer. The result is an out-of-bounds write of attacker-controlled bytes past the end of that buffer, which is the canonical primitive for remote code execution.
The attack delivery is the part everyone underestimates. Ollama's primary use case is "pull a model from a registry". A malicious model uploaded to a public registry, or a typosquat of a popular model name, is the most realistic exploitation path. Anyone running ollama pull evil/llama-3 on a vulnerable build inherits the OOB Write.
Mitigation: upgrade Ollama to 0.7.0 or later. The patched parsers validate every length-bearing metadata field against the actual allocated buffer.
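The mismatch described above (allocate from one field, loop on another) can be sketched against a toy binary layout. The header format here is invented for illustration and is not the MLLAMA format, and `MAX_CAPACITY` is an arbitrary assumed cap. In Python an overrun would raise an exception rather than corrupt memory, so the sketch shows only where the checks belong.

```python
import struct

HEADER = struct.Struct("<II")  # capacity, declared_len (hypothetical layout)
ENTRY = struct.Struct("<I")    # one 32-bit array element
MAX_CAPACITY = 1 << 20         # assumed sanity cap on any allocation


def parse_bounded_array(blob: bytes) -> list[int]:
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    capacity, declared_len = HEADER.unpack_from(blob, 0)

    if capacity > MAX_CAPACITY:
        raise ValueError(f"capacity {capacity} exceeds sanity cap")
    # The loop count and the allocation size must agree.
    if declared_len > capacity:
        raise ValueError(
            f"declared length {declared_len} exceeds capacity {capacity}"
        )
    # The file must actually contain every entry it declares.
    if HEADER.size + declared_len * ENTRY.size > len(blob):
        raise ValueError("declared length exceeds file size")

    out = [0] * capacity
    for i in range(declared_len):
        (out[i],) = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
    return out
```

The key line is the `declared_len > capacity` comparison: every value that drives a write loop is checked against the buffer it writes into, not against a sibling field that is assumed to match.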
Case 2: CVE-2024-37032 "Probllama" (Wiz Research, 2024)
The 2024 disclosure remains the most-cited Ollama RCE because Wiz Research did the heavy lifting of writing it up clearly. The flaw: insufficient input validation on the path Ollama builds for a model blob. The digest supplied in a pulled model manifest was not checked for the expected sha256 format, so a path-traversal payload could escape Ollama's intended model directory and write to arbitrary filesystem locations.
From file overwrite, escalation to RCE was straightforward via standard Linux gadgets — overwrite a cron file, drop into ~/.bashrc, replace a binary on $PATH. Wiz's PoC chose the standard /etc/ld.so.preload trick.
What made Probllama notable wasn't the path-traversal class — that's old. It was the realization that self-hosted LLM runtimes are reachable from the public internet by default in many deployments. Ollama binds to 0.0.0.0:11434 in container deployments unless explicitly told otherwise. Pair that with a no-auth API surface and you have the same exposure profile as a 2010-era Redis instance.
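The validation that closes this class is small. The sketch below assumes the untrusted value is a manifest digest, as in the public CVE description, and that blobs are stored under a `blobs/` directory as `sha256-<hex>`. The function name `blob_path` and the directory layout are illustrative assumptions, not Ollama's code.

```python
import re
from pathlib import Path

# A well-formed digest: the literal prefix plus exactly 64 lowercase hex digits.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def blob_path(models_dir: Path, digest: str) -> Path:
    """Map a manifest digest to a file under models_dir/blobs, rejecting
    anything that is not a well-formed sha256 digest."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"malformed digest: {digest!r}")
    blobs = (models_dir / "blobs").resolve()
    candidate = (blobs / digest.replace(":", "-")).resolve()
    # Defence in depth: the resolved path must stay inside blobs/.
    if not candidate.is_relative_to(blobs):
        raise ValueError("digest resolves outside the blob directory")
    return candidate
```

The strict format check alone rejects `../` payloads. The containment check is a second layer that still holds if the format rule is ever loosened.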
Case 3: ZipSlip in server/model.go (v0.1.37)
Ollama's parseFromZipFile function in server/model.go at version 0.1.37 was vulnerable to the canonical ZipSlip pattern: when extracting a zip-packaged model, the code wrote each archive entry to disk using the entry's name without normalizing path separators or rejecting .. sequences. A malicious archive with ../../../etc/cron.d/backdoor as an entry name overwrites a privileged file.
This bug class has been in OWASP guidance for over a decade. ZipSlip is the lesson every parser developer learns once. Ollama learning it in production is part of the broader pattern: young infrastructure inherits old bug classes faster than mature infrastructure remembers them.
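For reference, the standard ZipSlip defence looks like this in Python. It is a generic sketch of the technique, not a port of Ollama's fix, and `safe_extract` is a name chosen for this example. It validates every entry before writing anything, so a malicious archive leaves no partial extraction behind.

```python
import shutil
import zipfile
from pathlib import Path


def safe_extract(archive: zipfile.ZipFile, dest: Path) -> list[Path]:
    """Extract an archive, refusing any entry that would land outside dest."""
    dest = dest.resolve()
    members = archive.infolist()

    # Pass 1: validate every entry name before touching the disk.
    for info in members:
        target = (dest / info.filename).resolve()
        if not target.is_relative_to(dest):
            raise ValueError(f"entry escapes destination: {info.filename!r}")

    # Pass 2: extract, now that every target is known to be inside dest.
    written = []
    for info in members:
        target = (dest / info.filename).resolve()
        if info.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        with archive.open(info) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Resolving the joined path and comparing it with the destination handles both `..` sequences and absolute entry names. This sketch does not cover symlink entries, which would need their own check.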
Threat Model: Where Does the Malicious Model Come From?
The bug is in the parser. The exploit path depends on whether you can deliver the malicious model. Three realistic vectors:
- Public registry typosquat. An attacker uploads llama3-tiny as a misspelling of a popular model. Anyone who pulls by name without verifying the publisher gets the malicious file.
- Compromised model repository. The trust is in the registry. If HuggingFace, Ollama's registry, or any internal mirror is compromised, even briefly, every ollama pull during the window is a potential exploit attempt.
- Direct upload to a self-hosted instance. Many Ollama deployments expose the /api/create endpoint without authentication, allowing any reachable client to push a custom model. If your Ollama is on a corporate network, every internal compromise has a path to your model runtime.
Detection
1. Inventory: do you have unauthenticated Ollama exposed?
# From inside your network, find unauthenticated Ollama instances
nmap -p 11434 --open -sV $YOUR_INTERNAL_RANGES
# Or via Censys/Shodan for external exposure (paid, but worth the audit)
# shodan search 'product:Ollama'
Anything that responds is a candidate for an immediate audit. Ollama's / endpoint returns Ollama is running — easy to fingerprint, easy to find.
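To confirm that an open port is actually Ollama rather than something else on 11434, the banner check can be scripted. This is a rough sketch using only the Python standard library. The helper names `looks_like_ollama` and `probe` are made up for this example, and the only behavior relied on is the root-endpoint banner quoted above.

```python
import http.client
import urllib.error
import urllib.request

BANNER = "Ollama is running"


def looks_like_ollama(body: str) -> bool:
    """True if a response body matches the Ollama root-endpoint banner."""
    return body.strip() == BANNER


def probe(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Fetch http://host:port/ and report whether it answers like Ollama."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(256).decode("utf-8", "replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Closed port, timeout, or a non-HTTP service: not a match.
        return False
    return looks_like_ollama(body)
```

Feed it the hosts nmap reported, for example `[h for h in hosts if probe(h)]`. A match means the API answered without credentials from wherever the script ran.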
2. File-system anomaly detection on the Ollama host
The Probllama and ZipSlip classes write files outside the model directory. A Falco rule for unexpected writes by the Ollama process catches both:
- rule: Ollama writes outside model directory
  desc: Ollama process writing to a path outside its model storage
  condition: >
    open_write and proc.name = "ollama" and
    not (fd.name startswith "/root/.ollama/" or
         fd.name startswith "/usr/share/ollama/" or
         fd.name startswith "/tmp/ollama-" or
         fd.name startswith "/var/log/")
  output: "Ollama writing outside model directory (proc=%proc.cmdline path=%fd.name)"
  # Falco has no HIGH priority; ERROR is the closest built-in level.
  priority: ERROR
3. Process tree post-load
If a model load triggers a child process — any child process — that's a strong signal of exploitation. Ollama's normal operation does not spawn unrelated processes during model parsing.
- rule: Ollama spawns unexpected child process
  desc: Ollama process executing anything other than known runtime helpers
  condition: >
    spawned_process and proc.aname[1] = "ollama" and
    not proc.name in (ollama, llama-server, ggml-runtime, dlopen)
  output: "Ollama spawned unexpected child (parent=%proc.aname[1] cmd=%proc.cmdline)"
  priority: CRITICAL
4. Network egress from the Ollama host
Post-RCE, attackers exfiltrate. Ollama's legitimate egress is the model registry only. Block-by-default outbound from the Ollama host with allowlisting catches both the immediate exfil and any persistence-installation traffic.
Mitigation
- Patch: upgrade to Ollama 0.7.0 or later. This single change closes the MLLAMA parser class and includes the Probllama / ZipSlip patches from earlier releases.
- Bind to localhost by default: set OLLAMA_HOST=127.0.0.1:11434 unless you've made an explicit, documented decision to expose the API. Container deployments often miss this.
- Authenticate the API: Ollama itself has limited auth options; put a reverse proxy in front (nginx, Traefik, Caddy) that requires a token for every request. The same proxy can rate-limit model pulls and log every /api/create call.
- Pin model sources: if you operate at scale, mirror models internally and pull only from your mirror. Revoke pull access to public registries from production hosts. Audit the mirror's source events.
- Sandbox the runtime: run Ollama in a container with no privileges, no host volume mounts, dedicated user, restricted egress. The bug class is parser-level, but the blast radius is determined by what the runtime has access to.
- Monitor model registry events: if you self-host a model registry (HuggingFace mirror, internal Modelfile repo), alert on tag changes and new uploads in your defensive analytics.
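Pinning model sources is easier to enforce when the mirror publishes digests and hosts verify them before loading. The sketch below assumes a hypothetical allowlist that maps file names to SHA-256 hex digests. The allowlist format and the function names are illustrative and not an Ollama feature.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned(path: Path, allowlist: dict[str, str]) -> bool:
    """True only if the file is listed and its digest matches the pinned one."""
    expected = allowlist.get(path.name)
    if expected is None:
        return False  # unknown files are rejected, not trusted by default
    return hmac.compare_digest(sha256_file(path), expected.lower())
```

A digest check does not make a malicious model safe. It only guarantees that the file being parsed is the one that was reviewed when it entered the mirror.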
Frequently Asked Questions
Is upgrading to 0.7.0 enough?
It patches the known bug classes through April 2026. It doesn't solve the structural issue: model files remain attacker-controlled binary inputs that the runtime has to parse. Treat the parser as a security boundary going forward; if Ollama ships another model format, expect another parser-level CVE within the year.
Are managed Ollama services affected?
Depends on the service. Most managed inference platforms run a vetted set of models, sandbox the runtime per tenant, and don't expose /api/create. Their attack surface is the model registry's integrity, not the local parser. If you use Ollama Cloud, ask about their patch lag and tenant isolation.
Are other LLM runtimes safer?
vLLM had its own RCE class disclosed earlier in 2026. llama.cpp has historically had GGUF parser issues. Tabby and TGI have had their own parser CVEs. The pattern transcends Ollama; the surface is universal across self-hosted LLM runtimes. Ollama is just the most popular target.
If I'm running Ollama for AEGIS — what should I do?
If your AEGIS deployment uses Ollama for embedding or local model inference, audit the version (ollama --version), confirm 0.7.0 or later, restrict the listener to localhost, and ensure your AEGIS host has the egress restrictions described above. AEGIS itself doesn't introduce additional Ollama exposure, but its blast radius does include whatever the Ollama host can reach.
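The version audit is easy to get wrong with string comparison ("0.10.0" sorts before "0.7.0" as text), so compare numeric tuples instead. A small sketch follows. It assumes the version appears as the first dotted triple in the `ollama --version` output, and the helper names are hypothetical.

```python
import re

MIN_VERSION = (0, 7, 0)
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(output: str) -> tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH triple from version output."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(output: str) -> bool:
    """True if the reported version is at least MIN_VERSION."""
    return parse_version(output) >= MIN_VERSION
```

Pass it the captured output of `ollama --version` from each host. Pre-release suffixes such as `-rc1` are ignored by this pattern, so treat a release candidate of 0.7.0 as needing a manual check.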
Key Takeaways
- Three model-loading RCEs in three years across one runtime. The pattern is structural, not coincidental.
- Patch to Ollama 0.7.0+ today. Verify, don't assume.
- Bind to localhost; authenticate via reverse proxy. Default container deployments are too open.
- Treat model files as untrusted binary input. Source from a trusted mirror or vetted publisher only.
- Falco rules for unexpected file writes and child-process spawns are the highest-leverage runtime detections.
Sable Offensive Research conducts authorized assessments of self-hosted AI infrastructure: model registry trust review, runtime sandboxing audits, and tabletop exercises against the Probllama / ZipSlip / OOB-Write playbooks. Contact us if you operate Ollama, vLLM, or comparable runtimes at scale.
References
- Wiz Research: Probllama disclosure and CVE-2024-37032 write-up
- Sonar / Cybersecurity News: Out-Of-Bounds Write in MLLAMA parsing (pre-0.7.0)
- Ollama security advisory: ZipSlip in server/model.go (v0.1.37)
- OX Security: vLLM CVE-2026-22778 (parallel framework, related class)