What is Agent-as-a-Service pentesting?

Agent-as-a-Service (AaaS) pentesting lets you chat with autonomous pentesting agents that scan your application on demand. Instead of waiting weeks for a manual engagement, you talk to specialized agents — pen-scout (recon and surface mapping), pen-recon (deeper enumeration), pen-triage (validates and prioritizes findings), pen-fixer (remediation guidance), and pen-compliance (OWASP/standards mapping). Findings are validated with proof-of-concept and re-test, not just flagged. You start with 150 free credits, no credit card.

What are credits and how do they work?

Credits are how you run agent scans. Each scan or agent action consumes credits based on its depth. New accounts get 150 free credits with no card required. After that you can buy one-time credit packs ($29, $79, or $199) for pay-as-you-go use, or subscribe to a monthly tier ($49, $149, or $399/mo) for continuous, on-demand pentesting. The agents pen-scout, pen-recon, pen-triage, pen-fixer and pen-compliance all draw from the same credit balance.

The 150 free credits are enough to chat with the pentesting agents and run real scans against your app. There is also a free security headers scan at sable.somoswilab.com/free-scan and a sample report at sable.somoswilab.com/sample-report. The free tier runs on OpenRouter models so you can evaluate the autonomous agents before paying.

What is penetration testing for startups?

Penetration testing (pentesting) is a simulated cyberattack against your application to find security vulnerabilities before real attackers do. For startups, we focus on the issues that matter most at your stage: authentication flaws, data exposure, API security, and common mistakes in modern stacks like Next.js, Supabase, and Firebase.

How much does a pentest cost?

Traditional pentests cost $10,000-$50,000+. SableOffensive starts at $29 for a Pre-Launch Check covering OWASP Top 10 and secrets detection. Founder Shield ($79) adds IDOR testing, auth bypass, and a debrief call. Scale Secure ($199) is a full-scope assessment. Every plan includes a professional report with remediation steps.

How long does a security scan take?

Pre-Launch Check reports are delivered within 24-48 hours. Founder Shield and Scale Secure may take 2-3 business days depending on the complexity of your application.

What do I need to provide?

At minimum, just your application URL. For more comprehensive testing, we may ask for staging credentials, API documentation, or GitHub repository access. We sign NDAs for all engagements.

What is OWASP Top 10?

OWASP Top 10 is the industry standard list of the most critical web application security risks. It includes injection attacks, broken authentication, cross-site scripting (XSS), server-side request forgery (SSRF), and security misconfigurations. Every SableOffensive assessment tests against the full OWASP Top 10.

Do you test AI-generated code?

Yes. Code generated by AI tools like Cursor, GitHub Copilot, and v0 often contains subtle security issues: hardcoded secrets, missing input validation, insecure API patterns, and overly permissive access controls. We have specific testing procedures for AI-generated codebases.

How do you secure Supabase and Firebase apps?

For Supabase, we audit Row Level Security (RLS) policies, test for direct table access, and check for exposed service keys. For Firebase, we review security rules, test Firestore/RTDB access patterns, and check Cloud Functions for vulnerabilities.

What if you find zero vulnerabilities?

We offer a 50% money-back guarantee: if our scan finds zero security issues, you get half your money back.

Is there a free pentesting option?

Yes. SableOffensive offers a free security headers scan at sable.somoswilab.com/free-scan. It instantly checks your website for 8 critical security headers (HSTS, CSP, X-Frame-Options, and more) and gives you an A-F grade with copy-paste fixes. No signup or payment required.

Can I get a free vulnerability scan?

Our free security headers check scans your website instantly and grades your security posture. For a deeper free assessment, contact us — we occasionally offer complimentary scans for early-stage startups and open source projects.

Ollama Model Loading RCE: Three Years of the Same Bug Class, One Self-Hosted LLM Runtime

TL;DR

Ollama has shipped three remote-code-execution vulnerabilities in three years, all at the model-loading boundary. CVE-2024-37032 ("Probllama", Wiz Research, 2024) was a path-traversal-to-RCE. The 0.1.37 ZipSlip bug in server/model.go was a classic untrusted-archive parsing failure. The most recent class, an Out-Of-Bounds Write in MLLAMA model metadata parsing affecting all versions before 0.7.0, is the same failure to validate untrusted input, this time in memory-unsafe buffer handling. The pattern is structural: Ollama treats model files as data, but parses them as code. If you self-host an LLM runtime, this is the surface that matters.

The Bug Class: Model Files Are Code

An LLM model file is a packaged tensor format with metadata (architecture, vocab, tokenizer, layer shapes) plus weights. The metadata is parsed by the runtime to allocate buffers, set up tokenization, and route tensor data into compute graphs. Parsing is the operative word.

Every parser is a potential attack surface. When the runtime trusts the model file's metadata enough to allocate memory based on it, an attacker who controls the model file can:

Specify oversized vocab arrays → out-of-bounds writes in the runtime's vocab buffer.
Specify path-shaped strings in tokenizer or template fields → path traversal at open() calls.
Embed compressed payloads with directory traversal → ZipSlip-class file overwrites.
Forge tensor shape mismatches → integer overflows in buffer-size math.

The defensive posture is the same as parsing any untrusted binary format: validate every length, every offset, every path. Ollama, like most fast-moving AI infrastructure, has had to learn this in production.
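A minimal sketch of that posture, assuming a hypothetical layout in which a little-endian uint32 count is followed by that many uint32 entries. This is not Ollama's or any real model format's on-disk structure; read_u32_array, MalformedModel, and MAX_ENTRIES are illustrative names. The point is only that the declared count is checked against the bytes actually present before anything is read.

```python
import struct

MAX_ENTRIES = 1_000_000  # hypothetical sanity cap; tune per format


class MalformedModel(ValueError):
    """Raised when declared metadata does not fit the bytes actually present."""


def read_u32_array(buf: bytes, offset: int) -> tuple[list[int], int]:
    """Read a count-prefixed array of little-endian uint32 values.

    Returns the values and the offset just past the array. Every declared
    length is checked against the real buffer before anything is read.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise MalformedModel("count field lies outside the buffer")
    (count,) = struct.unpack_from("<I", buf, offset)
    offset += 4

    if count > MAX_ENTRIES:
        raise MalformedModel(f"declared count {count} exceeds cap")
    needed = count * 4
    if needed > len(buf) - offset:
        raise MalformedModel("declared count exceeds remaining bytes")

    values = list(struct.unpack_from(f"<{count}I", buf, offset))
    return values, offset + needed
```

Python raises on a short buffer anyway; in a C or C++ parser the same two comparisons are what stand between a forged count and an out-of-bounds write.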

Case 1: Out-Of-Bounds Write in MLLAMA Parsing (Pre-0.7.0)

The most recent Ollama RCE class affects all versions before 0.7.0 and lives in the parser for MLLAMA-format model files. The runtime reads model metadata — specifically array-typed fields whose declared length should drive a bounded allocation — without consistently validating the declared length against the buffer it's about to write into.

Concretely: a crafted MLLAMA file declares a metadata array of length N. The runtime sizes its buffer from a different header field, one that is related to N but need not equal it. The parser then loops N times into the smaller buffer. The result is an Out-Of-Bounds Write past the end of that buffer using attacker-controlled bytes, the canonical primitive for remote code execution.

The attack delivery is the part everyone underestimates. Ollama's primary use case is "pull a model from a registry". A malicious model uploaded to a public registry, or a typosquat of a popular model name, is the most realistic exploitation path. Anyone running ollama pull evil/llama-3 on a vulnerable build inherits the OOB Write.

Mitigation: upgrade Ollama to 0.7.0 or later. The patched parsers validate every length-bearing metadata field against the actual allocated buffer.
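Checking the upgrade across a fleet is easy to script. The sketch below assumes only that the output of ollama --version contains an x.y.z triple somewhere; the exact wording of that output is not taken from the source, and parse_version and is_patched are hypothetical helper names.

```python
import re


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_patched(version_output: str,
               minimum: tuple[int, int, int] = (0, 7, 0)) -> bool:
    """True if the reported version is at or above the minimum."""
    return parse_version(version_output) >= minimum
```

Comparing integer tuples rather than strings matters here: as text, "0.10.1" sorts before "0.7.0". Pre-release suffixes are ignored by this sketch, so a release candidate of 0.7.0 would be reported as patched.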

Case 2: CVE-2024-37032 "Probllama" (Wiz Research, 2024)

The 2024 disclosure remains the most-cited Ollama RCE because Wiz Research did the heavy lifting of writing it up clearly. The flaw: insufficient input validation in the API endpoint that handles model file paths allowed a path-traversal payload to escape Ollama's intended model directory and write to arbitrary filesystem locations.

From file overwrite, escalation to RCE was straightforward via standard Linux gadgets — overwrite a cron file, drop into ~/.bashrc, replace a binary on $PATH. Wiz's PoC chose the standard /etc/ld.so.preload trick.

What made Probllama notable wasn't the path-traversal class — that's old. It was the realization that self-hosted LLM runtimes are reachable from the public internet by default in many deployments. Ollama binds to 0.0.0.0:11434 in container deployments unless explicitly told otherwise. Pair that with a no-auth API surface and you have the same exposure profile as a 2010-era Redis instance.
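The defensive pattern for this class is to validate the identifier's shape before it ever touches a filesystem path, then confirm the resolved path still sits under the intended root. A minimal sketch, assuming blobs are addressed by a sha256 digest and stored under a blobs/ subdirectory; the layout, blob_path, and DIGEST_RE are illustrative and not Ollama's actual code.

```python
import re
from pathlib import Path

# Only a well-formed lowercase sha256 digest is accepted as an identifier.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def blob_path(models_dir: Path, digest: str) -> Path:
    """Map a content digest to a file under models_dir, or raise ValueError."""
    if DIGEST_RE.fullmatch(digest) is None:
        raise ValueError("digest is not a well-formed sha256 value")
    root = models_dir.resolve()
    candidate = (root / "blobs" / digest.replace(":", "-")).resolve()
    # Defense in depth: the resolved path must still sit under the model root.
    if not candidate.is_relative_to(root):
        raise ValueError("path escapes the model directory")
    return candidate
```

With the format check in place the containment check should never fire, which is the idea: two independent guards, so that a later change to one does not silently reopen the traversal.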

Case 3: ZipSlip in server/model.go (v0.1.37)

Ollama's parseFromZipFile function in server/model.go at version 0.1.37 was vulnerable to the canonical ZipSlip pattern: when extracting a zip-packaged model, the code wrote each archive entry to disk using the entry's name without normalizing path separators or rejecting .. sequences. A malicious archive with ../../../etc/cron.d/backdoor as an entry name overwrites a privileged file.

This bug class has been in OWASP guidance for over a decade. ZipSlip is the lesson every parser developer learns once. Ollama learning it in production is part of the broader pattern: young infrastructure inherits old bug classes faster than mature infrastructure remembers them.
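The fix for ZipSlip is the same in every language: resolve each entry's destination and refuse anything that lands outside the extraction root. A minimal Python sketch of that check follows; safe_extract is a hypothetical helper, not Ollama's Go code. Python's own ZipFile.extractall already sanitizes entry names, so the explicit check here is for illustration of the technique rather than a gap in the standard library.

```python
import shutil
import zipfile
from pathlib import Path


def safe_extract(archive: zipfile.ZipFile, dest: Path) -> list[Path]:
    """Extract every member under dest, rejecting entries that would escape it."""
    root = dest.resolve()
    planned = []
    # First pass: validate all names before writing anything to disk.
    for info in archive.infolist():
        target = (root / info.filename).resolve()
        if not target.is_relative_to(root):
            raise ValueError(f"unsafe archive entry: {info.filename!r}")
        planned.append((info, target))

    written = []
    for info, target in planned:
        if info.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        with archive.open(info) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Validating the whole archive before the first write means a malicious entry at the end of the file cannot leave a half-extracted model behind.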

Threat Model: Where Does the Malicious Model Come From?

The bug is in the parser. The exploit path depends on whether you can deliver the malicious model. Three realistic vectors:

Public registry typosquat. An attacker uploads llama3-tiny as a lookalike of a popular model name. Anyone who pulls by name without verifying the publisher gets the malicious file.
Compromised model repository. The trust is in the registry. If HuggingFace, Ollama's registry, or any internal mirror is compromised — even briefly — every ollama pull during the window is a potential exploit attempt.
Direct upload to a self-hosted instance. Many Ollama deployments expose the /api/create endpoint without authentication, allowing any reachable client to push a custom model. If your Ollama is on a corporate network, every internal compromise has a path to your model runtime.
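The first vector can be blunted with a gate in front of ollama pull that compares the requested name against an approved list. The sketch below uses the standard library's difflib for a rough similarity score; check_model_name, the 0.8 threshold, and the return labels are assumptions made for illustration.

```python
import difflib


def check_model_name(name: str, approved: set[str],
                     threshold: float = 0.8) -> str:
    """Classify a requested model name against an approved list.

    Returns "approved", "suspicious:<lookalike>" or "unknown".
    """
    if name in approved:
        return "approved"
    # Sort the candidates so the result does not depend on set ordering.
    close = difflib.get_close_matches(name, sorted(approved), n=1,
                                      cutoff=threshold)
    if close:
        return f"suspicious:{close[0]}"
    return "unknown"
```

Only "approved" should let the pull proceed. Similarity scoring catches character swaps, but a name that merely appends a suffix to a popular one can fall below the threshold and come back as "unknown", so both non-approved outcomes need to block.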

Detection

1. Inventory: do you have unauthenticated Ollama exposed?

# From inside your network, find unauthenticated Ollama instances
nmap -p 11434 --open -sV $YOUR_INTERNAL_RANGES
# Or via Censys/Shodan for external exposure (paid, but worth the audit)
# shodan search 'product:Ollama'

Anything that responds is a candidate for an immediate audit. Ollama's / endpoint returns the plain-text banner "Ollama is running", which makes an instance easy to fingerprint and easy to find.
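Once nmap has produced a list of open ports, the banner check can be scripted. This is a sketch using only the Python standard library; probe and looks_like_ollama are hypothetical helper names, and the only fact it relies on is the banner text quoted above. Run it only against hosts you are authorized to scan.

```python
import http.client
import urllib.error
import urllib.request

BANNER = "Ollama is running"

# Ignore any configured HTTP proxy so the probe reaches the host directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def looks_like_ollama(body: str) -> bool:
    """True if a response body carries the Ollama root-endpoint banner."""
    return BANNER in body


def probe(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if http://host:port/ answers with the Ollama banner."""
    url = f"http://{host}:{port}/"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            body = resp.read(256).decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # closed port, timeout, or a non-HTTP service
    return looks_like_ollama(body)
```

A True result means the API answered without credentials, which is the condition the inventory step is looking for.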

2. File-system anomaly detection on the Ollama host

The Probllama and ZipSlip classes write files outside the model directory. A Falco rule for unexpected writes by the Ollama process catches both:

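- rule: Ollama writes outside model directory
  desc: Ollama process writing to a path outside its model storage
  # pmatch compares path prefixes, so the directories are listed without a
  # trailing wildcard; the /tmp/ollama-* pattern is handled with startswith.
  condition: >
    open_write and proc.name = "ollama" and
    not fd.name pmatch ("/root/.ollama", "/usr/share/ollama", "/var/log") and
    not fd.name startswith "/tmp/ollama-"
  output: "Ollama writing outside model directory (proc=%proc.cmdline path=%fd.name)"
  # Falco has no HIGH priority; ERROR is the closest valid level.
  priority: ERROR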

3. Process tree post-load

If a model load triggers a child process — any child process — that's a strong signal of exploitation. Ollama's normal operation does not spawn unrelated processes during model parsing.

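- rule: Ollama spawns unexpected child process
  desc: Ollama process executing anything other than known runtime helpers
  # The allowlist below is illustrative. Replace it with the helper binaries
  # your Ollama build actually spawns; note that dlopen is a library call,
  # not a process name, so it will never match a real child.
  condition: >
    spawned_process and proc.aname[1] = "ollama" and
    not proc.name in (ollama, llama-server, ggml-runtime, dlopen)
  output: "Ollama spawned unexpected child (parent=%proc.aname[1] cmd=%proc.cmdline)"
  priority: CRITICAL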

4. Network egress from the Ollama host

Post-RCE, attackers exfiltrate. Ollama's legitimate egress is the model registry only. Default-deny outbound traffic from the Ollama host, with an allowlist for the registry, catches both the immediate exfil and any persistence-installation traffic.

Mitigation

Patch: upgrade to Ollama 0.7.0 or later. This single change closes the MLLAMA parser class and includes the Probllama / ZipSlip patches from earlier releases.
Bind to localhost by default: set OLLAMA_HOST=127.0.0.1:11434 unless you've made an explicit, documented decision to expose the API. Container deployments often miss this.
Authenticate the API: Ollama itself has limited auth options; put a reverse proxy in front (nginx, Traefik, Caddy) that requires a token for every request. The same proxy can rate-limit model pulls and log every /api/create call.
Pin model sources: if you operate at scale, mirror models internally and pull only from your mirror. Revoke pull access to public registries from production hosts. Audit the mirror's source events.
Sandbox the runtime: run Ollama in a container with no privileges, no host volume mounts, dedicated user, restricted egress. The bug class is parser-level, but the blast radius is determined by what the runtime has access to.
Monitor model registry events: if you self-host a model registry (HuggingFace mirror, internal Modelfile repo), alert on tag changes and new uploads in your defensive analytics.
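Pinning model sources is only as strong as the check that the file you load is the file you reviewed. A minimal sketch of that check, assuming you record a SHA-256 digest for each model blob when it enters your internal mirror; sha256_file and verify_pinned are hypothetical helper names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte model blobs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, pinned_hex: str) -> bool:
    """True only if the file's SHA-256 equals the digest recorded at review time."""
    return hmac.compare_digest(sha256_file(path), pinned_hex.strip().lower())
```

Run the check before the runtime ever opens the file. A digest mismatch on a model that has not been deliberately updated is exactly the kind of registry event worth alerting on.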

Frequently Asked Questions

Is upgrading to 0.7.0 enough?

It patches the bug classes known as of April 2026. It doesn't solve the structural issue: model files remain attacker-controlled binary input that the runtime has to parse. Treat the parser as a security boundary going forward. If Ollama ships another model format, expect another parser-level CVE within the year.

Are managed Ollama services affected?

Depends on the service. Most managed inference platforms run a vetted set of models, sandbox the runtime per tenant, and don't expose /api/create. Their attack surface is the model registry's integrity, not the local parser. If you use Ollama Cloud, ask about their patch lag and tenant isolation.

Are other LLM runtimes safer?

vLLM had its own RCE class disclosed earlier in 2026. llama.cpp has historically had GGUF parser issues. Tabby and TGI have had their own parser CVEs. The pattern transcends Ollama; the surface is universal across self-hosted LLM runtimes. Ollama is just the most popular target.

If I'm running Ollama for AEGIS — what should I do?

If your AEGIS deployment uses Ollama for embedding or local model inference, audit the version (ollama --version), confirm 0.7.0 or later, restrict the listener to localhost, and ensure your AEGIS host has the egress restrictions described above. AEGIS itself doesn't introduce additional Ollama exposure, but its blast radius does include whatever the Ollama host can reach.
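The listener check in that list can be automated against whatever value your deployment passes as OLLAMA_HOST. The sketch below is a best-effort parser written for illustration; listener_is_loopback is a hypothetical helper, and it deliberately fails closed on anything it cannot classify, including an unset value.

```python
import ipaddress
from urllib.parse import urlsplit


def listener_is_loopback(ollama_host: str) -> bool:
    """Best-effort check that an OLLAMA_HOST-style value binds to loopback only."""
    value = ollama_host.strip()
    if not value:
        # Unset or empty: do not assume a default; check the deployment itself.
        return False
    if "://" not in value:
        value = "//" + value  # lets urlsplit treat the value as host[:port]
    host = urlsplit(value).hostname
    if not host:
        return False  # e.g. ":11434" with no host part
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames need DNS to judge; fail closed
```

Feeding it the value from each host's service definition or container environment gives a quick list of instances that still listen on 0.0.0.0 or a routable address.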

Key Takeaways

Three model-loading RCEs in three years across one runtime. The pattern is structural, not coincidental.
Patch to Ollama 0.7.0+ today. Verify, don't assume.
Bind to localhost; authenticate via reverse proxy. Default container deployments are too open.
Treat model files as untrusted binary input. Source from a trusted mirror or vetted publisher only.
Falco rules for unexpected file writes and child-process spawns are the highest-leverage runtime detections.

Sable Offensive Research conducts authorized assessments of self-hosted AI infrastructure: model registry trust review, runtime sandboxing audits, and tabletop exercises against the Probllama / ZipSlip / OOB-Write playbooks. Contact us if you operate Ollama, vLLM, or comparable runtimes at scale.

References

Wiz Research: Probllama disclosure and CVE-2024-37032 write-up
Sonar / Cybersecurity News: Out-Of-Bounds Write in MLLAMA parsing (pre-0.7.0)
Ollama security advisory: ZipSlip in server/model.go (v0.1.37)
OX Security: vLLM CVE-2026-22778 (parallel framework, related class)