What is Agent-as-a-Service pentesting?

Agent-as-a-Service (AaaS) pentesting lets you chat with autonomous pentesting agents that scan your application on demand. Instead of waiting weeks for a manual engagement, you talk to specialized agents: pen-scout (recon and surface mapping), pen-recon (deeper enumeration), pen-triage (validates and prioritizes findings), pen-fixer (remediation guidance), and pen-compliance (OWASP/standards mapping). Findings are validated with a proof of concept and a re-test, not just flagged. You start with 150 free credits, and no credit card is required.

What are credits and how do they work?

Credits are how you run agent scans. Each scan or agent action consumes credits based on its depth. New accounts get 150 free credits with no card required. After that you can buy one-time credit packs ($29, $79, or $199) for pay-as-you-go use, or subscribe to a monthly tier ($49, $149, or $399/mo) for continuous, on-demand pentesting. The agents pen-scout, pen-recon, pen-triage, pen-fixer and pen-compliance all draw from the same credit balance.

Can I try it for free?

Yes. Every new account gets 150 free credits with no credit card required, which is enough to chat with the pentesting agents and run real scans against your app. There is also a free security headers scan at sable.somoswilab.com/free-scan and a sample report at sable.somoswilab.com/sample-report. The free tier runs on OpenRouter models so you can evaluate the autonomous agents before paying.

What is penetration testing for startups?

Penetration testing (pentesting) is a simulated cyberattack against your application to find security vulnerabilities before real attackers do. For startups, we focus on the issues that matter most at your stage: authentication flaws, data exposure, API security, and common mistakes in modern stacks like Next.js, Supabase, and Firebase.

How much does a pentest cost?

Traditional pentests cost $10,000-$50,000+. SableOffensive starts at $29 for a Pre-Launch Check covering OWASP Top 10 and secrets detection. Founder Shield ($79) adds IDOR testing, auth bypass, and a debrief call. Scale Secure ($199) is a full-scope assessment. Every plan includes a professional report with remediation steps.

How long does a security scan take?

Pre-Launch Check reports are delivered within 24-48 hours. Founder Shield and Scale Secure may take 2-3 business days depending on the complexity of your application.

What do I need to provide?

At minimum, just your application URL. For more comprehensive testing, we may ask for staging credentials, API documentation, or GitHub repository access. We sign NDAs for all engagements.

What is OWASP Top 10?

The OWASP Top 10 is the industry-standard list of the most critical web application security risks. It includes injection attacks, broken authentication, cross-site scripting (XSS), server-side request forgery (SSRF), and security misconfigurations. Every SableOffensive assessment tests against the full OWASP Top 10.

Do you test AI-generated code?

Yes. Code generated by AI tools like Cursor, GitHub Copilot, and v0 often contains subtle security issues: hardcoded secrets, missing input validation, insecure API patterns, and overly permissive access controls. We have specific testing procedures for AI-generated codebases.

How do you secure Supabase and Firebase apps?

For Supabase, we audit Row Level Security (RLS) policies, test for direct table access, and check for exposed service keys. For Firebase, we review security rules, test Firestore/RTDB access patterns, and check Cloud Functions for vulnerabilities.

What if you find zero vulnerabilities?

We offer a 50% money-back guarantee. If our scan finds zero security issues, you get half your money back.

Is there a free pentesting option?

Yes. SableOffensive offers a free security headers scan at sable.somoswilab.com/free-scan. It instantly checks your website for 8 critical security headers (HSTS, CSP, X-Frame-Options, and more) and gives you an A-F grade with copy-paste fixes. No signup or payment required.

Can I get a free vulnerability scan?

Our free security headers check scans your website instantly and grades your security posture. For a deeper free assessment, contact us — we occasionally offer complimentary scans for early-stage startups and open source projects.

The MCP Confused Deputy: Provenance Gaps, Instruction Injection, and DNS Rebinding in the Model Context Protocol

The Model Context Protocol is the connective tissue between LLMs and external systems — filesystems, databases, APIs, browsers, internal tooling. In a well-functioning agentic deployment, MCP lets a model read documents, query databases, and take actions in the world. In an adversarial deployment, MCP becomes the mechanism by which attacker-controlled content reaches a model's context window and convinces it to take actions the user never authorized.

We previously covered the RCE class of MCP vulnerability and AEGIS detection at the network layer (Anthropic MCP RCE: 7,000 Servers Exposed). This piece is about a different and subtler problem: the confused deputy — the provenance gap in the protocol design itself — and the specific attack paths that flow from it.

What Is the Confused Deputy Problem in MCP?

The confused deputy is a classic access-control concept: an entity with legitimate authority is manipulated by an untrusted party into exercising that authority on the untrusted party's behalf. In MCP, the confused deputy is the model itself.

Here is the structural problem: when an MCP tool call returns a result, the protocol carries the content of that result into the model's context window. It does not carry cryptographically attestable information about where that content came from, who produced it, or whether it has been tampered with in transit. The model receives a tool_result and must decide how to act on it based on its content alone. That content could come from a trusted server, a compromised server, a DNS-rebound endpoint, or a webpage that an attacker controls and has stuffed with instruction text designed to override the model's behavior.

The MCP specification includes a ToolAnnotations object, with fields like readOnlyHint, destructiveHint, and idempotentHint, intended to communicate properties of tools to clients. The specification is explicit that these annotations are advisory hints and not security boundaries. They are set by the server and have no verifiable binding to actual tool behavior. A malicious or compromised server can mark a destructive write operation with readOnlyHint set to true. An instruction-injected tool result can include fabricated annotations in its text. Neither the client nor the model can verify them cryptographically.
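One way to act on this is to decide which tools need confirmation from a local policy the operator maintains, and to use server-supplied hints only as a mismatch signal. The following is a minimal sketch under stated assumptions: the tool names, the LOCAL_POLICY table, and the ToolDescriptor type are hypothetical and are not part of any MCP SDK.

```python
from dataclasses import dataclass

# Hypothetical operator-maintained policy: tool name -> capability verified
# out of band. Nothing in this table comes from the MCP server.
LOCAL_POLICY = {
    "read_file": "read",
    "write_file": "write",
    "delete_file": "destructive",
}


@dataclass(frozen=True)
class ToolDescriptor:
    name: str
    annotations: dict  # server-supplied hints, e.g. {"readOnlyHint": True}


def requires_confirmation(tool: ToolDescriptor) -> bool:
    """Decide from local policy only; server hints never relax the decision."""
    capability = LOCAL_POLICY.get(tool.name, "unknown")
    # Tools missing from the policy are treated as dangerous by default.
    return capability != "read"


def hint_mismatch(tool: ToolDescriptor) -> bool:
    """True when the server claims read-only but local policy disagrees."""
    claims_read_only = bool(tool.annotations.get("readOnlyHint", False))
    return claims_read_only and LOCAL_POLICY.get(tool.name, "unknown") != "read"
```

A mismatch is worth logging on its own: a server that labels delete_file as read-only is either misconfigured or hostile.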

The result is a protocol where the model is asked to make trust decisions about content for which the protocol provides no trustworthy provenance signal.

The Official Fetch Server: Attacker-Controlled Markdown Straight to Context

The reference MCP fetch server — the official Anthropic-maintained server for web browsing tool calls — accepts a URL, fetches the content, converts the HTML to Markdown, and returns it as a tool_result. The model receives this Markdown and processes it as part of its context.

Consider what happens when the URL being fetched is attacker-controlled. The attacker can place arbitrary text in the page, including:

Instructions formatted to look like system-prompt or tool-definition content
Text that references other tool names in the model's context and instructs the model to call them with specific arguments
Content that mimics the format of legitimate tool results to build false context
Instructions to exfiltrate information from context by encoding it in a follow-up URL fetch

This is not hypothetical. It is the standard indirect prompt injection pattern applied to the MCP fetch surface. The fetch server converts web content to Markdown without sanitizing it or labeling it as untrusted external content, so it lands in the model's context with the same structural weight as system instructions or legitimate tool results. The model has no reliable mechanism to distinguish "this is content I fetched from an external page" from "this is an instruction from my operator."

The attack does not require compromising an MCP server. It requires only the ability to serve content at a URL the model will be asked to fetch — which in many agentic workflows is directly controlled by user input or by attacker-controlled content in previously fetched documents.

The DNS Rebinding Footgun

DNS rebinding attacks against MCP create a distinct and particularly dangerous attack path. The mechanism:

An attacker registers a domain and configures a DNS server that initially resolves it to a public IP with a very short TTL.
The MCP fetch server checks the URL: it resolves to a public IP, passes any allowlist or blocklist check, and proceeds.
The attacker's DNS server then rebinds the domain to an internal IP address (10.x.x.x, 172.16.x.x through 172.31.x.x, or 192.168.x.x), typically that of an internal service the MCP server can reach from its network position.
When the MCP server makes the HTTP request, it resolves the name again because the short TTL has expired, and the lookup now returns the internal IP. The fetch server connects to the internal service and returns its content to the model.

This bypasses most URL-based filtering because of a time-of-check to time-of-use gap: the filter evaluates the address returned at check time (public IP), while the connection uses the address returned at fetch time (internal IP). The MCP server becomes an involuntary proxy to internal services, and the attacker now has the model reading the output of those internal services.
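A common way to close that gap is to resolve the hostname exactly once, validate every returned address, and then connect to the validated IP rather than to the hostname. The sketch below illustrates the idea with the Python standard library; pick_pinned_ip and the injectable resolver parameter are hypothetical names chosen for illustration, and this is not how the reference fetch server is implemented.

```python
import ipaddress
import socket
from typing import Callable, List


def default_resolver(host: str) -> List[str]:
    """Resolve a hostname to IP address strings using the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_internal(ip: str) -> bool:
    """True for private, loopback, or link-local addresses."""
    # Drop any IPv6 zone id (e.g. "fe80::1%eth0") before parsing.
    addr = ipaddress.ip_address(ip.split("%", 1)[0])
    return addr.is_private or addr.is_loopback or addr.is_link_local


def pick_pinned_ip(
    host: str, resolver: Callable[[str], List[str]] = default_resolver
) -> str:
    """Resolve once, reject if ANY answer is internal, return the IP to dial."""
    ips = resolver(host)
    if not ips:
        raise ValueError(f"no addresses returned for {host}")
    for ip in ips:
        if is_internal(ip):
            raise PermissionError(f"{host} resolved to internal address {ip}")
    # The caller must connect to this exact IP and must not resolve again.
    return ips[0]
```

The caller would open the connection to the returned IP while still sending the original hostname in the Host header and TLS SNI, so the check and the connection use the same DNS answer. Rejecting loopback and link-local addresses in addition to RFC1918 ranges is an assumption of this sketch that goes slightly beyond the ranges named above.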

The footgun is particularly sharp in self-hosted agentic deployments where the MCP server runs inside the network perimeter and has access to internal APIs, documentation systems, internal dashboards, or configuration management endpoints that are not exposed publicly.

Detection Rules

The following are concrete, loggable signals for MCP deployments. These should be implemented at the MCP proxy layer, in SIEM rules, or in the agent monitoring infrastructure.

Instruction injection detection:

Rule: Tool result content contains system-prompt-like structure. Alert when a tool_result payload contains strings that match system-prompt formatting patterns: XML-like role tags (<system>, <instructions>, [INST]), markdown headers framing instructions (## New Instructions, ## Override), or explicit references to model roles. These patterns appearing in tool results, not in system prompts, indicate injection attempts.
Rule: Tool result references other tool names with arguments. Alert when the text content of a tool_result includes strings that match tool names registered in the current MCP session, especially when followed by argument-like structures. A webpage result that says "now call the write_file tool with the following content" is an injection attempt, not legitimate page content.
Rule: Tool result size exceeds threshold relative to request. Large tool results — particularly from fetch operations on relatively simple URLs — may indicate content stuffed with injection text. Alert and human-review above a configurable size threshold for external-URL fetch results.
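The three injection rules above can be sketched as a single scanner run over each tool_result before it reaches the model. This is an illustrative starting point, not a complete detector: the regular expressions, the finding labels, and the absolute max_bytes threshold (used here in place of a size measured relative to the request) are all assumptions.

```python
import re
from typing import Iterable, List

# Role-like tags that should not normally appear inside fetched content.
ROLE_TAG = re.compile(
    r"<\s*/?\s*(system|instructions)\s*>|\[/?INST\]", re.IGNORECASE
)
# Markdown headers that frame an instruction block.
OVERRIDE_HEADER = re.compile(
    r"^#{1,6}\s*(new instructions|override)\b", re.IGNORECASE | re.MULTILINE
)


def scan_tool_result(
    text: str, registered_tools: Iterable[str], max_bytes: int = 200_000
) -> List[str]:
    """Return a list of finding labels for one tool_result payload."""
    findings = []
    if ROLE_TAG.search(text):
        findings.append("role_tag")
    if OVERRIDE_HEADER.search(text):
        findings.append("override_header")
    # Flag any mention of a tool name registered in the current session.
    for name in registered_tools:
        if re.search(rf"\b{re.escape(name)}\b", text):
            findings.append(f"tool_reference:{name}")
    if len(text.encode("utf-8")) > max_bytes:
        findings.append("oversized")
    return findings
```

For example, scanning a page that contains the header "## New Instructions" followed by "now call the write_file tool" against a session with write_file registered yields the override_header and tool_reference:write_file findings. Pattern matching of this kind is easy to evade with paraphrasing, so it is best treated as one signal among several.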

DNS rebinding detection:

Rule: Destination IP of MCP fetch resolves to RFC1918 space. Log the resolved IP address at the time of the HTTP connection, not at URL parse time. Alert when a fetch to a non-private hostname (not .local, not explicit RFC1918) results in a TCP connection to 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. This is the rebinding signal.
Rule: DNS TTL below threshold on MCP-fetched domains. Log TTL values for all DNS resolutions made by the MCP fetch server. A TTL below 30 seconds (including zero) on a non-CDN domain is a signal of rebinding preparation. Hold the fetch and alert for manual review before it proceeds.
Rule: HTTP redirect chain ending at internal IP. Detect redirect chains where an initial fetch to a public IP eventually redirects to an internal endpoint. Log and alert — legitimate content delivery does not redirect to RFC1918 addresses.
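As a sketch of how the three rebinding rules could be evaluated against a single logged fetch, the following assumes the proxy records the requested hostname, the peer IP of the actual TCP connection, the DNS TTL, and the connected IP of each redirect hop. The FetchEvent record, its field names, and the alert labels are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


@dataclass
class FetchEvent:
    hostname: str      # hostname from the requested URL
    connected_ip: str  # peer address of the actual TCP connection
    dns_ttl: int       # TTL of the DNS answer, in seconds
    redirect_ips: List[str] = field(default_factory=list)  # one IP per hop


def in_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def hostname_is_private(hostname: str) -> bool:
    """True for .local names and for literal RFC1918 addresses."""
    if hostname.endswith(".local"):
        return True
    try:
        return in_rfc1918(hostname)
    except ValueError:  # not an IP literal
        return False


def evaluate(
    event: FetchEvent, min_ttl: int = 30, cdn_suffixes: Tuple[str, ...] = ()
) -> List[str]:
    """Return alert labels for one fetch, following the three rules above."""
    alerts = []
    if not hostname_is_private(event.hostname) and in_rfc1918(event.connected_ip):
        alerts.append("public_name_private_ip")
    if event.dns_ttl < min_ttl and not event.hostname.endswith(cdn_suffixes):
        alerts.append("low_ttl")
    hops = event.redirect_ips
    if hops and not in_rfc1918(hops[0]) and in_rfc1918(hops[-1]):
        alerts.append("redirect_to_internal")
    return alerts
```

The key design point is that connected_ip comes from the socket, not from a lookup performed at URL parse time; a rule fed with the parse-time address would see the public IP and stay silent.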

Behavioral anomaly detection:

Rule: Tool call sequence diverges from established session pattern. If an agentic session typically calls a defined set of tools in a consistent sequence, alert when a new tool call type appears immediately after a fetch_url result. This pattern — fetch external content, then call a privileged tool — is consistent with successful injection.
Rule: Tool call to privileged write/execute tools without prior user message. If a tool with destructive or write capability is called without a corresponding user instruction in the conversation turn that triggered it, flag for review. The model should not spontaneously decide to write files or execute code — that should trace back to explicit user intent.
Rule: Data exfiltration via chained fetch calls. Alert when fetch_url tool calls include URL query parameters or fragments that contain strings also present in earlier tool results or context. This pattern — encoding context data into a URL and fetching it — is a common exfiltration channel through fetch-capable MCP servers.
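Two of the behavioral rules above lend themselves to a short sketch: flagging a privileged tool call that immediately follows a fetch_url result, and flagging fetch URLs whose query values or fragment repeat strings seen earlier in the session. The event dictionaries, the PRIVILEGED_TOOLS set, and the min_len cutoff are assumptions made for illustration.

```python
from typing import Iterable, List
from urllib.parse import parse_qsl, unquote, urlsplit

PRIVILEGED_TOOLS = {"write_file", "execute_code"}  # hypothetical tool names


def privileged_after_fetch(events: List[dict]) -> List[str]:
    """Return privileged tools called directly after a fetch_url result.

    Each event is a dict such as {"type": "tool_call", "tool": "write_file"}.
    """
    flagged = []
    for prev, cur in zip(events, events[1:]):
        if (
            prev.get("type") == "tool_result"
            and prev.get("tool") == "fetch_url"
            and cur.get("type") == "tool_call"
            and cur.get("tool") in PRIVILEGED_TOOLS
        ):
            flagged.append(cur["tool"])
    return flagged


def exfil_candidates(
    url: str, earlier_context: Iterable[str], min_len: int = 12
) -> List[str]:
    """Return URL-carried values that also appear verbatim in earlier context."""
    context = list(earlier_context)
    parts = urlsplit(url)
    # parse_qsl percent-decodes the query values for us.
    values = [value for _, value in parse_qsl(parts.query)]
    if parts.fragment:
        values.append(unquote(parts.fragment))
    return [
        value
        for value in values
        if len(value) >= min_len and any(value in chunk for chunk in context)
    ]
```

Verbatim substring matching will miss data that has been base64-encoded or split across parameters, so a production rule would likely need to decode common encodings before comparing.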

Mitigations Beyond Detection

Detection is necessary but not sufficient. The structural mitigations:

Label untrusted content explicitly in context. The model prompt architecture should clearly delineate tool results from external sources as untrusted. Wrapping external fetch results in a structured tag — <external_content source="untrusted"> — and training or prompting the model to treat such content as data, not instructions, reduces but does not eliminate injection risk.
Restrict fetch server access to a URL allowlist. For agentic deployments where the set of external resources is known, replace unrestricted fetch with a strict allowlist of domains. This eliminates the attacker-controlled fetch surface almost entirely.
Run MCP servers in network-isolated environments. A fetch server that cannot reach internal RFC1918 addresses cannot be weaponized for DNS rebinding. Network segmentation is the most reliable DNS rebinding mitigation.
Require per-call authorization for privileged tools. Tools with destructive or write capability should require an explicit human-in-the-loop confirmation rather than executing on model decision alone. This breaks the injection-to-execution chain even when injection succeeds.
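The first and last mitigations above can be sketched in a few lines: wrap fetched content in the untrusted-content tag, escaping it so the content cannot close the wrapper itself, and route privileged tool calls through a confirmation callback. The url attribute, the PRIVILEGED set, and the call_tool, run, and confirm names are hypothetical and would map onto whatever client framework is in use.

```python
from html import escape
from typing import Any, Callable

PRIVILEGED = {"write_file", "execute_code"}  # hypothetical tool names


def wrap_untrusted(content: str, source_url: str) -> str:
    """Wrap fetched content so the prompt marks it as data, not instructions."""
    # Escaping <, > and & stops the content from closing the wrapper early
    # or smuggling in role-like tags.
    return (
        f'<external_content source="untrusted" url="{escape(source_url, quote=True)}">\n'
        f"{escape(content, quote=False)}\n"
        "</external_content>"
    )


def call_tool(
    name: str,
    args: dict,
    run: Callable[[str, dict], Any],
    confirm: Callable[[str, dict], bool],
) -> Any:
    """Execute a tool, requiring explicit confirmation for privileged ones."""
    if name in PRIVILEGED and not confirm(name, args):
        raise PermissionError(f"{name} was not approved by the user")
    return run(name, args)
```

In an interactive client, confirm would show the tool name and arguments to the user and return their decision. Escaping reduces the chance that injected text masquerades as structure, but as noted above it does not stop a model from following plain-language instructions inside the wrapper.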

The Provenance Problem Is Not Solved Yet

The root issue — that MCP tool results carry no cryptographically attestable provenance — is not patched by any of the above mitigations. They reduce the attack surface and increase detection probability, but they do not give the model a reliable way to distinguish trustworthy from untrusted content in its context window. That requires a protocol-level solution: a provenance field on tool results with a verifiable binding to the server's identity, analogous to TLS certificate verification at the transport layer.
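To make the idea of a verifiable binding concrete, the sketch below attaches an authentication tag to a tool result and checks it on the client. It is purely illustrative: no such field exists in MCP today, the provenance field name is invented, and a shared-secret HMAC stands in for the asymmetric signatures a real design would more likely use.

```python
import hashlib
import hmac
import json


def sign_result(server_id: str, content: str, key: bytes) -> dict:
    """Return a tool result carrying a tag bound to server_id and content."""
    payload = json.dumps(
        {"server_id": server_id, "content": content},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"server_id": server_id, "content": content, "provenance": tag}


def verify_result(result: dict, keys_by_server: dict) -> bool:
    """Check the tag against the key registered for the claimed server."""
    key = keys_by_server.get(result.get("server_id"))
    if key is None:
        return False  # unknown server: no basis for trust
    expected = sign_result(
        result["server_id"], result.get("content", ""), key
    )["provenance"]
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(expected, str(result.get("provenance", "")))
```

Even with such a tag, verification only establishes which server produced the result and that it was not altered in transit. It says nothing about whether the content the server relayed, such as a fetched webpage, is itself trustworthy.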

Until such a mechanism exists and is widely deployed, the confused-deputy risk in MCP is structural. Defenders cannot patch it away — they can only reduce exposure through layered detection, strict network controls, and architectures that treat all external tool results as untrusted by default.

For the network-layer RCE class of MCP vulnerability and AEGIS fast-path detection, see Anthropic MCP RCE: 7,000 Servers Exposed and Why L1 Fast-Path Matters. For Sable's agentic security assessment offering, see our engagement options.