How to catch hidden MCP server flaws scanners miss at runtime

A few weeks ago, I launched Warden, a governance layer designed to sit in front of an MCP server and enforce role-based, field-level access controls. The demo highlighted a support role that could list customer accounts but never see their billing tier. The system redacted the tier field from all responses, so everything appeared secure on the surface.

But when I tested it by querying for Enterprise accounts, the results revealed an oversight. The support role couldn’t see the tier in the output, but the query layer still accepted it as a filter. When I asked for every Enterprise account, the matching rows revealed their tiers simply by existing in the results. The redaction only applied to the output, not the input—meaning sensitive data leaked through the query itself.

This wasn’t a flaw in the governance rules or the server’s configuration. It was a subtle runtime authorization gap, one that static security scanners couldn’t catch. These tools examine the server’s manifest, checking tool descriptions and metadata for suspicious patterns. But the query_resource tool’s description was entirely honest, and the bug only surfaced when a real role executed a real query. A scanner that relies on static text analysis simply couldn’t reach this vulnerability.

The birth of a runtime scanner

This realization led me to build Siege, a tool designed to test MCP servers not by reading their text, but by actively probing them as if it were an attacker. Instead of grepping manifests, Siege connects to a live server, impersonates different roles, and compares what each role receives against a baseline. It doesn’t just look for poisoned instructions—it actively tests whether access controls hold up under real-world conditions.

The core principle behind Siege is runtime authorization testing. Static scanners excel at catching tool-poisoning attacks, but they miss the nuances of how a server behaves when real roles send real queries. RBAC vendors often advise teams to "red-team your authorization scope," but few provide the tools to do it systematically. Siege turns that advice into actionable testing.

An early design rule shaped the tool’s development: no hardcoded field names, no predefined roles. If Siege only caught the Warden bug because I told it to look for tier, it would be nothing more than a unit test. Instead, the scanner learns the schema and real values from the most permissive role—the one that sees everything—and then compares every restricted role’s output against that baseline. The differences reveal gaps in access control.

Four critical detectors emerge

From this differential approach, four runtime authorization detectors emerged, each tailored to specific classes of vulnerabilities:

Redacted-field filter leak. Generalized from the Warden bug, this detector checks whether a field stripped from a role’s output can still be used as a filter. If applying a filter on a redacted field returns fewer rows than the baseline, it means the hidden value leaked through the discrepancy. The support role couldn’t see tier in responses, but querying for tier: Enterprise still returned matching rows—exposing the underlying data.

Row-scope escalation. Some roles have their view restricted to a subset of data, such as a specific region. This detector tests whether a role can bypass that restriction by applying an out-of-scope filter. If querying region=East returns rows it shouldn’t have access to, the filter was applied against the full dataset, not the scoped one.

ID enumeration. Governance often applies scoping to list operations but overlooks single-record lookups. A role restricted from listing accounts might still retrieve specific records using get_record with guessed IDs, effectively walking past the scoping rules enforced by query_resource. This is classic Insecure Direct Object Reference (IDOR) in an MCP context.

Forbidden-resource read. This detector flags cases where a role can’t even list a resource, yet can still retrieve it via get_record. The access control is checked on list and query endpoints but forgotten on the by-ID path, allowing unauthorized data access.

These detectors revealed vulnerabilities I never found by hand. Building the engine to catch one bug uncovered the next few almost automatically. The scanner’s differential approach turned a manual debugging process into a systematic, repeatable test suite.

Proving the fix works

To validate Siege, I maintain two versions of Warden: the vulnerable commit (4938bdf) and the fixed one (7188eed). Running Siege against both versions clearly demonstrates its effectiveness.

Before (vulnerable Warden):

[HIGH] Redacted field 'tier' leaks through filter predicate on 'accounts'
Found as role: support
Reproduce: query_resource({"resource_type":"accounts","filters":{"tier":"Enterprise"}})
baseline_count: 8
filtered_count: 6
leaked_records: ['Acme Corp', 'Initech', 'Umbrella Co', 'Hooli', 'Stark Industries', 'Wayne Enterprises']

After (fixed Warden):

No findings. The probed classes held.
VERDICT: PASS

Every finding includes an exact, replayable reproduction step: the tool, the arguments, and the rows returned. You can paste the query into your own client and witness the leak yourself. To ensure the detectors aren’t just passing everything by default, Siege also runs against an intentionally broken fixture server in the repository. It exercises all four detectors, including the critical forbidden-resource read, so you can see them in action.

Testing agent hijacking risks

Static scanners often focus on tool-poisoning attacks—hiding malicious instructions in tool descriptions or outputs to trick an agent into performing unintended actions. Siege takes a different approach by testing whether an agent actually gets hijacked in practice.

The scanner simulates a real agent loop with a benign read tool and an export_record sink that sends data to a controlled URL. The user’s task is simple: summarize record 1. Siege then injects payloads through multiple channels—tool descriptions, output, system prompts—and monitors whether the model triggers the sink at an unauthorized destination. A hijack is observed in real time, not inferred from text analysis.

The results are presented as a matrix showing which payloads succeeded across different channels. Five payloads are tested across two channels: system-block spoofing, plain policy text, role confusion, and task decomposition. A clean 0-of-5 means the agent resisted all attempts, while any success reveals a critical vulnerability. This matrix also serves as a regression guard—if a framing that previously bounced starts succeeding after a model upgrade, Siege will catch it.

What Siege doesn’t (yet) cover

Siege’s current scope is focused on MCP servers, with no support for OpenAI function-calling—though that’s planned for future expansion. The tool operates over stdio transport today, but HTTP support is next on the roadmap. The report also clearly lists the classes of tests it ran and those it skipped, ensuring transparency in its methodology.

Runtime authorization testing fills a critical gap in MCP server security. Static scanners have their place, but they can’t replicate the complexity of real user queries, roles, and edge cases. Siege bridges that gap by actively probing servers as attackers would, revealing flaws that only emerge in production. As MCP adoption grows, tools like this will become essential for teams serious about securing their authorization layers.

AI summary

MCP sunucularınızın çalışma anındaki yetki zafiyetlerini otomatik tespit eden Siege aracını tanıyın. Veri sızıntılarını ve rol tabanlı erişim sorunlarını nasıl yakaladığını keşfedin.

How to catch hidden MCP server flaws scanners miss at runtime

The birth of a runtime scanner

Four critical detectors emerge

Proving the fix works

Testing agent hijacking risks

What Siege doesn’t (yet) cover

Comments

Why your messy codebase makes AI tools stumble

How to Eliminate Static AWS Keys for Safer Cloud Deployments

Why 'Free' Local AI Executors Can Cost More Than Cloud Models