Static analysis of three open-source AI agent codebases uncovered a troubling pattern: 83% of tool calls capable of performing side effects (such as writing to databases, deleting files, or charging cards) had no security guardrail visible in the code. The finding emerged from a systematic inventory of executable functions and their associated controls, not from penetration testing or vulnerability hunting.
The study examined three TypeScript-based AI agent projects (OpenClaw, Mastra, and OpenAI Agents JS) using a custom static analyzer named diplomat-agent-ts. Built on ts-morph, a wrapper around the TypeScript compiler API, the tool scanned 11,379 files across four scopes and identified 669 tool calls that could trigger side effects. Of these, 553 had no visible security controls in the code, such as input validation, authentication checks, rate limiting, or confirmation steps. The remaining 116 calls had partial controls, and none were marked as confirmed within the scanner’s framework.
Why unguarded tool calls pose unique risks in AI agents
In traditional web applications, user actions are mediated by multiple layers of security. A button click passes through forms, validation, session checks, and middleware before reaching a database write or payment processing function. These controls are deliberately designed by developers to enforce business rules and prevent misuse.
AI agents, however, operate differently. An LLM autonomously selects which function to call, determines its arguments, and may execute it repeatedly without human oversight. Unlike human-driven applications, agents lack inherent awareness of business logic or security policies. This makes them vulnerable to unintended consequences, such as looping calls, hallucinated arguments, or manipulation through injected tool results.
As a result, security controls in agent systems cannot rely on user interfaces or middleware alone. They must be embedded directly in the code at the point where the model interacts with sensitive functions. The critical question shifts from “Is this application secure?” to “For every function the model can execute that performs a real-world action, is there a visible control in the code, and if not, is this intentional?”
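As a rough illustration of what a control "at the point of the call" could look like, the sketch below wraps a side-effecting tool with argument validation, a per-run call budget, and a confirmation step. All names here (guardTool, GuardOptions, refundPayment) are hypothetical and are not taken from any of the scanned projects; it is one possible shape under those assumptions, not a recommended implementation.

```typescript
// Hypothetical names throughout: guardTool, GuardOptions, and refundPayment
// are illustrative and do not come from any of the scanned projects.

type Tool<A, R> = (args: A) => R;

interface GuardOptions<A> {
  validate: (args: A) => string | null; // error message, or null if the arguments are acceptable
  maxCalls: number;                     // upper bound on successful calls per agent run
  confirm: (args: A) => boolean;        // human or policy confirmation step
}

function guardTool<A, R>(tool: Tool<A, R>, opts: GuardOptions<A>): Tool<A, R> {
  let calls = 0;
  return (args: A): R => {
    const problem = opts.validate(args);
    if (problem !== null) throw new Error(`rejected arguments: ${problem}`);
    if (calls >= opts.maxCalls) throw new Error("call budget exhausted");
    if (!opts.confirm(args)) throw new Error("confirmation denied");
    calls += 1;
    return tool(args);
  };
}

// Stubbed side-effecting tool; a real one would call a payment API.
const refunds: number[] = [];
const refundPayment = (args: { amountCents: number }): string => {
  refunds.push(args.amountCents);
  return `refunded ${args.amountCents}`;
};

const guardedRefund = guardTool(refundPayment, {
  validate: (a) =>
    Number.isInteger(a.amountCents) && a.amountCents > 0
      ? null
      : "amountCents must be a positive integer",
  maxCalls: 2,
  confirm: (a) => a.amountCents <= 5000, // auto-approve small refunds only
});
```

Because the checks live in the same wrapper as the call, a reviewer or a static scanner can see them next to the side effect, which is the property the question above asks for.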
Most development teams currently lack such an inventory. Without visibility into every callable function and its associated protections, it becomes impossible to review or audit security postures effectively.
How the analysis was conducted
The researcher developed diplomat-agent-ts, a lightweight static analyzer designed to scan TypeScript codebases for unguarded tool calls. The tool leverages ts-morph to parse the abstract syntax tree (AST) of each project, identifying call expressions that match predefined patterns associated with side effects. These patterns span 12 categories, including payments, database operations, file deletions, HTTP writes, agent invocations, and dynamic code execution.
A tool call was defined as any function invocation matching one of 40+ patterns across these categories. A guard was identified as an in-file control visible to the scanner, such as input validation libraries (e.g., Zod or Yup), rate-limiting decorators, authentication checks, confirmation steps, idempotency keys, or retry bounds.
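The matching step can be sketched in a few lines. This is a simplified stand-in: the analyzer described above walks the ts-morph AST, whereas the sketch below applies regular expressions to a function's source text so that it stays dependency-free. The patterns, guard hints, and names (SIDE_EFFECT_PATTERNS, GUARD_HINTS, scanFunctionBody) are hypothetical and are not the tool's actual rules.

```typescript
// Simplified stand-in for AST matching: regexes over source text.
// Patterns and guard hints are hypothetical, not diplomat-agent-ts rules.

interface Pattern {
  category: string;
  regex: RegExp;
}

const SIDE_EFFECT_PATTERNS: Pattern[] = [
  { category: "destructive", regex: /\b(?:exec|execSync|spawn)\s*\(/ },
  { category: "file_delete", regex: /\b(?:unlink|unlinkSync|rm|rmSync)\s*\(/ },
  { category: "http_write", regex: /\bfetch\s*\([^)]*method:\s*["'](?:POST|PUT|PATCH|DELETE)["']/ },
];

// Crude hints that some guard exists in the same function body.
const GUARD_HINTS: RegExp[] = [
  /\.(?:parse|safeParse)\s*\(/,          // schema validation in the style of Zod
  /\b(?:requireAuth|isAuthorized)\s*\(/, // hypothetical authentication helpers
  /\bconfirm\w*\s*\(/,                   // confirmation step
];

interface Hit {
  line: number;     // 1-based line within the function body
  category: string;
  guarded: boolean; // true if any guard hint appears in the same body
}

function scanFunctionBody(body: string): Hit[] {
  const guarded = GUARD_HINTS.some((g) => g.test(body));
  const hits: Hit[] = [];
  body.split("\n").forEach((text, i) => {
    for (const p of SIDE_EFFECT_PATTERNS) {
      if (p.regex.test(text)) hits.push({ line: i + 1, category: p.category, guarded });
    }
  });
  return hits;
}
```

A text-based matcher like this produces false positives (for example, JSON.parse would count as a validation hint), which is one reason an AST-based approach is the more robust choice for a real scanner.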
Each identified call was categorized into one of three states:
- no_checks: No security controls present in the same function as the call.
- partial_checks: Some controls exist but fail to cover all expected risks.
- confirmed: Explicitly marked as secure using a // checked:ok annotation (a convention introduced by the scanner).
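The three states can be expressed as a small classification function. The sketch below is an assumption-laden illustration: the CallSite shape, the idea that the annotation sits in a comment directly above the call, and the helper names are all hypothetical. The article does not say how the scanner treats a call whose guards cover every expected risk, so this sketch reports it as partial_checks until a human adds the annotation.

```typescript
type CheckState = "no_checks" | "partial_checks" | "confirmed";

// Hypothetical shape for one detected call site.
interface CallSite {
  expectedGuards: string[];  // guards expected for this side-effect category
  foundGuards: string[];     // guards detected in the enclosing function
  precedingComment?: string; // comment directly above the call (assumed placement)
}

function classify(site: CallSite): CheckState {
  // An explicit annotation wins over anything the scanner infers.
  if (site.precedingComment !== undefined && /checked:ok/.test(site.precedingComment)) {
    return "confirmed";
  }
  if (site.foundGuards.length === 0) return "no_checks";
  // Assumption: without an annotation, any guarded call stays partial_checks.
  return "partial_checks";
}

// Lists the expected guards that were not found, to help with triage.
function missingGuards(site: CallSite): string[] {
  return site.expectedGuards.filter((g) => !site.foundGuards.includes(g));
}
```

For example, a call that expects validation and confirmation but only has validation would be classified as partial_checks, with confirmation reported as missing.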
Importantly, the confirmed state is a scanner-specific convention. None of the three open-source projects used this annotation, so the zero confirmed count reflects the scanner’s design rather than a security judgment against the projects.
The scans were performed on unmodified public repositories at pinned commit hashes to ensure reproducibility. All commands and processes used for the analysis are documented in each repository’s MANIFEST.md file.
Key findings and breakdown by category
Across the four scanned scopes—OpenClaw application, Mastra framework, and OpenAI Agents JS framework and examples—the total pool of 669 tool calls revealed a consistent pattern: 83% lacked visible security controls.
| Codebase (scope)             | Type        | Files  | Tool calls | no_checks | partial |
|------------------------------|-------------|--------|------------|-----------|---------|
| OpenClaw (src/)              | Application | 7,874  | 419        | 332 (79%) | 87      |
| Mastra (packages/)           | Framework   | 2,777  | 185        | 162 (88%) | 23      |
| OpenAI Agents JS (packages/) | Framework   | 426    | 33         | 31 (94%)  | 2       |
| OpenAI Agents JS (examples/) | Examples    | 302    | 32         | 28 (88%)  | 4       |
| Total                        |             | 11,379 | 669        | 553 (83%) | 116     |
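The percentages and totals in the table can be rechecked with a short calculation. The sketch below copies the per-scope figures from the table; the type and variable names are illustrative only.

```typescript
interface ScopeRow {
  scope: string;
  toolCalls: number;
  noChecks: number;
  partial: number;
}

// Figures copied from the table above.
const rows: ScopeRow[] = [
  { scope: "OpenClaw (src/)", toolCalls: 419, noChecks: 332, partial: 87 },
  { scope: "Mastra (packages/)", toolCalls: 185, noChecks: 162, partial: 23 },
  { scope: "OpenAI Agents JS (packages/)", toolCalls: 33, noChecks: 31, partial: 2 },
  { scope: "OpenAI Agents JS (examples/)", toolCalls: 32, noChecks: 28, partial: 4 },
];

// Share of `part` in `whole`, rounded to a whole percent.
const pct = (part: number, whole: number): number => Math.round((part / whole) * 100);

const totals = rows.reduce(
  (acc, r) => ({
    toolCalls: acc.toolCalls + r.toolCalls,
    noChecks: acc.noChecks + r.noChecks,
    partial: acc.partial + r.partial,
  }),
  { toolCalls: 0, noChecks: 0, partial: 0 },
);

const perScopePct = rows.map((r) => pct(r.noChecks, r.toolCalls));
const overallPct = pct(totals.noChecks, totals.toolCalls);
```

In every row, no_checks plus partial equals the number of tool calls, which is consistent with the confirmed count being zero.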
The data suggests that even well-designed frameworks, such as OpenAI’s own, exhibit high rates of unguarded tool calls. This does not indicate poor engineering. Rather, it reflects that many security controls live outside the function being scanned, elsewhere in the framework or at the runtime level, where this kind of static analysis cannot see them.
Breaking down the tool calls by side-effect category reveals where risks are concentrated. The category counts sum to 1,037, more than the 669 calls, which suggests that a single call can be counted under more than one category:
- destructive (subprocess/shell execution): 486 occurrences
- file_delete: 214 occurrences
- publish: 124 occurrences
- agent_invocation: 120 occurrences
- http_write: 86 occurrences
- llm_call: 3 occurrences
- database_delete: 3 occurrences
- dynamic_code: 1 occurrence
The distribution reflects the nature of each codebase. OpenClaw, designed as a command-and-control tool, shows a high concentration of destructive and file_delete calls, as these are core to its functionality. Framework codebases, by contrast, lean toward publish and agent_invocation, reflecting their role in orchestrating agent interactions and artifact distribution.
One uncomfortable observation stands out: the destructive category—the largest by far—often represents intentional functionality. A shell runner should be able to execute commands. The scanner’s role is not to flag every such call as a vulnerability but to provide an inventory that teams can triage based on context and risk.
To support risk assessment, findings were tagged using codes from OWASP’s Agentic Security Initiative (ASI). A call can carry more than one tag:
- ASI-02 (tool misuse): Applies to all 669 calls.
- ASI-01 (excessive agency): Applies to 576 calls where side effects lack authentication checks.
- ASI-03 (privilege compromise): Applies to 465 calls involving high-stakes operations without confirmation steps.
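The tagging rules described above can be written as a small function. This is a sketch under stated assumptions: the Finding shape and the tagFinding and countByTag names are hypothetical, and the article does not specify which categories count as "high-stakes", so the set used here is a guess for illustration only.

```typescript
type AsiTag = "ASI-01" | "ASI-02" | "ASI-03";

// Hypothetical shape for one scanner finding.
interface Finding {
  category: string;
  hasAuthCheck: boolean;
  hasConfirmation: boolean;
}

// Assumption: the article does not list the high-stakes categories.
const HIGH_STAKES = new Set<string>([
  "destructive",
  "file_delete",
  "database_delete",
  "dynamic_code",
]);

function tagFinding(f: Finding): AsiTag[] {
  const tags: AsiTag[] = ["ASI-02"]; // tool misuse applies to every finding
  if (!f.hasAuthCheck) tags.push("ASI-01"); // side effect without an authentication check
  if (HIGH_STAKES.has(f.category) && !f.hasConfirmation) tags.push("ASI-03"); // high-stakes, no confirmation
  return tags;
}

// Counts how many findings carry each tag; a finding can count under several.
function countByTag(findings: Finding[]): Record<AsiTag, number> {
  const counts: Record<AsiTag, number> = { "ASI-01": 0, "ASI-02": 0, "ASI-03": 0 };
  for (const f of findings) {
    for (const t of tagFinding(f)) counts[t] += 1;
  }
  return counts;
}
```

Because every finding receives ASI-02 and may also receive the other two, per-tag counts overlap and are not expected to sum to the number of findings.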
These tags provide a structured way to prioritize remediation efforts across large codebases.
What this means for AI agent security
The findings underscore a fundamental shift in security paradigms for AI-driven systems. Traditional application security relies on layered controls enforced at runtime or in middleware. Agents, however, require security to be embedded directly in the code at the point of execution, where the model interacts with the environment.
This analysis is not a vulnerability assessment. It is an inventory—a map of where to look. Teams cannot secure what they cannot see. An 83% rate of unguarded tool calls does not mean 83% of systems are insecure; it means most systems lack a complete picture of their attack surface.
Moving forward, teams building AI agents should prioritize creating and maintaining an up-to-date inventory of all tool calls, their capabilities, and their associated controls. This includes documenting runtime protections that may not appear in static analysis, such as API gateways, authentication middleware, or deployment-time configurations.
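One possible shape for such an inventory is sketched below. The InventoryEntry fields and the needsReview rule are hypothetical: they simply record in-code guards and documented runtime protections side by side, and flag entries that have neither a control nor a written rationale for being left unguarded.

```typescript
// Hypothetical inventory record; field names are illustrative.
interface InventoryEntry {
  tool: string;
  capability: string;           // what the call can do in the real world
  inCodeGuards: string[];       // controls visible next to the call
  runtimeProtections: string[]; // e.g. API gateway or auth middleware, invisible to static analysis
  intentionallyUnguarded: boolean;
  rationale?: string;           // why the call is left unguarded, if that is intentional
}

// An entry needs review when it has no control of any kind
// and no documented reason for being left unguarded.
function needsReview(e: InventoryEntry): boolean {
  const hasAnyControl = e.inCodeGuards.length > 0 || e.runtimeProtections.length > 0;
  const justified = e.intentionallyUnguarded && (e.rationale ?? "").trim().length > 0;
  return !hasAnyControl && !justified;
}

const inventory: InventoryEntry[] = [
  {
    tool: "runShell",
    capability: "executes arbitrary shell commands",
    inCodeGuards: [],
    runtimeProtections: [],
    intentionallyUnguarded: true,
    rationale: "core feature; runs only inside a sandboxed container",
  },
  {
    tool: "deleteWorkspace",
    capability: "removes a directory tree",
    inCodeGuards: [],
    runtimeProtections: [],
    intentionallyUnguarded: false,
  },
  {
    tool: "postWebhook",
    capability: "sends an HTTP POST to an external service",
    inCodeGuards: [],
    runtimeProtections: ["API gateway allowlist"],
    intentionallyUnguarded: false,
  },
];

const reviewQueue = inventory.filter(needsReview).map((e) => e.tool);
```

In this example only deleteWorkspace ends up in the review queue: runShell is unguarded by design with a stated reason, and postWebhook is covered by a documented runtime protection.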
Ultimately, the goal is not to eliminate unguarded calls but to ensure every one is intentionally designed, visibly controlled, and auditable. As AI agents become more autonomous and integrated into critical workflows, the stakes for this discipline will only continue to rise.
AI summary
A researcher who scanned three open-source AI agent codebases found that 83% of side-effecting tool calls had no visible safeguards. Spanning everything from database operations to payment calls, these risks raise new security concerns for agent-based systems.