AI agents are evolving beyond chatbots into systems that can autonomously interact with tools, access sensitive data, and trigger workflows. This shift unlocks powerful capabilities but also introduces significant security challenges. How can teams assess the risks before deploying an AI agent into production?
To help teams answer that question, I built AgentGuardian, a local-first security scanner that evaluates agentic AI workflows for risks such as prompt injection, tool misuse, excessive autonomy, sensitive data exposure, insecure output handling, and insufficient human oversight.
The tool is built with Python, Streamlit, and Pandas, and runs entirely locally using Ollama with a local LLM. No external API keys are required, making it ideal for controlled environments where sensitive data must not leave the premises.
The core challenge: securing autonomous AI agents
AI agents are increasingly connected to critical systems like email clients, file storage, databases, CRMs, ticketing platforms, calendars, payment gateways, and web browsers. While these integrations enhance functionality, they also expand the attack surface.
Consider two contrasting scenarios:
- An agent that only summarizes public documents carries minimal risk.
- An agent that reads customer complaints, accesses order history, drafts refund responses, and sends emails handles sensitive data and performs business-impacting actions.
The latter requires rigorous security review before deployment. AgentGuardian helps teams answer key questions:
- Which tools does the agent access?
- What types of data does it process?
- Does it receive untrusted external inputs?
- Can it trigger actions automatically?
- Is human approval required?
- What risks must be mitigated before deployment?
How AgentGuardian works: a dual-layer architecture
AgentGuardian operates with two interconnected layers: a deterministic rule-based engine and a local LLM for security summarization.
1\. Deterministic risk scoring engine
The scoring engine assigns risk points according to clear, predefined conditions, so its results are consistent and explainable. An LLM asked to score the same workflow twice may return different numbers; a rule-based engine returns the same score for the same inputs every time.
For instance:
- Receiving emails, uploaded files, or web inputs increases prompt injection risk.
- Accessing email systems, databases, payment gateways, or executing code raises tool misuse risk.
- Handling financial, health, customer, or student data elevates sensitive data exposure risk.
- Automatic action execution without constraints signals excessive autonomy risk.
- Absence of human approval requirements indicates insufficient oversight.
Each condition maps directly to a risk category, enabling transparent scoring from 0 to 100.
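To make the scoring idea concrete, here is a minimal Python sketch of a deterministic rule engine. It is not AgentGuardian's code: the `AgentWorkflow` fields, the option strings, the point value for each category, and the severity thresholds in `risk_level` are all hypothetical, chosen so that the five categories above add up to at most 100.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical option names for the scanner form's multi-select fields.
UNTRUSTED_INPUTS = {"email", "uploaded_files", "web"}
RISKY_TOOLS = {"email", "database", "payment_gateway", "code_execution"}
SENSITIVE_DATA = {"financial", "health", "customer", "student"}

# Hypothetical weights; they sum to 100 so the total stays on a 0-100 scale.
POINTS = {
    "prompt_injection": 25,
    "tool_misuse": 20,
    "sensitive_data_exposure": 20,
    "excessive_autonomy": 20,
    "insufficient_oversight": 15,
}


@dataclass
class AgentWorkflow:
    tools: set[str] = field(default_factory=set)
    data_types: set[str] = field(default_factory=set)
    external_inputs: set[str] = field(default_factory=set)
    auto_executes: bool = False  # takes actions automatically, without constraints
    human_approval: bool = True  # a person must sign off before actions run


def score_workflow(wf: AgentWorkflow) -> tuple[int, dict[str, int]]:
    """Return (total score, points per triggered category) using fixed rules only."""
    triggered = {
        "prompt_injection": bool(wf.external_inputs & UNTRUSTED_INPUTS),
        "tool_misuse": bool(wf.tools & RISKY_TOOLS),
        "sensitive_data_exposure": bool(wf.data_types & SENSITIVE_DATA),
        "excessive_autonomy": wf.auto_executes,
        "insufficient_oversight": not wf.human_approval,
    }
    breakdown = {cat: POINTS[cat] for cat, hit in triggered.items() if hit}
    return min(100, sum(breakdown.values())), breakdown


def risk_level(score: int) -> str:
    """Map a score to a severity band (hypothetical thresholds)."""
    if score >= 75:
        return "critical"
    if score >= 50:
        return "high"
    if score >= 25:
        return "medium"
    return "low"
```

Because every rule is a plain set-intersection or boolean check, calling `score_workflow` twice on the same input always yields the same result, and the breakdown dictionary shows exactly which condition contributed each point. A public-document summarizer with no risky tools, untrusted inputs, or sensitive data scores 0 ("low"), while a workflow like the invoice agent described later triggers every rule.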
2\. Local LLM security summary
After the engine computes the score, a local LLM running through Ollama generates a human-readable security analysis. The model does not influence the score; it explains the reasoning behind it, which makes the output actionable for developers, security teams, and stakeholders.
This two-layer design keeps all processing local while still producing clear, readable reports.
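The summarization step can be sketched against Ollama's local REST API, which by default listens on `http://localhost:11434` and accepts `POST /api/generate`. The prompt wording, the helper names (`build_summary_prompt`, `build_request`, `summarize`), and the default model are assumptions for illustration. The design point taken from the description above is that the score is passed to the model as fixed input rather than produced by it.

```python
from __future__ import annotations

import json
import urllib.request

# Ollama's default local endpoint; no external API key is involved.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_prompt(score: int, breakdown: dict[str, int]) -> str:
    """Describe the fixed rule-engine result and ask the model only to explain it."""
    findings = "\n".join(
        f"- {category}: {points} points" for category, points in sorted(breakdown.items())
    )
    return (
        "You are reviewing an AI agent workflow for security risks.\n"
        "The risk score below was computed by a deterministic rule engine. "
        "Do not change or re-estimate it.\n"
        f"Score: {score}/100\n"
        f"Findings:\n{findings or '- none'}\n"
        "Explain why each finding matters and suggest concrete mitigations."
    )


def build_request(
    score: int, breakdown: dict[str, int], model: str = "llama3.2"
) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally pulled model."""
    payload = {
        "model": model,
        "prompt": build_summary_prompt(score, breakdown),
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(
    score: int, breakdown: dict[str, int], model: str = "llama3.2", timeout: float = 120.0
) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(score, breakdown, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama replies with a single JSON object whose `response` field holds the generated text, which keeps the client code to one request and one parse.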
Why local-first matters for AI security
Many AI agent workflows involve proprietary business logic, internal datasets, or regulated data. Relying on external LLM APIs could expose sensitive information or introduce latency and dependency risks.
AgentGuardian avoids these issues by running entirely on local infrastructure. Teams can:
- Pull lightweight models such as `llama3.2` or `llama3.1:8b` using Ollama.
- Process agent workflows without sending data to third-party servers.
- Deploy the tool in air-gapped or restricted environments.
Because no data is sent to an external model provider, this approach fits a zero-trust security posture and supports compliance with data residency requirements.
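One way to enforce the "no third-party servers" rule in code is to refuse any model endpoint that is not on the loopback interface. This guard (`is_local_endpoint` and `require_local`) is a hypothetical addition rather than a documented AgentGuardian feature, and it is stricter than necessary for teams that host Ollama on another machine inside their own network. Models are fetched once with `ollama pull llama3.2` (or `llama3.1:8b`); after that, inference runs without outbound network access.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if the URL points at this machine's loopback interface."""
    host = urlparse(url).hostname  # lowercased, with IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname could resolve to a remote server, so reject it.
        return False


def require_local(url: str) -> str:
    """Raise before any workflow data is sent to a non-local endpoint."""
    if not is_local_endpoint(url):
        raise ValueError(f"Refusing non-local model endpoint: {url}")
    return url
```

Checking the parsed host rather than the raw string avoids being fooled by URLs such as `http://localhost.example.com`, which merely start with "localhost".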
Building a practical security interface with Streamlit
The web app uses Streamlit to offer an intuitive, three-tab interface:
- Agent Workflow Scanner: Collects agent details including name, purpose, tools, data types, external inputs, autonomy level, and approval workflows.
- Risk Knowledge Base: Explains common threats like prompt injection, tool misuse, sensitive data exposure, excessive autonomy, and insecure output handling.
- Sample Scenarios: Provides preconfigured workflows for testing, such as invoice processing, customer support, or document summarization.
A key usability feature is form validation: AgentGuardian refuses to generate a report while required fields are empty, so a score is never computed from incomplete input. Safeguards like this move it a step beyond a bare prototype toward a dependable security tool.
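The validation rule reduces to checking that each required field holds a non-blank value. The article does not say which fields AgentGuardian treats as mandatory, so `REQUIRED_FIELDS` and the `missing_fields` helper below are hypothetical.

```python
from __future__ import annotations

# Hypothetical form keys and labels; the real required set is not documented.
REQUIRED_FIELDS = {
    "agent_name": "Agent name",
    "purpose": "Purpose",
    "tools": "Tools",
    "data_types": "Data types",
    "autonomy_level": "Autonomy level",
}


def is_blank(value: object) -> bool:
    """Treat None, whitespace-only strings, and empty selections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, frozenset)):
        return len(value) == 0
    return False


def missing_fields(form: dict[str, object]) -> list[str]:
    """Return labels of required fields that are still empty, in form order."""
    return [label for key, label in REQUIRED_FIELDS.items() if is_blank(form.get(key))]
```

In a Streamlit form, a check like this would typically run after `st.form_submit_button` returns `True`; if the list is non-empty, the app can display it with `st.error` and call `st.stop()` instead of building the report.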
Real-world example: the high-risk invoice payment agent
One sample scenario shows AgentGuardian in action: an invoice payment agent that automatically approves payments under $5,000 after verifying vendor records. This workflow involves:
- Accessing emails, files, databases, and payment systems.
- Processing financial data, customer records, and potentially credentials.
- Receiving emails, uploaded invoices, and API responses.
- Executing payments without human intervention.
The tool identifies multiple high-risk factors:
- Prompt injection via malicious emails or invoice attachments.
- Sensitive data exposure from financial and customer records.
- Potential tool misuse through payment system access.
- Excessive autonomy due to automatic execution.
- Lack of human oversight.
AgentGuardian assigns a high or critical risk score and recommends mitigations such as requiring human approval, enforcing least-privilege access, enabling logging and input validation, and reviewing outputs before action.
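The step from findings to recommendations can be sketched as a lookup table that maps each triggered risk category to mitigations and drops duplicates. The category keys and the exact mapping are assumptions; the mitigation texts paraphrase the recommendations listed above for the invoice agent.

```python
from __future__ import annotations

# Hypothetical mapping; wording paraphrases the mitigations recommended above.
MITIGATIONS = {
    "prompt_injection": ["Validate untrusted inputs", "Enable logging"],
    "tool_misuse": ["Enforce least-privilege access", "Enable logging"],
    "sensitive_data_exposure": ["Enforce least-privilege access"],
    "excessive_autonomy": ["Review outputs before action"],
    "insufficient_oversight": ["Require human approval"],
}


def recommend(findings: list[str]) -> list[str]:
    """Collect mitigations for each finding, keeping first-seen order without duplicates."""
    recommendations = []
    for category in findings:
        for mitigation in MITIGATIONS.get(category, []):  # unknown categories are skipped
            if mitigation not in recommendations:
                recommendations.append(mitigation)
    return recommendations


# The five risk factors identified for the invoice payment agent.
invoice_findings = [
    "prompt_injection",
    "sensitive_data_exposure",
    "tool_misuse",
    "excessive_autonomy",
    "insufficient_oversight",
]
print(recommend(invoice_findings))
```

Several categories share a mitigation (least-privilege access addresses both tool misuse and data exposure), so de-duplicating keeps the report short without losing coverage.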
The future of agentic AI security
As AI agents grow more autonomous and deeply integrated into business processes, security assessment tools must evolve in lockstep. AgentGuardian offers a practical, local-first solution that balances usability, transparency, and control—without relying on external APIs or cloud services.
By enabling teams to evaluate agent workflows systematically before deployment, the tool helps reduce exposure to prompt injection, data leaks, and unchecked actions. While it is still a prototype, its architecture and design principles point toward a more secure future for agentic AI systems.