AI Code Security: Why 63% of AI-Generated Functions Ship With Vulnerabilities

AI-powered code generation has transformed software development, but security remains an afterthought in many implementations. A recent analysis comparing two leading AI models—Gemini 2.5 Flash and Claude Sonnet 4.6—revealed a sobering statistic: 63% of AI-generated functions shipped with vulnerabilities that standard security tools easily detect. The implications for teams relying on AI-assisted development are clear: defaulting to AI-generated code without rigorous review introduces systemic risk.

The Methodology Behind the Comparison

This evaluation tested both models across four security-critical domains. The prompts stated only functional requirements, with no explicit security directives, to mirror how developers typically use AI assistants. Gemini received the prompts through the Gemini CLI and Claude through the Claude CLI. The resulting code was then linted with security-focused ESLint plugins whose rules map to specific Common Weakness Enumeration (CWE) entries.

The four domains tested were:

NestJS service implementation with authentication and admin features
JWT authentication middleware for login and verification
MongoDB data layer using Mongoose models
General API functions vulnerable to injection attacks

Taken together, the results say more about AI-assisted development patterns in general than about which model performs better.

Domain-by-Domain Results: Where AI Falls Short

Security gaps appeared in every domain for both models. In some domains the two models made exactly the same critical omissions; in others, one model clearly did better than the other.

NestJS Services: The Power of Framework Awareness

In the NestJS test, Gemini’s output followed the framework’s security conventions more closely. Without being asked, it applied class-level guards (@UseGuards), excluded the password field from serialized responses (@Exclude from class-transformer), and validated Data Transfer Objects (DTOs) with class-validator. The nestjs-security plugin flagged only two issues in this implementation.

Claude’s functionally equivalent code contained six security violations, including missing guards, inadequate password protection, and absent validation. The stark difference highlights how framework-specific knowledge impacts security outcomes when using AI assistants.
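In NestJS these protections are declared with decorators and enforced by the framework. To make their effect visible without the framework, the sketch below approximates two of them in plain TypeScript: `requireRole` performs the kind of check a guard attached with @UseGuards would run, and `validateCreateUserDto` rejects what class-validator rules plus a ValidationPipe configured with `whitelist` and `forbidNonWhitelisted` would reject. All names, the role model, and the 12-character password rule are assumptions for illustration, not code produced by either model.

```typescript
// Framework-free approximation of two NestJS protections (hypothetical names):
//  - requireRole(): the kind of check a guard attached via @UseGuards performs
//  - validateCreateUserDto(): what class-validator rules on a DTO would enforce

type Role = "user" | "admin";

interface RequestUser {
  id: string;
  roles: Role[];
}

interface CreateUserDto {
  email: string;
  password: string;
}

// Guard-like check: deny unless an authenticated user holds the required role.
function requireRole(user: RequestUser | undefined, role: Role): void {
  if (!user) throw new Error("401: not authenticated");
  if (!user.roles.includes(role)) throw new Error("403: insufficient role");
}

// DTO validation: accept only known fields with the expected types and shapes,
// so an untrusted request body never reaches the service layer unchecked.
function validateCreateUserDto(body: unknown): CreateUserDto {
  if (typeof body !== "object" || body === null) {
    throw new Error("400: body must be an object");
  }
  // Reject unexpected fields (e.g. an injected "isAdmin": true).
  const allowed = new Set(["email", "password"]);
  for (const key of Object.keys(body)) {
    if (!allowed.has(key)) throw new Error(`400: unexpected field ${key}`);
  }
  const { email, password } = body as Record<string, unknown>;
  if (typeof email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    throw new Error("400: invalid email");
  }
  if (typeof password !== "string" || password.length < 12) {
    throw new Error("400: password must be at least 12 characters");
  }
  return { email, password };
}
```

In a real NestJS service the same intent is expressed declaratively, which is why framework awareness matters: the protections apply to every decorated route without hand-written checks.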

JWT Authentication: The Blind Spot in Token Validation

Both models avoided common JWT pitfalls such as accepting alg: none, hardcoding secrets, or decoding tokens without verifying them. However, both stopped at the same point: neither implemented the validation steps that RFC 8725 (JSON Web Token Best Current Practices) calls for:

Algorithm allowlisting: Neither restricted which signing algorithms it would accept, leaving the code open to algorithm confusion attacks
Audience validation: Neither validated the aud claim, so tokens issued for one service could be accepted by another
Issuer validation: Neither checked the iss claim, so tokens minted by an unexpected issuer could be accepted
Payload sensitivity: Both returned sensitive payload data without redaction

The jwt plugin flagged the same five issues in both implementations. That overlap points to a systemic blind spot in AI-generated authentication code, and one that human reviewers often miss too.
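The missing checks are short to write. The sketch below uses only Node's `node:crypto` module to verify an HS256 token, then pins the algorithm, issuer, audience, and expiry. It assumes a shared-secret HS256 setup, and the helper names and claim values are hypothetical. In production a maintained library is usually the better choice; for example, `jsonwebtoken`'s `verify` accepts `algorithms`, `issuer`, and `audience` options that cover the same ground.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Only algorithms this verifier actually implements; "none" is never accepted.
const ALLOWED_ALGS = new Set(["HS256"]);

interface Claims {
  iss?: string;
  aud?: string | string[];
  sub?: string;
  exp?: number;
  [key: string]: unknown;
}

interface VerifyOptions {
  secret: string;
  issuer: string;      // the only issuer this service trusts
  audience: string;    // this service's own identifier, expected in "aud"
  nowSeconds?: number; // injectable clock for testing
}

function b64urlJson(value: unknown): string {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
}

function decodeJson(segment: string): unknown {
  return JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
}

function hmacSha256(secret: string, input: string): Buffer {
  return createHmac("sha256", secret).update(input).digest();
}

// Hypothetical signing helper, included only so the verifier can be exercised.
function signHS256(claims: Claims, secret: string): string {
  const signingInput = `${b64urlJson({ alg: "HS256", typ: "JWT" })}.${b64urlJson(claims)}`;
  return `${signingInput}.${hmacSha256(secret, signingInput).toString("base64url")}`;
}

function verifyHS256(token: string, opts: VerifyOptions): Claims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [h, p, s] = parts;

  // 1. Algorithm allowlist, checked before any signature work.
  const header = decodeJson(h) as { alg?: unknown };
  if (typeof header.alg !== "string" || !ALLOWED_ALGS.has(header.alg)) {
    throw new Error(`algorithm not allowed: ${String(header.alg)}`);
  }

  // 2. Signature, compared in constant time (lengths must match first).
  const expected = hmacSha256(opts.secret, `${h}.${p}`);
  const actual = Buffer.from(s, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("invalid signature");
  }

  // 3. Issuer, audience, and expiry, compared against values the service pins.
  const claims = decodeJson(p) as Claims;
  if (claims.iss !== opts.issuer) throw new Error("unexpected issuer");
  const audiences: (string | undefined)[] = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(opts.audience)) throw new Error("token not issued for this audience");
  const now = opts.nowSeconds ?? Math.floor(Date.now() / 1000);
  if (typeof claims.exp !== "number" || claims.exp <= now) throw new Error("token expired");
  return claims;
}
```

Two design choices carry most of the weight: the algorithm comes from a fixed allowlist rather than from the token header, and `iss` and `aud` are compared against values the service configures itself. For the payload-sensitivity finding, a caller would expose only the claims it needs (for example, `sub`) instead of the whole decoded object.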

MongoDB Data Layers: The Password Leak Crisis

Perhaps the most alarming finding involved password handling in MongoDB queries. Both models returned entire documents—including password hashes—without applying projection to exclude sensitive fields. The typical implementation looked like this:

// Returns complete user documents, passwordHash included
const results = await User.find(filter);

This sends password hashes to every client that receives the results, exposing data the API never needed to return. Neither AI-generated solution applied the fix, a projection that excludes sensitive fields:

// Exclude passwordHash and return plain objects (lean) instead of full Mongoose documents
const results = await User.find(filter).select('-passwordHash').lean();
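Query-time projection fixes the leak at its source. A complementary habit, sketched below, is to build API responses from an allowlist of public fields rather than a denylist of private ones, so a sensitive field added to the schema later stays private by default. `passwordHash` comes from the example above; the other field names and the `pick` and `toPublicUser` helpers are hypothetical.

```typescript
// Hypothetical stored user shape; only passwordHash comes from the example above.
interface UserRecord {
  _id: string;
  email: string;
  displayName: string;
  passwordHash: string;
  resetToken?: string;
}

// Allowlist of fields that may leave the API.
const PUBLIC_USER_FIELDS = ["_id", "email", "displayName"] as const;

// Copy only the listed keys; anything not listed (including future fields) is dropped.
function pick<T extends object, K extends keyof T>(obj: T, keys: readonly K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    if (key in obj) out[key] = obj[key];
  }
  return out;
}

const toPublicUser = (doc: UserRecord) => pick(doc, PUBLIC_USER_FIELDS);
```

The same list could also drive the query, for example `User.find(filter).select(PUBLIC_USER_FIELDS.join(' '))`, keeping projection and serialization in sync. Mongoose also supports `select: false` on a schema path, which leaves that field out of query results unless a query explicitly requests it.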

Surprisingly, both models avoided injection vulnerabilities despite receiving untrusted input in search parameters. The MongoDB security plugin detected eight violations in each implementation, primarily related to sensitive data exposure rather than injection risks.

General API Functions: Injection Protection, But Not Enough

In the general API test, both models demonstrated awareness of injection risks by avoiding direct operator interpolation in queries. However, the secure-coding plugin identified 13 violations in Claude’s implementation versus nine in Gemini’s, with the disparity driven by inconsistent input validation and sanitization practices.
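Avoiding operator injection mostly comes down to never letting untrusted input decide the shape of a query. The sketch below uses a hypothetical `buildSafeFilter` helper and field list: it accepts only allowlisted field names and plain string values, so an input such as `{ $ne: null }` can never become a query operator. Mongoose users can additionally enable its `sanitizeFilter` option as a second safeguard.

```typescript
// Equality-only filter built from untrusted query parameters.
type Filter = Record<string, string>;

// Hypothetical allowlist of fields clients may search on.
const SEARCHABLE_FIELDS = new Set(["email", "username", "status"]);

function buildSafeFilter(query: Record<string, unknown>): Filter {
  const filter: Filter = {};
  for (const [field, value] of Object.entries(query)) {
    // Unknown fields, "$"-prefixed operators, and dotted paths all fail the allowlist.
    if (!SEARCHABLE_FIELDS.has(field)) {
      throw new Error(`field not searchable: ${field}`);
    }
    // Some query-string parsers can turn ?email[$ne]= into { email: { $ne: "" } };
    // rejecting non-strings keeps such objects out of the query entirely.
    if (typeof value !== "string") {
      throw new Error(`expected a string for ${field}`);
    }
    filter[field] = value;
  }
  return filter;
}
```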

The findings suggest that while AI assistants are learning to avoid obvious injection vectors, they still struggle with comprehensive security hardening across all code paths.

The Bigger Picture: AI Security Gaps Are Systemic

The 63% vulnerability rate across 700 AI-generated functions should concern every development team. The failure patterns the two models share suggest a broader issue with how AI assistants are currently trained and prompted, rather than a quirk of a single vendor.

Security experts warn that relying solely on AI-generated code without additional review cycles creates a false sense of security. The missing hardening steps—audience validation in JWT tokens, sensitive field projection in database queries, and framework-specific security patterns—represent the new frontier of vulnerabilities in AI-assisted development.

Recommendations for Secure AI-Assisted Development

Teams using AI code assistants should implement these practices:

Always lint AI-generated code with security-focused static analysis tools
Supplement AI outputs with manual review focused on security-critical components
Implement security-specific prompts that explicitly require common hardening steps
Establish baseline security patterns for your technology stack to guide AI generation
Invest in security-focused training for developers using AI assistants
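Linting only protects a codebase if its findings actually block a merge. A minimal sketch of such a gate follows. It assumes the output shape of ESLint's built-in JSON formatter (`eslint --format json`) and uses hypothetical rule prefixes for the security plugins; substitute the prefixes of whichever plugins your project installs.

```typescript
// Minimal subset of ESLint's JSON formatter output.
interface LintMessage {
  ruleId: string | null; // null for fatal parse errors
  severity: 1 | 2;       // 1 = warning, 2 = error
  message: string;
  line?: number;
}

interface LintResult {
  filePath: string;
  messages: LintMessage[];
}

// Hypothetical prefixes; replace with the security plugins actually in use.
const SECURITY_RULE_PREFIXES = ["security/", "jwt/", "mongodb-security/"];

// Collect every security finding, warnings included, as "file:line rule message".
function securityViolations(results: LintResult[]): string[] {
  const findings: string[] = [];
  for (const result of results) {
    for (const msg of result.messages) {
      const rule = msg.ruleId ?? "";
      if (SECURITY_RULE_PREFIXES.some((prefix) => rule.startsWith(prefix))) {
        findings.push(`${result.filePath}:${msg.line ?? "?"} ${rule} ${msg.message}`);
      }
    }
  }
  return findings;
}
```

In CI, one would parse the formatter's output, pass it to `securityViolations`, print the findings, and exit with a non-zero status if any were returned. Warnings count as blocking on purpose: a security rule left at `warn` is easy to ignore.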

The future of AI-assisted development depends on addressing these systemic security gaps. Until AI assistants consistently produce secure code by default, development teams must treat their outputs as untrusted code that requires rigorous validation.
