
How AI catches Playwright test drift before it fails in production

Automated tests pass, but do they still validate what they should? Discover how a new AI-powered system detects silent test drift and restores confidence in your Playwright suite.


Automated test suites are the backbone of modern software reliability. Yet there’s a hidden crisis unfolding in many codebases: tests that pass but no longer validate what they once did. This phenomenon, known as test drift, quietly erodes quality assurance while green checkmarks mask the decay.

The invisible threat: passing tests with failing intent

As Playwright test suites mature, maintenance often takes a back seat to new features. Over time, assertions are trimmed, requirements evolve, and edge cases are overlooked, yet the test suite remains green. The damage isn’t visible until a critical user flow breaks in production.

Common patterns of test drift include:

  • A test originally verifying three user journeys now validates only one.
  • Jira requirements change, but no one updates the corresponding tests.
  • Quick fixes remove assertions that were once essential.
  • Historical coverage data disappears into git history without review.

The most alarming aspect? No existing tool alerts teams when a test has stopped doing what it was designed to do.

Why generic AI tools miss the bigger picture

Popular code assistants like GitHub Copilot and Claude excel at writing and reviewing code. They can suggest improvements, optimize syntax, and even debug logic—within the current file or pull request.

However, they lack three critical capabilities:

  • Understanding evolving product requirements from systems like Jira.
  • Tracking how test suites change over months or years.
  • Detecting when a test’s intent no longer matches its implementation.

These tools operate in the present moment, without context across time or across organizational systems. They cannot answer the most important question: Is this test still doing its job?

Building a requirement-aware test drift analyzer

To address this gap, a new system was developed that bridges product intent and test implementation. It combines multiple data sources to perform semantic analysis across time and context.

The architecture integrates four key inputs:

  • Jira requirements and their evolution over time.
  • Playwright test code and its assertions.
  • Git history showing how tests have changed.
  • Retrieval Augmented Generation (RAG) for repository-level understanding.

By analyzing these layers, the system can determine whether a test is still aligned with its original purpose—or if it has silently drifted off course.
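
Conceptually, the analyzer’s input for a single test file can be modeled as one record that carries all four layers. The minimal sketch below is an illustration only; the field names and types are assumptions, not the project’s actual schema.

```python
# Illustrative sketch of the combined analysis record; all names here
# are assumptions, not the project's real schema.
from dataclasses import dataclass, field

@dataclass
class DriftAnalysisInput:
    """Everything the analyzer needs to judge one Playwright spec."""
    jira_requirements: list[str]   # acceptance criteria text, per ticket
    test_source: str               # current Playwright spec contents
    git_diffs: list[str]           # historical diffs touching this spec
    related_snippets: list[str] = field(default_factory=list)  # RAG hits
```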

How semantic drift detection works in practice

The process unfolds in five coordinated steps:

1. Extract requirement intent from Jira

The system parses Jira tickets to identify:

  • Expected user behaviors.
  • Validation scenarios and edge cases.
  • Acceptance criteria for each feature.

This creates a living document of what the product should do, not just what it currently does.
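
As a rough illustration, this extraction step can be prototyped against the standard Jira Cloud REST API. The bullet-point heuristic for acceptance criteria below is an assumption; real projects often keep criteria in custom fields.

```python
# Hedged sketch: pulling a ticket's summary and acceptance criteria.
# The endpoint is the standard Jira Cloud REST API; the criteria-parsing
# heuristic (bulleted lines) is an assumption.
import requests

def fetch_requirement_intent(base_url: str, issue_key: str, auth: tuple) -> dict:
    resp = requests.get(
        f"{base_url}/rest/api/2/issue/{issue_key}",
        params={"fields": "summary,description"},
        auth=auth,  # (email, api_token)
    )
    resp.raise_for_status()
    fields = resp.json()["fields"]
    description = fields.get("description") or ""
    # Naive heuristic: treat bulleted lines as acceptance criteria.
    criteria = [line.strip("-* ") for line in description.splitlines()
                if line.strip().startswith(("-", "*"))]
    return {"summary": fields["summary"], "acceptance_criteria": criteria}
```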

2. Extract test intent from Playwright code

From each test file, the system identifies:

  • The primary user flow being validated.
  • All assertions and their targets.
  • The expected outcomes and error conditions.

This builds a semantic map of what the test actually checks.
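
A first approximation of this mapping needs nothing more than pattern matching over the spec source. The sketch below is deliberately crude; a production version would parse the TypeScript AST rather than rely on regexes.

```python
# Crude static pass over a Playwright spec: collect test titles and
# expect() targets. Regex-based on purpose; a real parser is better.
import re
from pathlib import Path

TEST_RE = re.compile(r"test\(\s*['\"](.+?)['\"]")
EXPECT_RE = re.compile(r"expect\((.+?)\)\.(\w+)\(")

def extract_test_intent(spec_path: str) -> dict:
    source = Path(spec_path).read_text()
    return {
        "tests": TEST_RE.findall(source),         # user flows under test
        "assertions": EXPECT_RE.findall(source),  # (target, matcher) pairs
    }
```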

3. Analyze git history for change patterns

Every commit tells a story. The system examines:

  • When assertions were added or removed.
  • Which requirements were updated but not reflected in tests.
  • How coverage has degraded over time.

This historical lens reveals patterns that static analysis cannot.
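
One way to surface assertion erosion is to walk the spec’s commit history and count expect() lines added and removed per commit. This sketch uses GitPython; the churn heuristic is an assumption, not the system’s actual analysis.

```python
# Hedged sketch with GitPython: per-commit assertion churn for one spec.
from git import Repo

def assertion_churn(repo_path: str, spec_path: str) -> list[dict]:
    repo = Repo(repo_path)
    history = []
    for commit in repo.iter_commits(paths=spec_path):
        if not commit.parents:
            continue  # skip the root commit (no parent to diff against)
        for diff in commit.parents[0].diff(commit, paths=spec_path,
                                           create_patch=True):
            patch = diff.diff.decode(errors="replace")
            history.append({
                "sha": commit.hexsha[:8],
                "added": sum(1 for l in patch.splitlines()
                             if l.startswith("+") and "expect(" in l),
                "removed": sum(1 for l in patch.splitlines()
                               if l.startswith("-") and "expect(" in l),
            })
    return history
```

A run of negative totals across recent commits is exactly the “assertion removed 2 commits ago” signal the report format below surfaces.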

4. Apply RAG for repository context

Using vector search with embeddings, the system retrieves:

  • Related test files that implement similar validations.
  • Historical versions of the same test.
  • Alternative implementations for missing coverage.

This ensures decisions are informed by the entire codebase, not just the current file.
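
A minimal version of this retrieval layer can be stood up with ChromaDB and sentence-transformers, the same components the post lists later in its stack. Collection name, ids, and documents here are illustrative.

```python
# Minimal RAG sketch over the spec corpus; names are illustrative.
import chromadb
from chromadb.utils import embedding_functions

embed = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2")
client = chromadb.Client()
specs = client.get_or_create_collection("playwright-specs",
                                        embedding_function=embed)

# Index every spec file once (documents elided here).
specs.add(ids=["auth/error.spec.ts"], documents=["...spec source..."])

# Retrieve the specs most similar to the drifting test's intent.
hits = specs.query(query_texts=["validate error message on invalid login"],
                   n_results=3)
```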

5. Detect drift and quantify coverage loss

The final step compares intent across all layers:

  • Requirement intent vs current test intent.
  • Past coverage vs current coverage.
  • Implementation drift vs product evolution.

When misalignment is detected, the system generates actionable reports.
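
As one possible way to quantify coverage loss, each acceptance criterion can count as covered when at least one assertion embeds close to it. The sketch below assumes sentence-transformers; the 0.6 similarity threshold is arbitrary.

```python
# Sketch of coverage quantification via embedding similarity.
# The 0.6 threshold is an arbitrary assumption, not a tuned value.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def coverage(criteria: list[str], assertions: list[str]) -> float:
    if not criteria:
        return 1.0  # nothing required, trivially covered
    if not assertions:
        return 0.0
    c_emb = model.encode(criteria, convert_to_tensor=True)
    a_emb = model.encode(assertions, convert_to_tensor=True)
    sims = util.cos_sim(c_emb, a_emb)  # criteria x assertions matrix
    covered = (sims.max(dim=1).values > 0.6).sum().item()
    return covered / len(criteria)     # e.g. 0.65 -> 65% coverage
```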

Real-world output: actionable insights, not noise

Instead of vague warnings, the system produces precise feedback:

```
⚠️ Drift Detected
Test: login.spec.ts
Missing: Error message validation for invalid login
History: Assertion for dashboard visibility removed 2 commits ago
Coverage: 65%
Suggestion: Refer to auth/error.spec.ts for correct implementation
```

This level of specificity transforms drift detection from a theoretical problem into a practical fix.

The power of intelligent suggestions

One of the most valuable features is the ability to recommend existing implementations when coverage is missing.

Rather than simply stating:

“This validation is missing.”

The system says:

“This validation is missing, but auth/error.spec.ts already implements it correctly. Apply this pattern here.”

This promotes standardization, reduces duplication, and accelerates fixes—all while preserving repository knowledge.
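
The suggestion step can be sketched as a second query against the spec index: take the uncovered validation and ask for the nearest existing implementation. Everything below (collection name, ids, documents) is illustrative.

```python
# Illustrative suggestion lookup; assumes the spec index from the RAG
# step. Ids and documents are placeholders, not real project files.
import chromadb
from chromadb.utils import embedding_functions

embed = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2")
specs = chromadb.Client().get_or_create_collection(
    "playwright-specs", embedding_function=embed)
specs.add(ids=["auth/error.spec.ts"],
          documents=["test('shows error toast on invalid login', ...)"])

missing = "show an error message when credentials are invalid"
hit = specs.query(query_texts=[missing], n_results=1)
print(f"Missing here, but {hit['ids'][0][0]} already implements it")
```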

The technology behind the solution

The system leverages a modern AI stack:

  • Python for orchestration and logic.
  • LangChain for pipeline management.
  • ChromaDB for vector storage and retrieval.
  • sentence-transformers and Ollama for local embeddings.
  • Claude API for reasoning and analysis.
  • Jira API for requirement integration.
  • Git history analysis for temporal context.

This combination ensures scalability, accuracy, and maintainability in production environments.
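
To show how these pieces fit together, here is a hedged sketch of the reasoning step: the gathered context is handed to Claude through LangChain. The model alias, prompt wording, and example inputs are all assumptions.

```python
# Hedged sketch of the reasoning step via LangChain + Claude.
# Model alias and prompt are assumptions, not the project's actual setup.
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-3-5-sonnet-latest")
prompt = ChatPromptTemplate.from_template(
    "Requirement intent:\n{requirement}\n\n"
    "Current test assertions:\n{assertions}\n\n"
    "Recent git changes:\n{history}\n\n"
    "Does this test still validate the requirement? "
    "List any missing validations and cite similar existing specs."
)
report = (prompt | llm).invoke({
    "requirement": "User sees an error message on invalid login",
    "assertions": "expect(page).toHaveURL('/dashboard')",
    "history": "assertion for error toast removed 2 commits ago",
})
print(report.content)
```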

What sets this approach apart

This is not another AI linter or code formatter. It uniquely addresses three previously unsolved challenges:

  • Aligns tests with product requirements across the entire lifecycle.
  • Detects semantic drift over time using historical context.
  • Uses git history as a first-class citizen in quality assurance.

Most AI tools solve for the present. This system solves for the future by ensuring tests remain trustworthy as products evolve.

Future directions: beyond detection to prevention

Several enhancements are under exploration:

  • Runtime validation: Running Playwright tests to confirm behavior matches intent.
  • Auto-suggesting fixes: Proposing safe assertions based on similar tests.
  • Improving retrieval accuracy: Enhancing vector search to reduce false positives.
  • Framework support: Expanding beyond Playwright to other testing tools.

The goal is to move from reactive drift detection to proactive test quality assurance.

A call to rethink test integrity

In software development, we invest heavily in writing tests. But we rarely ask the most important question: Are they still doing their job?

This project represents a step toward answering that question with confidence. It’s time to stop trusting green checkmarks and start verifying intent.

How do you currently detect when your tests stop testing what they should? What systems do you use to keep test suites aligned with evolving requirements?

The future of reliable automation depends on tools that don’t just help us write tests—but help us write the right tests, every time.

AI summary

Your Playwright tests pass, but are they actually testing the right things? How can an intelligent RAG-based approach solve the test drift problem? A detailed walkthrough with practical recommendations.
