iToverDose / Software · 17 MAY 2026 · 08:02

Why developers chose a solo-built AI interview tool over Google’s retired one

A solo founder replaced Google’s retired Interview Warmup with a lean, AI-powered alternative that runs on a single binary and Postgres. Here’s how the architecture scaled without microservices or extra databases.

DEV Community · 4 min read

When Google discontinued its Interview Warmup tool earlier this year, many developers lost a convenient way to practice interview responses and receive AI feedback. Rather than searching for an existing replacement, one entrepreneur decided to build a better version from scratch.

Within a year, 10xInterview.com launched as a solo-engineered platform that handles spoken answers, real-time feedback, and AI-powered resume analysis. Unlike most modern applications that rely on microservices and event-driven architectures, this tool runs on a deliberately minimal stack designed to stay maintainable for a single founder. Here’s how the architecture works—and why it might inspire your next solo project.

A deliberately simple stack built for solo maintenance

The solo founder behind 10xInterview prioritized simplicity over trendy complexity. Instead of adopting a sprawling microservices architecture, the platform uses a single Go binary for the backend, a React frontend built with Vite, and a Postgres 17 database with the pgvector extension for vector embeddings.

AI inference relies on Google Cloud’s Vertex AI and the Gemini API, while billing runs through Razorpay’s webhook-driven subscriptions. The entire system deploys with a single idempotent shell script, which prevents configuration drift and keeps operational overhead low. There’s no Redis, no Kafka, and no second vector database: just a straightforward monolith that one person can debug at 3 AM.

The core philosophy: if it isn’t necessary to add complexity, don’t add it.

How dual backend routing balances cost and quality

The biggest challenge in building an AI-powered tool is balancing affordability with performance. Free-tier users shouldn’t exhaust expensive APIs, while paid subscribers expect faster, higher-quality responses. The solution? A dual-backend AI router that abstracts model selection behind a clean interface.

Every AI capability—resume parsing, question reviewing, or live interview scoring—is implemented as an interface with two concrete implementations: one for free users and another for Pro subscribers. The backend uses a ReviewerRouter to dynamically select the correct implementation based on the user’s subscription plan, all without exposing the logic to individual handlers.

type Reviewer interface {
    Review(ctx context.Context, in ReviewInput) (ReviewOutput, error)
}

type ReviewerRouter struct {
    Free  Reviewer
    Paid  Reviewer
}

func (r *ReviewerRouter) Review(ctx context.Context, in ReviewInput) (ReviewOutput, error) {
    if auth.PlanFromContext(ctx) == auth.PlanPro {
        return r.Paid.Review(ctx, in)
    }
    return r.Free.Review(ctx, in)
}

This approach delivers three key advantages:

  • Graceful degradation: If the Pro API key fails, users automatically fall back to the free model without errors.
  • Zero-setup local development: Setting AGENT_ENABLED=false replaces all AI calls with deterministic stubs, letting new contributors run the app without Google Cloud credentials.
  • Easy tier expansion: Adding an Enterprise tier simply requires adding another field to the router and a single dispatch case—no handler changes needed.

The pattern has since been replicated across six different AI agents, a sign that it scales well within a monolithic codebase.

Streaming live feedback without adding infrastructure

The original version of the answer review feature relied on a synchronous REST call: upload audio, wait 8 to 14 seconds, then render the AI’s feedback. While functional, the user experience felt sluggish and unresponsive.

To fix this, the founder implemented server-sent events (SSE) for token-by-token streaming. Instead of waiting for the entire response, users now see feedback as the AI generates it, cutting perceived latency by nearly half while keeping server costs flat.

The architecture uses an in-process pub/sub broker implemented in just 150 lines of Go. Each submission ID maintains a list of subscriber channels, allowing the backend to publish events directly to connected clients without external message queues.

type Broker struct {
    mu     sync.RWMutex
    subs   map[string][]chan Event
}

func (b *Broker) Publish(id string, e Event) {
    b.mu.RLock()
    for _, ch := range b.subs[id] {
        select {
        case ch <- e:
        default: // drop if subscriber is slow
        }
    }
    b.mu.RUnlock()
}
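The Publish method above covers only half of the broker’s job; subscribers also need to register a channel per submission ID and clean up on disconnect. A self-contained sketch of what the other half might look like, with the caveat that Subscribe and Unsubscribe are assumed names rather than the project’s published code:

```go
package main

import (
	"fmt"
	"sync"
)

// Event stands in for the broker's payload type; the field is assumed.
type Event struct{ Data string }

// Broker mirrors the article's in-process pub/sub broker.
type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

// Subscribe registers a buffered channel for a submission ID; the small
// buffer absorbs short bursts before Publish starts dropping events.
func (b *Broker) Subscribe(id string) chan Event {
	ch := make(chan Event, 16)
	b.mu.Lock()
	if b.subs == nil {
		b.subs = make(map[string][]chan Event)
	}
	b.subs[id] = append(b.subs[id], ch)
	b.mu.Unlock()
	return ch
}

// Unsubscribe removes the channel and closes it, ending the subscriber's loop.
func (b *Broker) Unsubscribe(id string, ch chan Event) {
	b.mu.Lock()
	list := b.subs[id]
	for i, c := range list {
		if c == ch {
			b.subs[id] = append(list[:i], list[i+1:]...)
			break
		}
	}
	b.mu.Unlock()
	close(ch)
}

// Publish is reproduced from the article for completeness.
func (b *Broker) Publish(id string, e Event) {
	b.mu.RLock()
	for _, ch := range b.subs[id] {
		select {
		case ch <- e:
		default: // drop if subscriber is slow
		}
	}
	b.mu.RUnlock()
}

func main() {
	b := &Broker{}
	ch := b.Subscribe("sub-1")
	b.Publish("sub-1", Event{Data: "token"})
	fmt.Println((<-ch).Data) // token
	b.Unsubscribe("sub-1", ch)
}
```

Note the lock asymmetry: Publish takes only a read lock because many publishers can fan out concurrently, while Subscribe and Unsubscribe take the write lock to mutate the map.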

On the frontend, a simple EventSource connection handles the streaming:

const es = new EventSource(`/api/v1/submissions/${id}/stream`);
es.onmessage = (e) => {
    const event = JSON.parse(e.data);
    setReview(prev => mergeEvent(prev, event));
};
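On the wire, each message that EventSource receives is a text/event-stream frame: a `data:` line carrying the JSON payload, terminated by a blank line. A minimal sketch of the frame the Go handler would write and flush per event, with the caveat that the Event fields here are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event stands in for the broker's payload; these fields are assumed.
type Event struct {
	Kind string `json:"kind"`
	Text string `json:"text"`
}

// sseFrame renders one server-sent-events frame. The trailing blank line
// is what makes the browser's EventSource fire onmessage for the frame.
func sseFrame(e Event) string {
	data, _ := json.Marshal(e)
	return fmt.Sprintf("data: %s\n\n", data)
}

func main() {
	fmt.Print(sseFrame(Event{Kind: "token", Text: "Good structure"}))
	// prints: data: {"kind":"token","text":"Good structure"}
}
```

The handler writes one such frame per broker event and calls http.Flusher’s Flush after each, which is all SSE needs: no WebSocket upgrade, no extra infrastructure.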

This pattern works because the entire application runs on a single Cloud Run instance during each user session. If future scaling demands multiple instances, the plan is to swap the broker for Redis pub/sub or use sticky sessions, but for now the minimal approach suffices.

Why Postgres + pgvector beats dedicated vector databases

Two core features in 10xInterview rely on vector similarity: question deduplication and resume-aware question recommendations. Instead of spinning up a separate vector database, the platform stores embeddings directly in Postgres using the pgvector extension.

Question deduplication prevents duplicate entries in the catalog when administrators or the AI generate similar questions in different phrasings. Resume-aware recommendations improve interview personalization by matching questions to the candidate’s background.
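With pgvector, deduplication boils down to a nearest-neighbour query over the embedding column. A hedged sketch of the Go side, assuming a `questions` table with an `embedding vector` column; the schema, helper name, and query are illustrative, not the project’s actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral renders a []float32 in pgvector's text input format,
// e.g. "[0.1,0.25]", suitable for binding as a query parameter.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// dedupQuery is the kind of SQL a catalog check might run: pgvector's <=>
// operator computes cosine distance, so the closest existing questions
// surface first and near-duplicates can be flagged.
const dedupQuery = `
SELECT id, text, embedding <=> $1::vector AS dist
FROM questions
ORDER BY dist
LIMIT 5;`

func main() {
	fmt.Println(vectorLiteral([]float32{0.1, 0.25, -1}))
	// prints: [0.1,0.25,-1]
	// a real call would look like: db.Query(dedupQuery, vectorLiteral(queryEmbedding))
}
```

Keeping the embeddings in the same database means this query can join against ordinary relational columns (plan, topic, author) in one round trip, which is exactly what a separate vector store would make awkward.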

By centralizing both operational and vector data in a single database, the platform avoids the complexity of managing multiple systems. This choice also simplifies backups, migrations, and query optimization, critical factors for a solo-maintained application.

As the platform evolves, caching or hybrid search strategies may be introduced, but for now Postgres handles everything efficiently within a predictable operational budget.

Looking ahead: building for maintainability first

The success of 10xInterview shows that modern AI applications don’t require sprawling architectures to deliver great user experiences. By focusing on simplicity, intentional trade-offs, and maintainable code patterns, a solo founder can ship a production-grade platform without burning out.

The next phase includes adding session pinning for horizontal scaling and exploring real-time collaboration features. But the foundation remains unchanged: a boring stack that works, a router that scales, and a streaming experience that feels instantaneous. For solo developers considering AI projects, the lessons here might be the most valuable takeaway of all.

AI summary

What should you use now that Google Interview Warmup has shut down? Everything about the architecture of 10xInterview, built with Go, React, and Postgres, and its AI-powered interview techniques.
