How an 8-Month AI Exam App Journey Solved Study Inefficiency

Eight months ago, a university computer science exam exposed a glaring gap in traditional study methods. The exam demanded pseudocode, something I already wrote daily, yet preparing for it meant memorizing answer patterns instead of applying skills. Frustrated by the disconnect, I took matters into my own hands and built ExamIntelligence, a study app that uses AI to transform how students prepare for exams.

The Core Problem: Exams Reward Patterns, Not Understanding

Traditional exam systems prioritize pattern recognition over deep comprehension. Students spend hours memorizing question formats and mark schemes, only to regurgitate information without grasping underlying concepts. This approach stifles curiosity and turns learning into a mechanical exercise.

I experienced this firsthand with organic chemistry, a subject I initially dreaded. Everything changed when I discovered graph neural networks (GNNs) through the book Machine Learning with PyTorch and Scikit-Learn. GNNs represent a molecule as a graph, with atoms as nodes and bonds as edges, and seeing chemistry through that lens made the material suddenly make sense because the learning method aligned with how I naturally process information. That revelation became the foundation for ExamIntelligence: an AI tool designed to bypass inefficient study habits by focusing on what truly matters.

From Vibe-Coded Prototype to Production-Ready Code

With exams looming, I needed a solution fast. My first attempt relied on rapid prototyping with an AI assistant—what I call "vibe-coding"—where I let the AI generate the initial structure with minimal oversight. The process was chaotic but yielded a working proof of concept within days.

The initial tech stack included:

Gemini API, called directly without an orchestration framework, for natural language processing
Streamlit for a quick frontend interface
PostgreSQL for data storage

When I finally reviewed the auto-generated code after exams, the results were underwhelming:

A dashboard displaying irrelevant statistics
Incomplete JSON responses from the AI, so only partial data was saved
Arbitrarily created database tables that didn't match the subjects' requirements
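Partial saves like these can be blocked by validating the model's JSON before anything touches the database. A minimal sketch, assuming a hypothetical response shape (a JSON array of question objects with `question_id`, `text` and `marks` keys). It raises instead of returning partial data, so the caller can retry the request.

```python
import json

# Hypothetical schema: keys every parsed exam question must carry before saving.
REQUIRED_KEYS = {"question_id", "text", "marks"}


def parse_model_output(raw: str) -> list[dict]:
    """Parse a model response and reject it unless every record is complete.

    Raises ValueError instead of returning partial data, so the caller can
    retry rather than persist a truncated response.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A response cut off mid-object fails here instead of being half-saved.
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of questions")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {i} missing keys: {sorted(missing)}")
    return data
```

A truncated response such as `'[{"question_id": "1a", "text": "Def'` raises `ValueError` rather than writing half a question to the database.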

The experience confirmed what I suspected: unsupervised AI development leads to unmaintainable code. I scrapped the prototype and rebuilt the entire system from scratch using Django for the backend and LangGraph for structured workflows. This shift eliminated spaghetti code and created a foundation I could actually maintain.
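LangGraph models a workflow as a `StateGraph` whose nodes are functions over a shared state, joined by edges that can be conditional. The stdlib-only sketch below imitates that node-and-edge idea to show why it tames spaghetti code; it is not LangGraph's API, and the node names, the three-attempt retry limit and the simulated `extract` failure are all hypothetical.

```python
from typing import Callable

# Each node takes the shared state dict and returns an updated copy.
# This mimics the idea LangGraph formalizes; it does not use LangGraph itself.
State = dict
Node = Callable[[State], State]


def extract(state: State) -> State:
    # Hypothetical model call: fails on the first attempt to demonstrate a retry.
    attempt = state["attempts"] + 1
    output = None if attempt == 1 else {"questions": ["Q1", "Q2"]}
    return {**state, "attempts": attempt, "output": output}


def save(state: State) -> State:
    # Stand-in for a database write.
    return {**state, "saved": True}


def after_extract(state: State) -> str:
    # Conditional edge: save complete output, retry incomplete output,
    # and stop after three attempts.
    if state["output"] is not None:
        return "save"
    return "extract" if state["attempts"] < 3 else "END"


NODES: dict[str, Node] = {"extract": extract, "save": save}
EDGES: dict[str, Callable[[State], str]] = {
    "extract": after_extract,
    "save": lambda _state: "END",
}


def run(start: str, state: State) -> State:
    """Follow edges from `start` until a node routes to END."""
    node = start
    while node != "END":
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Calling `run("extract", {"attempts": 0, "saved": False})` retries once and then saves. Because every transition lives in `EDGES`, the control flow can be read in one place instead of being scattered through nested calls.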

"Letting an AI write your core architecture without human oversight is a fast track to technical debt. Real systems demand real foundations."

A Balanced Approach to AI Coding Assistance

After the initial failure, I adopted a hybrid strategy for AI-assisted development. Instead of letting agents generate entire modules, I build the core architecture myself, writing clean, intentional code in Neovim, and then use AI for targeted refinements.

My workflow follows this pattern:

I implement the foundational components (e.g., user authentication) with proper structure
I hand the code to an AI assistant with specific instructions for modifications
The AI refines the existing code rather than generating new solutions from scratch

This method ensures:

Consistent code quality through human-designed patterns
Reviewable changes via clear diffs
Maintainable systems that don’t rely solely on AI-generated logic
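Reviewing an AI refinement as a diff keeps the human in charge of what actually changes. A minimal sketch using Python's standard `difflib.unified_diff`; the `login` function, the requested edit and the `accounts/views.py` path are hypothetical examples, not code from ExamIntelligence.

```python
import difflib

# Hypothetical hand-written function before and after an AI assistant was
# asked for one targeted change: add input validation.
before = '''def login(user, password):
    return auth.check(user, password)
'''

after = '''def login(user, password):
    if not user or not password:
        raise ValueError("missing credentials")
    return auth.check(user, password)
'''


def review_diff(old: str, new: str, path: str = "accounts/views.py") -> str:
    """Return a unified diff so the AI's edit can be reviewed line by line."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


print(review_diff(before, after))
```

The diff shows only the two added lines, so any unrequested change the assistant slipped in would stand out immediately.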

I use different tools for various tasks:

Claude Web for precise, localized edits
OpenCode for cloud-based model interactions
Pi coding agents for local model experimentation

Critically, I achieved this without spending a dime on AI services. By focusing on targeted edits and minimizing unnecessary token usage, I maintained full control over development costs while building a robust application.
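Keeping prompts small is easiest when the assistant sees only the code it is meant to change. A minimal sketch of that idea using Python's standard-library `ast` module, not the author's actual tooling: `function_source` pulls one top-level function out of a file, and the prompt wording in the hypothetical `build_edit_prompt` is an assumption.

```python
import ast


def function_source(module_source: str, name: str) -> str:
    """Return only the source of top-level function `name`.

    Note: ast.get_source_segment covers the def statement itself, so any
    decorators above it are not included.
    """
    tree = ast.parse(module_source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            segment = ast.get_source_segment(module_source, node)
            if segment is not None:
                return segment
    raise KeyError(f"no top-level function named {name!r}")


def build_edit_prompt(module_source: str, name: str, instruction: str) -> str:
    # Hypothetical prompt format: the instruction plus only the code being changed.
    snippet = function_source(module_source, name)
    return f"{instruction}\n\nEdit only this function and return it in full:\n\n{snippet}\n"
```

For a large module, this sends a few lines instead of the whole file, which is where most of the token savings in targeted editing come from.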

Navigating the AI Experimentation Minefield

Once the core system stabilized, I dove deep into optimizing the AI components. This phase proved to be the most challenging, with numerous dead ends before finding a working solution.

My first attempt involved switching from Gemini 2.5 Flash to local Llama models for processing exam papers. While text extraction worked well on simple PDFs, complex documents with diagrams and tables broke the pipeline. I spent a week benchmarking Python libraries like PyMuPDF and pdfplumber, only to find that neither could reliably preserve structural elements such as diagrams and table layouts.
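A benchmark like this is more informative when it scores structure, not just speed. A minimal sketch of such a harness, assuming hand-labelled snippets per page (for example table cells or figure captions) that a structure-preserving extractor should recover; `benchmark` and the recall metric are illustrative, not the author's benchmark.

```python
import time
from statistics import mean
from typing import Callable

# An extractor takes a PDF path and returns one text string per page.
Extractor = Callable[[str], list[str]]


def benchmark(
    extractors: dict[str, Extractor],
    path: str,
    expected: list[set[str]],
    repeats: int = 3,
) -> dict[str, dict[str, float]]:
    """Time each extractor on the same file and score how much known content survives.

    `expected` holds, per page, hand-labelled snippets that should appear verbatim
    in that page's extracted text.
    """
    total = sum(len(snippets) for snippets in expected)
    results: dict[str, dict[str, float]] = {}
    for name, extract in extractors.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            pages = extract(path)
            timings.append(time.perf_counter() - start)
        # Count labelled snippets found in the matching page's text.
        found = sum(
            sum(snippet in text for snippet in snippets)
            for text, snippets in zip(pages, expected)
        )
        results[name] = {
            "mean_seconds": mean(timings),
            "snippet_recall": found / total if total else 1.0,
        }
    return results
```

With real files, the extractors might be `lambda p: [page.get_text() for page in fitz.open(p)]` for PyMuPDF and `lambda p: [page.extract_text() or "" for page in pdfplumber.open(p).pages]` for pdfplumber (documents left unclosed for brevity). A low `snippet_recall` on table-heavy pages is the kind of signal that structure is being lost.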

Undeterred, I pivoted to small multimodal models, specifically Llama-3.2-3B-Instruct, using a sliding context window approach. I passed page-by-page images alongside the cumulative JSON state from previous pages, mimicking how graph-based models pass information along. The results were initially promising until I discovered a critical flaw: because every page depended on the state produced by the page before it, the pipeline could only process one page at a time, much as Python's Global Interpreter Lock (GIL) lets only one thread execute bytecode at a time. That serialized, GIL-like design proved incompatible with the task.
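The page-by-page approach can be sketched as a loop that feeds each page image plus a bounded slice of the JSON state built so far into the model. A minimal sketch: `process_pages`, the `window` size and the `model` callable are hypothetical stand-ins for the actual Llama call. The loop also makes the sequential dependency explicit, since page n cannot start until page n-1's output exists.

```python
import json
from typing import Callable

# Hypothetical per-page model call: takes one page image (bytes) plus the
# JSON context so far, and returns newly extracted records as a dict.
PageModel = Callable[[bytes, str], dict]


def process_pages(pages: list[bytes], model: PageModel, window: int = 5) -> dict:
    """Walk a paper page by page, carrying forward a cumulative JSON state.

    Only the last `window` questions are serialized into the prompt, so the
    context stays bounded. Each call needs the previous call's output, so
    the pages cannot be processed in parallel.
    """
    state: dict = {"questions": []}
    for page in pages:
        # Sliding context: pass only the most recent questions, not the full history.
        context = json.dumps({"questions": state["questions"][-window:]})
        update = model(page, context)
        state["questions"].extend(update.get("questions", []))
    return state
```

The bounded window keeps prompts small, but it is also the serial bottleneck described above: throughput is capped at one model call at a time.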

The experimentation phase consumed significant resources:

Burned through a Lightning AI GPU budget on inference tests
Ran Unsloth fine-tuning loops that took weeks longer than simply calling cloud endpoints
Accidentally trained models on unfiltered, low-quality v1 outputs, resulting in garbage-in-garbage-out scenarios
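Garbage-in-garbage-out can be reduced by filtering old outputs before they become training data. A minimal sketch, assuming the v1 outputs were stored as JSON Lines; the `answer` field, the 20-character minimum and `filter_training_rows` are hypothetical choices. It catches malformed, incomplete, trivially short and duplicated rows, but not answers that are well-formed yet wrong.

```python
import json


def filter_training_rows(lines: list[str], required: set[str], min_answer_chars: int = 20) -> list[dict]:
    """Keep only parseable, complete, non-duplicate rows from a JSONL dump."""
    kept: list[dict] = []
    seen: set[str] = set()
    for line in lines:
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed output
        if not isinstance(row, dict) or not required.issubset(row.keys()):
            continue  # missing required fields
        if len(str(row.get("answer", ""))) < min_answer_chars:
            continue  # too short to be a useful training target
        key = json.dumps(row, sort_keys=True)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        kept.append(row)
    return kept
```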

After prelims 2, I reset entirely, focusing on a clean rebuild that prioritized production readiness over experimental features.

The Hybrid Architecture That Made the Difference

The breakthrough came with a hybrid AI system that combined the strengths of different models for specific tasks. After months of trial and error, I settled on:

Gemini 3.1 Flash-Lite for heavy multimodal processing, including PDF parsing and image analysis
Qwen 3.6 35B for routing and text-only processing, ensuring consistent output formatting

This architecture delivered:

Production-grade output quality
Reasonable processing speeds
Reliable JSON formatting for structured data
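The split between the two models amounts to a routing decision per page. A rule-based sketch under stated assumptions: in the article the routing step is handled by the Qwen model itself, so the `Page` fields, the `route` rule and the backend labels below are hypothetical stand-ins that only illustrate which kind of page goes where.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    text: str
    images: list[bytes] = field(default_factory=list)
    has_tables: bool = False


# Hypothetical backend labels; real model identifiers and client calls are not shown.
MULTIMODAL = "multimodal"  # heavy work: PDF parsing, diagrams, images
TEXT_ONLY = "text-only"    # plain-text processing with consistent formatting


def route(page: Page) -> str:
    """Send pages with visual structure to the multimodal model, the rest to text-only."""
    if page.images or page.has_tables or not page.text.strip():
        # Empty extracted text often means a scanned page that must be read as an image.
        return MULTIMODAL
    return TEXT_ONLY
```

Keeping text-only pages off the multimodal path is what lets the cheaper model handle most of the volume while the heavier model is reserved for pages that need it.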

I initially explored other approaches, including speculative decoding with paired Qwen 3.5 models and auto-research setups using local Gemma 4 and Qwen 3.5 variants. However, these methods either introduced unacceptable latency or failed to meet quality standards.

The app launched to production on June 5th, offering students an alternative to traditional exam preparation that respects both their time and their understanding of the material.

Looking Ahead: AI That Adapts to Students, Not Exams

The 8-month journey from frustration to functional tool reinforced a fundamental truth about education technology: the best systems don’t just automate studying—they reimagine it. ExamIntelligence represents the first step toward an adaptive learning platform that grows with each student’s needs rather than forcing them into rigid patterns.

Future iterations will focus on expanding subject coverage, improving AI explanation capabilities, and incorporating student feedback to refine the learning experience. The goal isn’t just to help students pass exams—it’s to help them develop genuine mastery.

For IGCSE students tired of uninspired memorization, the early access waitlist is now open, with the first 500 signups receiving priority access.
