When preparing to launch Autentico 2.0, a self-contained OAuth 2.0 and OpenID Connect identity provider built with Go and SQLite, the team expected smooth sailing. The feature set was complete, tests passed, and documentation was up to date. Yet, a week of rigorous stress testing exposed unexpected flaws in their assumptions about performance under load. What began as a routine benchmarking exercise evolved into a deep dive into concurrency, profiling, and the subtle ways database design shapes user experience.
The Unexpected Performance Trap
The first round of stress tests used k6 to simulate 100 concurrent users executing a full PKCE authorization code flow. Each iteration included five HTTP requests, four to five SQLite writes, and a bcrypt password verification. On an older i5 laptop, the results were acceptable but far from impressive. Profiling revealed the culprit: bcrypt.CompareHashAndPassword consumed 90% of CPU time. This function, intentionally slow to protect passwords against brute-force attacks, had become the dominant bottleneck. SQLite writes took mere microseconds, JWT signing was negligible, and HTTP routing barely registered in the profile. Only bcrypt mattered, and it saturated every available core.
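A rough way to see why a CPU-bound hash caps login throughput: if each verification pins one core for a fixed time, the ceiling is simply cores divided by time per hash. The sketch below is a back-of-envelope calculation, not a measurement; the function name and the figures in main (4 cores, 250 ms per hash) are hypothetical and do not come from the benchmark.

```go
package main

import (
	"fmt"
	"time"
)

// maxLoginsPerSecond returns the upper bound on password verifications per
// second when every verification occupies one core for perHash.
// It ignores all other work, so real throughput will be lower.
func maxLoginsPerSecond(cores int, perHash time.Duration) float64 {
	if cores <= 0 || perHash <= 0 {
		return 0
	}
	return float64(cores) / perHash.Seconds()
}

func main() {
	// Hypothetical inputs chosen for illustration only.
	fmt.Printf("%.1f logins/s\n", maxLoginsPerSecond(4, 250*time.Millisecond))
}
```

Under these assumptions, extra concurrent users cannot raise throughput once every core is busy hashing; they only lengthen the queue.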
The team initially assumed that bcrypt’s slowness was unavoidable. After all, the point of bcrypt is to be computationally expensive. But the realization that one slow function could paralyze an entire system sparked a radical question: could they isolate and scale just that bottleneck?
Rethinking the Architecture for a Narrow Problem
The search for a solution led to several dead ends. CQRS with SQLite replication via LiteFS promised distributed reads and writes, but it solved a general scaling problem when the actual issue was specific to bcrypt. Migrating to PostgreSQL was considered, yet even with multiple application instances behind a load balancer, the bcrypt bottleneck would persist on each server. Spawning child processes for bcrypt work added IPC overhead without improving throughput, and sticky sessions required a shared lookup table, reintroducing the original problem.
The breakthrough came when the team questioned the need for a distributed architecture altogether. Instead of scaling the entire system, why not scale just the slowest component? The solution was Verifico, a remote worker service dedicated solely to bcrypt verification. Autentico would remain a single instance, owning the database and handling all other operations. When a password needed verification, the system would send the hash and plaintext to a worker over HTTP. Workers, stateless and lightweight, could run on minimal hardware. The service used round-robin load balancing with automatic fallback to local bcrypt if workers were unavailable.
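The dispatch logic described above might look roughly like the following sketch. It is not Verifico's actual code: the Pool type, VerifyFunc signature, and errUnavailable are hypothetical names, the workers are plain functions standing in for HTTP calls, and trying every worker once before falling back to the local verifier is an assumption about the fallback policy.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errUnavailable is a hypothetical error for a worker that cannot be reached.
var errUnavailable = errors.New("worker unavailable")

// VerifyFunc reports whether password matches hash. It returns an error only
// when the check could not be performed at all.
type VerifyFunc func(hash, password string) (bool, error)

// Pool sends verifications to remote workers in round-robin order and falls
// back to a local verifier when no worker can answer.
type Pool struct {
	workers []VerifyFunc
	local   VerifyFunc
	next    atomic.Uint64 // monotonically increasing request counter
}

// Verify starts at the next worker in rotation, tries each worker once, and
// uses the local verifier if all of them return an error.
func (p *Pool) Verify(hash, password string) (bool, error) {
	n := uint64(len(p.workers))
	if n > 0 {
		start := p.next.Add(1) - 1
		for i := uint64(0); i < n; i++ {
			ok, err := p.workers[(start+i)%n](hash, password)
			if err == nil {
				return ok, nil
			}
		}
	}
	return p.local(hash, password)
}

func main() {
	// Stand-ins for remote calls; a real worker would run bcrypt.
	down := func(hash, password string) (bool, error) { return false, errUnavailable }
	up := func(hash, password string) (bool, error) { return hash == password, nil }

	p := &Pool{workers: []VerifyFunc{down, up}, local: up}
	ok, err := p.Verify("secret", "secret")
	fmt.Println(ok, err)
}
```

Because the workers hold no state, the only shared data in this sketch is the atomic counter, which keeps the rotation safe under concurrent requests.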
Security concerns were addressed by using a shared secret over a private network, avoiding the operational overhead of mTLS or the pitfalls of reinventing TLS with AES encryption. The password already traveled over the public internet to reach Autentico; one additional hop inside a VPC added negligible risk.
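A shared-secret check on the worker side can be sketched with the standard library alone. The header name X-Worker-Secret, the /verify path, and the function names are hypothetical; the article does not say how the secret is transmitted.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// secretMatches compares two secrets without leaking, through timing, where
// they differ. ConstantTimeCompare returns early only when lengths differ.
func secretMatches(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireSecret rejects any request whose X-Worker-Secret header (a
// hypothetical name) does not carry the expected shared secret.
func requireSecret(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !secretMatches(r.Header.Get("X-Worker-Secret"), secret) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := requireSecret("s3cret", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))

	// Exercise the handler in-process: once without the header, once with it.
	for _, secret := range []string{"", "s3cret"} {
		req := httptest.NewRequest(http.MethodPost, "/verify", nil)
		if secret != "" {
			req.Header.Set("X-Worker-Secret", secret)
		}
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Println(rec.Code)
	}
}
```

The constant-time comparison is cheap insurance; the secret itself still depends on the private network for confidentiality, as the article notes.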
A Glimpse of Progress, and Its Limits
On the i5 laptop, Verifico delivered tangible improvements. With bcrypt offloaded to workers and the server constrained to two cores, latency on non-login endpoints dropped from seconds to single-digit milliseconds. Throughput scaled roughly linearly as workers were added, then flattened at around six cores. The results were promising, and the team prepared to ship the update.
Yet when the same benchmarks ran on a modern Ryzen 7 desktop with 16 cores and faster single-thread performance, the story changed. The team constrained Autentico to two cores and added workers in pairs: 2+2, 2+4, up to 2+14. Where the i5 had shown steady gains with each additional worker, throughput on the Ryzen stayed flat across all configurations, between 14.7 and 15.4 iterations per second, with login p95 latency hovering around 3.6 seconds. Adding workers did nothing.
The Ryzen’s faster cores were chewing through bcrypt hashes so quickly that the function no longer dominated the performance profile. The real bottleneck had shifted elsewhere entirely.
The Hidden Contention Revealed
Profiling on the Ryzen uncovered the unexpected culprit: the Go database connection pool. A Go block profile under load showed that every contention point centered on database/sql.(*DB).conn, where goroutines waited in line for a database connection. Read operations accounted for 65% of the contention, while writes made up the remaining 35%. The top offenders were routine operations like client ID lookups, session creation, and token generation—queries that were individually fast but collectively overwhelmed the pool.
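For readers who have not used it, a block profile can be captured with the standard library as in the sketch below. This is not the team's setup: the contend function fakes pool contention with a mutex, and avgWait and poolWait are hypothetical helpers showing how the WaitCount and WaitDuration counters from sql.DBStats could complement the profile.

```go
package main

import (
	"database/sql"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// avgWait returns the mean time callers spent waiting for a pooled
// connection, given the counters exposed by sql.DBStats.
func avgWait(waitCount int64, waitDuration time.Duration) time.Duration {
	if waitCount == 0 {
		return 0
	}
	return waitDuration / time.Duration(waitCount)
}

// poolWait reads the wait counters from a live pool.
func poolWait(db *sql.DB) time.Duration {
	s := db.Stats()
	return avgWait(s.WaitCount, s.WaitDuration)
}

// contend simulates goroutines queuing for one shared resource, standing in
// for requests queuing on a saturated connection pool.
func contend(workers int, hold time.Duration) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			time.Sleep(hold)
			mu.Unlock()
		}()
	}
	wg.Wait()
}

func main() {
	// Rate 1 records every blocking event; costly, so use it for diagnosis only.
	runtime.SetBlockProfileRate(1)
	contend(8, 5*time.Millisecond)

	// debug=1 writes a human-readable listing with function names.
	pprof.Lookup("block").WriteTo(os.Stdout, 1)
}
```

In a real service the same profile is usually read over HTTP via net/http/pprof rather than dumped to stdout, and the stacks would point into database/sql instead of the simulated mutex.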
The profile pointed at database concurrency, an issue the earlier focus on bcrypt had overshadowed. In SQLite's default rollback journal mode, a write locks the entire database and blocks all readers until it finishes. A blocked query keeps its pooled connection checked out while it waits, which is consistent with goroutines queuing in database/sql.(*DB).conn. SQLite's Write-Ahead Logging (WAL) mode, which lets readers proceed while a write is in progress, promised a straightforward solution.
The Quiet Victory of WAL Mode
Switching to WAL mode transformed Autentico’s performance on the Ryzen. By enabling concurrent reads and writes, the system eliminated the bottleneck in the connection pool. The result was a dramatic reduction in contention, with non-login endpoints now consistently completing in milliseconds. The team had solved the problem not by scaling bcrypt, but by optimizing SQLite’s concurrency model.
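Enabling WAL comes down to a single pragma, which SQLite answers with the journal mode now in effect. The sketch below assumes an already opened *sql.DB; the function names, driver name, and file path are placeholders, and how Autentico actually applies the setting is not stated in the article.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// isWAL reports whether the mode string returned by SQLite means WAL.
func isWAL(mode string) bool {
	return strings.EqualFold(strings.TrimSpace(mode), "wal")
}

// enableWAL asks SQLite to switch to write-ahead logging and verifies the
// mode it reports back. In-memory databases, for example, answer "memory".
func enableWAL(ctx context.Context, db *sql.DB) error {
	var mode string
	if err := db.QueryRowContext(ctx, "PRAGMA journal_mode=WAL").Scan(&mode); err != nil {
		return fmt.Errorf("set journal_mode: %w", err)
	}
	if !isWAL(mode) {
		return fmt.Errorf("journal_mode is %q, want wal", mode)
	}
	return nil
}

func main() {
	// Opening the database requires a SQLite driver, which is not part of the
	// standard library. With one registered, usage would be along these lines:
	//
	//   db, err := sql.Open("sqlite3", "autentico.db") // placeholder driver name and path
	//   if err != nil { ... }
	//   if err := enableWAL(context.Background(), db); err != nil { ... }
	fmt.Println(isWAL("wal"))
}
```

WAL mode is stored in the database file, so it stays in effect for later connections. It still allows only one writer at a time; what changes is that readers no longer wait for that writer.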
This experience underscored a valuable lesson: performance bottlenecks are not always where you expect them to be. What appears to be a CPU-bound crisis may stem from a subtle architectural limitation, and the most effective solutions often lie in rethinking database configurations rather than overhauling infrastructure.
As Autentico moves toward its next release, the team remains focused on refining its approach to concurrency and scalability. The journey from bcrypt bottlenecks to WAL mode optimization serves as a reminder that true performance gains often come from understanding the underlying systems, not just the visible symptoms.