Commercial ships broadcast their identity, position, and speed every few seconds via AIS, creating a continuous stream of 700,000 position reports per hour. For years, VesselAPI relied on MongoDB to store this firehose of data. But as the platform scaled, a single mismatched struct tag and 240 dead tags quietly degraded performance until the entire pipeline required a rewrite. The root problem wasn’t storage—it was time and space itself.
Maritime data isn’t a document to be filed away; it’s a measurement anchored in location and moment. Queries don’t ask for “all records where vessel_id equals 12345.” They ask, “Where was this ship two hours ago?” or “Which vessels entered the English Channel since Tuesday?” MongoDB’s document model handled ingestion early on, but it couldn’t natively answer spatial or temporal questions without workarounds. Indexes, cron jobs, and bolt-on solutions added complexity—and cost.
The Shape of Maritime Data
At its core, AIS data is a point in space and time: latitude, longitude, timestamp, vessel identifier (MMSI), speed, heading, and a handful of metadata fields. Each report arrives every few seconds, creating a high-velocity stream that demands efficient partitioning, compression, and retention policies.
MongoDB treats each position report as a JSON-like document. This flexibility helped during early development, but it obscured the data’s true nature: it’s a time series with spatial attributes. MongoDB’s later addition of time-series collections arrived after VesselAPI had already outgrown document storage.
The platform needed native support for:
- Automatic time-based partitioning.
- Spatial queries on a sphere (“within 50 km of Rotterdam”).
- Retention policies that expire data predictably without cron jobs.
- Compression that preserves query speed for recent data.
- Relational joins and full-text search.
- Operational simplicity without a dedicated DBA.
No single database checked all boxes. The candidates (InfluxDB for time series, ClickHouse for analytics, MongoDB for flexible ingestion) each excelled in one area but left the rest to be duct-taped together. That changed when TimescaleDB entered the picture.
Why TimescaleDB Fit the Bill
TimescaleDB is PostgreSQL with a time-series engine baked into the storage layer. Unlike MongoDB, it understands that time isn’t metadata—it’s the primary dimension. Its hypertables automatically split data into chunks by time, compress old data, and enforce retention without external scripts.
The platform’s spatial needs are met by PostGIS, which natively handles spherical geometry and distance calculations. Hexagonal indexing via Uber’s H3 library supports high-performance spatial queries. And because TimescaleDB is PostgreSQL, it inherits decades of relational database engineering: GIN indexes for full-text search, ACID transactions, and mature tooling.
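The "within 50 km of Rotterdam" style of query maps directly onto PostGIS. A minimal sketch, assuming a `geom` column of type `geography(Point, 4326)` on the positions table (the column name and coordinates are illustrative, not the platform's actual schema):

```sql
-- Vessels that reported within 50 km of Rotterdam in the last hour.
-- ST_DWithin on geography values measures distance in metres on a sphere.
SELECT DISTINCT mmsi
FROM vessel_positions
WHERE timestamp > now() - INTERVAL '1 hour'
  AND ST_DWithin(
        geom,
        ST_SetSRID(ST_MakePoint(4.47917, 51.9225), 4326)::geography,  -- Rotterdam (lon, lat)
        50000  -- 50 km in metres
      );
```

Because the `geography` type handles spherical geometry natively, no haversine workarounds or precomputed bounding boxes are needed.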
The migration wasn’t just about switching databases—it was about adopting time-series thinking.
Data With a Shelf Life: How Hypertables Work
TimescaleDB’s hypertables abstract away time-based partitioning. A hypertable like vessel_positions looks like a regular PostgreSQL table, but it’s actually a collection of smaller, time-partitioned chunks. Each chunk holds one hour of AIS data by default.
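Turning a plain table into a hypertable is a single call. A sketch, assuming the column names described earlier and the one-hour default chunk interval:

```sql
-- Plain PostgreSQL table for AIS position reports (columns assumed from
-- the fields described above; not necessarily the platform's full schema)
CREATE TABLE vessel_positions (
    timestamp timestamptz NOT NULL,
    mmsi      integer     NOT NULL,
    lat       double precision NOT NULL,
    lon       double precision NOT NULL,
    speed     real,
    heading   real
);

-- Convert it into a hypertable partitioned into one-hour chunks
SELECT create_hypertable('vessel_positions', 'timestamp',
                         chunk_time_interval => INTERVAL '1 hour');
```

From here on, inserts and queries target `vessel_positions` as if it were an ordinary table; TimescaleDB routes rows to the right chunk transparently.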
This design turns retention from a maintenance chore into a declarative policy. Enforcing a 78-hour retention window doesn't require scanning millions of rows; TimescaleDB simply drops whole chunks that fall outside the window, an operation that takes milliseconds. With 16.5 million positions under that window, the table size stays stable no matter how long the system runs.
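The 78-hour window described above is one policy call (assuming the `vessel_positions` hypertable):

```sql
-- Retention as policy, not procedure: chunks older than 78 hours
-- are dropped whole by a background job, no row scans involved.
SELECT add_retention_policy('vessel_positions', INTERVAL '78 hours');
```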
Compression works similarly. After two hours, inactive chunks are compressed automatically. By segmenting compression by MMSI and ordering by timestamp descending, recent queries avoid decompressing entire chunks. This reduces storage and accelerates time-range lookups because the query planner skips irrelevant chunks entirely.
```sql
ALTER TABLE vessel_positions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'mmsi',
    timescaledb.compress_orderby = 'timestamp DESC'
);

SELECT add_compression_policy('vessel_positions', INTERVAL '2 hours');
```

In MongoDB, retention relied on a cron job running a cleanup script every few hours, a fragile process that once failed silently for a week before disk usage triggered an alert. In TimescaleDB, retention is policy, not procedure.
Operational Simplicity: One Database, No Duct Tape
The migration to TimescaleDB eliminated the need for multiple systems stitched together with scripts and ETL pipelines. Instead of managing MongoDB for ingestion, InfluxDB for time-series, and PostGIS for spatial queries, the platform now runs on a single, unified stack.
Operational overhead dropped. Performance improved. Costs fell by roughly 70% compared to the MongoDB setup—even as data volume tripled. Spatial queries for “vessels within 50 km of Rotterdam” now complete in milliseconds, not seconds.
The mismatched struct tag that triggered the rewrite? It never caused a crash. It caused something worse: a silent degradation of data integrity and performance, invisible until the system couldn’t keep up. TimescaleDB didn’t fix a bug—it exposed the mismatch between data model and reality.
Looking ahead, VesselAPI plans to expand beyond positional tracking into predictive analytics and anomaly detection. TimescaleDB’s continuous aggregates and real-time compression will support these workloads without adding infrastructure. The firehose of maritime data isn’t slowing down. But with the right database, it no longer has to shout into the void.
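As a rough sketch of how the continuous aggregates mentioned above might look for this schema (the view name and aggregated columns are hypothetical):

```sql
-- Continuous aggregate: hourly per-vessel rollups, maintained
-- incrementally by TimescaleDB as new positions arrive.
CREATE MATERIALIZED VIEW vessel_hourly
WITH (timescaledb.continuous) AS
SELECT mmsi,
       time_bucket('1 hour', timestamp) AS bucket,
       avg(speed) AS avg_speed,
       count(*)   AS report_count
FROM vessel_positions
GROUP BY mmsi, bucket;
```

Queries for analytics or anomaly detection would then read the rollup view instead of re-scanning raw position reports.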