The open-source community just gained a powerful new tool for generating realistic mock data. Meet Mockyard, a Docker-based solution that rivals commercial platforms like Mockaroo while remaining completely free and self-hostable.
Mockyard eliminates the frustrations developers face with traditional mock data generators. Mockaroo’s free tier is limited to 1,000 rows per file, and raising that limit to 100,000 rows costs $60 per year. Mockyard, by contrast, lets you generate millions of records daily at no cost. It’s also entirely open-source, so you can inspect, modify, or extend the code to suit your needs.
From frustration to a full-scale solution
The creator of Mockyard built it to solve two problems. First, although modern AI tools have made complex development projects more accessible, existing mock data generators often required cumbersome installations or depended on online services. Second, testing data pipelines with hundreds of thousands or even millions of records demanded a fast, memory-efficient solution that didn’t rely on online services.
"I needed something fast, lightweight, and easy to deploy," the developer explained. "Something that wouldn’t force me to install multiple languages or tools just to generate a few CSV files."
The result? A Docker container that spins up in seconds and handles massive datasets with ease. Running Mockyard costs nothing, even for files with up to 10 million rows—something Mockaroo reserves for its paid plans.
Key features that set Mockyard apart
Mockyard isn’t just another mock data generator. It introduces several unique capabilities that address real-world pain points in data testing and development.
Weighted distributions for realistic data
Traditional tools often generate uniform data, leading to unrealistic distributions. Mockyard allows you to define weighted enums, ensuring data reflects real-world patterns. For example:
- 20% of records assigned the Admin role
- 30% as Manager
- 50% as Viewer
This approach mirrors actual user role distributions in applications, making test data far more valuable for debugging and validation.
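To make the idea concrete, here is a minimal Python sketch of weighted sampling using the 20/30/50 split from the example above. It illustrates the general technique only; it is not Mockyard's code or configuration format, and the names `ROLE_WEIGHTS` and `sample_roles` are made up for this illustration.

```python
import random
from collections import Counter

# Hypothetical weights (percentages) taken from the role example above.
ROLE_WEIGHTS = {"Admin": 20, "Manager": 30, "Viewer": 50}


def sample_roles(n, weights=ROLE_WEIGHTS, seed=None):
    """Draw n roles so each appears roughly in proportion to its weight."""
    rng = random.Random(seed)  # seeded generator for repeatable test data
    roles = list(weights)
    return rng.choices(roles, weights=[weights[r] for r in roles], k=n)


if __name__ == "__main__":
    total = 100_000
    counts = Counter(sample_roles(total, seed=42))
    for role in ROLE_WEIGHTS:
        print(f"{role}: {counts[role] / total:.1%}")
```

With a large enough sample, the observed shares settle close to the configured weights, which is what makes the generated data useful for testing role-dependent logic.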
Logical field relationships with lookup tables
Ever seen a CSV with entries like "Miami, Yukon Territory, Switzerland"? Mockaroo and similar tools often generate fields independently, producing nonsensical combinations. Mockyard solves this with lookup tables that keep related fields logically connected.
While users must define the lookup values themselves, the tool ensures consistency. If a city is selected, the corresponding state and country match real-world pairings. This small but critical feature drastically improves data realism for testing scenarios.
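The lookup-table idea can be sketched in a few lines of Python: instead of picking city, state, and country independently, the generator picks one whole row from a user-defined table. This is an illustration of the concept under assumed names (`LOCATIONS`, `generate_rows`, `to_csv`), not Mockyard's actual implementation or schema.

```python
import csv
import io
import random

# Hypothetical user-defined lookup table: each tuple keeps related fields together.
LOCATIONS = [
    ("Miami", "Florida", "United States"),
    ("Whitehorse", "Yukon", "Canada"),
    ("Zurich", "Zurich", "Switzerland"),
]


def generate_rows(n, seed=None):
    """Yield n records whose city, state, and country always come from one lookup row."""
    rng = random.Random(seed)
    for i in range(1, n + 1):
        city, state, country = rng.choice(LOCATIONS)  # pick the whole row at once
        yield {"id": i, "city": city, "state": state, "country": country}


def to_csv(rows):
    """Serialize the records to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "city", "state", "country"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_rows(5, seed=7)))
```

Because every record is drawn as a complete row, a combination such as "Miami, Yukon Territory, Switzerland" cannot occur.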
Blazing-fast performance for large datasets
Performance matters when generating millions of records. Mockyard’s benchmarks demonstrate its efficiency:
| Rows | Format | Time | Throughput (rows/sec) |
|---|---|---|---|
| 1,000 | CSV | 0.02s | ~50,000 |
| 10,000 | CSV | 0.09s | ~111,111 |
| 100,000 | CSV | 0.53s | ~188,679 |
| 1,000,000 | CSV | 4.89s | ~204,499 |
| 10,000,000 | CSV | 53.61s | ~186,532 |

The tool currently caps generation at 10 million rows, a practical limit given that most applications, especially those using tools like Excel, struggle to handle files beyond one million rows on standard hardware. For most use cases, this performance ceiling is more than sufficient.
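The throughput figures are simply row count divided by elapsed time. The short Python check below recomputes them from the published rows and times; the `BENCHMARKS` list and `throughput` helper are names introduced here for illustration, and no new measurements are involved.

```python
# (rows, seconds) pairs copied from the benchmark figures above.
BENCHMARKS = [
    (1_000, 0.02),
    (10_000, 0.09),
    (100_000, 0.53),
    (1_000_000, 4.89),
    (10_000_000, 53.61),
]


def throughput(rows, seconds):
    """Return rows generated per second."""
    return rows / seconds


if __name__ == "__main__":
    for rows, seconds in BENCHMARKS:
        rate = throughput(rows, seconds)
        print(f"{rows:>10,} rows in {seconds:>5.2f}s -> {rate:>9,.0f} rows/sec")
```

The rate climbs as fixed startup cost is spread over more rows, peaks at the one-million-row run, and dips slightly at ten million.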
Flexible access: web interface or direct API
Mockyard offers two ways to generate data, catering to different workflows. The built-in web interface provides an intuitive way to configure and download mock datasets. Behind the scenes, the interface calls the same API endpoints that are available for direct programmatic access.
This dual approach ensures compatibility with both manual testing and automated pipelines. Developers can integrate Mockyard into their CI/CD processes or use it interactively without writing a single line of code.
Current limitations and future possibilities
Today, Mockyard supports CSV and JSON output formats—choices driven by the creator’s immediate needs. The developer acknowledges that additional formats could be useful but hasn’t prioritized them yet.
"If people find this tool valuable and request support for other formats like SQL or XML, I’ll definitely consider adding them," they noted. "Community feedback will guide future development."
The project lives on GitHub, where developers can explore the code, submit issues, or contribute enhancements. With its focus on speed, realism, and flexibility, Mockyard stands poised to become a go-to solution for developers tired of the limitations in traditional mock data generators.
As data-driven applications grow more complex, tools like Mockyard fill a critical gap—one that no longer requires sacrificing functionality for affordability.
AI summary
Mockyard is an open-source tool that you can host on your own server and that, unlike the free version of Mockaroo, can generate millions of rows of data. It stands out for its performance and flexibility.