ZATRON’s encrypted search resists neural network attacks

A neural network can expose hidden patterns in encrypted data, unless the encryption is designed to stop it. That's the takeaway from a recent experiment in which a developer built an adversarial model to test their own privacy-preserving search system, ZATRON. The goal was not to rely on security through obscurity but to stress-test the system against a determined attacker: a neural network trained specifically to break it.

From vector embeddings to modular barcodes

Traditional semantic search stores document embeddings as plain numerical vectors. While efficient for matching queries to results, these vectors also leak sensitive information. If an attacker gains access to the database, they can cluster embeddings by topic or reconstruct document relationships without ever reading the actual content. ZATRON addresses this by transforming each embedding into a modular barcode—a cryptographic representation that preserves search accuracy while masking the original data.

The process involves projecting embeddings onto principal component analysis (PCA) channels, quantizing the values, applying a document-specific keyed mask, and storing only the residues modulo a set of prime numbers. Search operations are performed directly on these barcodes, eliminating the need to reconstruct the original embeddings. Despite the transformation, retrieval quality remains high, with 98% of cosine similarity preserved across 626,000 MSMARCO test passages. The critical question was whether the barcodes inadvertently leaked information about the underlying data.
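The four steps above can be pictured with a minimal sketch. This is not ZATRON's actual code: the prime set, the quantization scale, the HMAC-based mask derivation, and every function name here are assumptions made for illustration, and the PCA components are assumed to be precomputed elsewhere.

```python
import hashlib
import hmac

# Hypothetical moduli; the article does not say which primes ZATRON uses.
PRIMES = (251, 257, 263)


def project(embedding, components):
    """Project an embedding onto precomputed PCA component vectors."""
    return [sum(e * c for e, c in zip(embedding, comp)) for comp in components]


def quantize(values, scale=1000):
    """Round each projected value to an integer grid (scale is an assumption)."""
    return [round(v * scale) for v in values]


def keyed_mask(key, doc_id, channel, prime):
    """Derive a per-document, per-channel, per-prime offset from a secret key."""
    msg = f"{doc_id}:{channel}:{prime}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % prime


def encode_barcode(embedding, components, key, doc_id):
    """Return one tuple of masked residues per PCA channel."""
    quantized = quantize(project(embedding, components))
    barcode = []
    for channel, q in enumerate(quantized):
        residues = tuple(
            (q + keyed_mask(key, doc_id, channel, p)) % p for p in PRIMES
        )
        barcode.append(residues)
    return barcode
```

Calling `encode_barcode([0.12, -0.34, 0.56], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], b"demo-key", "doc-1")` yields two channels of three residues each. How ZATRON searches over such barcodes is not described in the article, so the sketch stops at encoding.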

Why correlation tests aren’t enough

Initial security checks relied on statistical correlations, such as Spearman's rank coefficient between barcode distance and true document similarity. The results were reassuring: ρ hovered around 0.05, suggesting no monotonic relationship. But correlation tests only rule out simple attacks. Neural networks don't need a monotonic signal. They can uncover complex, non-linear patterns that traditional metrics miss.
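To make that limitation concrete, here is a small pure-Python sketch of Spearman's ρ (rank the values, then take the Pearson correlation of the ranks). The helper names are illustrative and not taken from the ZATRON repository.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# y is fully determined by x, yet the relationship is not monotonic.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
rho = spearman(xs, ys)
```

In this toy case `rho` comes out at zero even though `ys` is a deterministic function of `xs`, which is why a near-zero rank correlation alone cannot rule out a learnable signal.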

To close this gap, the developer designed a far more aggressive threat model: a neural network with full visibility into the system. The attacker wasn't just an opportunist skimming the database; it was a model trained on 80,000 document pairs, each labeled with its true cosine similarity. Two networks, a linear probe and a 3-layer MLP, were tasked with predicting similarity scores for unseen pairs using per-prime circular-difference features. Crucially, no training and testing pairs shared anchor documents, preventing the models from memorizing specific examples.
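One plausible reading of "per-prime circular-difference features" and the anchor-disjoint split is sketched below. The barcode layout (one tuple of residues per channel), the normalization, and the `(anchor, other, similarity)` pair format are assumptions for illustration; the benchmark scripts in the repository are the authority on what was actually run.

```python
def circular_difference(a, b, prime):
    """Shortest distance between two residues on a ring of size `prime`."""
    d = (a - b) % prime
    return min(d, prime - d)


def pair_features(barcode_a, barcode_b, primes):
    """One circular difference per (channel, prime) slot, scaled to [0, 1]."""
    features = []
    for residues_a, residues_b in zip(barcode_a, barcode_b):
        for a, b, p in zip(residues_a, residues_b, primes):
            # p // 2 is the largest possible circular difference for an odd prime.
            features.append(circular_difference(a, b, p) / (p // 2))
    return features


def anchor_disjoint_split(pairs, test_anchors):
    """Send a pair to the test set only if its anchor document is held out.

    Each pair is assumed to be (anchor_id, other_id, similarity).
    """
    train = [pair for pair in pairs if pair[0] not in test_anchors]
    test = [pair for pair in pairs if pair[0] in test_anchors]
    return train, test
```

Because residues wrap around, 1 and 250 modulo 251 are only two steps apart, which is what `circular_difference` captures and a plain absolute difference would miss.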

A control group used unprotected, quantized embeddings to validate the attack’s strength. If the neural network couldn’t break the unprotected signals, the test would be meaningless.

Neural networks fail to crack ZATRON’s barcodes

The results were unequivocal. On a test set of 50,000 MSMARCO passages with 100,000 labeled pairs:

The unprotected signals fell to the attack almost completely: the linear probe reached ρ = 0.79 and an AUC of 0.985, and the MLP reached ρ = 0.90 and an AUC of 0.999.
ZATRON's barcodes, however, yielded chance-level performance. The linear probe and the MLP both recorded ρ = 0.00 and AUC = 0.50, indistinguishable from random guessing.

The same model that extracted sensitive relationships from raw embeddings couldn’t derive anything meaningful from the barcodes—even after training on tens of thousands of labeled examples. The system held up precisely where it mattered most.
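For readers less familiar with AUC, the sketch below computes it directly from its definition: the probability that a randomly chosen similar pair receives a higher predicted score than a randomly chosen dissimilar pair. The article does not say how similarity was turned into a binary label, so the threshold in `binarize` is purely an assumption.

```python
def binarize(similarities, threshold=0.5):
    """Label a pair as 'similar' above a cutoff (threshold is hypothetical)."""
    return [s >= threshold for s in similarities]


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for lab, s in zip(labels, scores) if lab]
    negatives = [s for lab, s in zip(labels, scores) if not lab]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A perfect attacker scores 1.0 and a coin flip scores 0.5. The pairwise loop is quadratic and only meant to show the definition; a rank-based implementation would be the practical choice at 100,000 pairs.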

Head-to-head with ASPE: retrieval vs. privacy

To contextualize the findings, the developer compared ZATRON against ASPE, a classic encrypted k-nearest neighbors (kNN) scheme from SIGMOD 2009. ASPE preserves scalar products exactly, ensuring perfect retrieval recall. But that precision comes at a steep privacy cost: any observer can infer document similarities directly from the ciphertexts. Published results show ρ = +0.87 for ASPE, meaning the encrypted data leaks similarity relationships almost as clearly as plaintext.
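The scalar-product property at the heart of ASPE can be shown in a few lines. This is a simplified sketch of the core identity only, in which database vectors are multiplied by the transpose of a secret invertible matrix and queries by its inverse; the published scheme adds further steps that are omitted here, and the 2x2 matrix is a toy value chosen for readability.

```python
from fractions import Fraction


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def inverse_2x2(matrix):
    """Exact inverse of a 2x2 matrix of Fractions."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy secret key: any invertible matrix works for the identity below.
M = [[Fraction(2), Fraction(1)], [Fraction(5), Fraction(3)]]


def encrypt_point(p):
    """Database side: p -> M^T p."""
    return mat_vec(transpose(M), p)


def encrypt_query(q):
    """Query side: q -> M^-1 q."""
    return mat_vec(inverse_2x2(M), q)


p, q = [3, 4], [1, 2]
plain = dot(p, q)
encrypted = dot(encrypt_point(p), encrypt_query(q))
```

Here `plain` and `encrypted` are equal, because (Mᵀp)·(M⁻¹q) = pᵀMM⁻¹q = p·q. Exact preservation is what gives ASPE perfect recall, and it is also why similarity structure survives in the ciphertexts.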

ZATRON, by contrast, prioritizes privacy over perfect recall. On the strictest retrieval metric—recall@10 with full top-10 set overlap—ZATRON achieves 81% compared to ASPE’s 100%. Yet when tested with the same neural attack, ASPE’s leaked scalar products allowed the model to predict similarities with ρ = +0.91 and an AUC of 0.99. ZATRON’s barcodes produced ρ = +0.01 and AUC = 0.52, again indistinguishable from randomness.

What ZATRON does—and doesn’t—guarantee

Honesty about limitations is part of the system's design. The current threat model assumes an observer who can access stored barcodes but lacks the secret keys used during encoding. A key holder performing many pairwise comparisons can still reconstruct partial geometric relationships through multi-dimensional scaling (MDS), though the correlation drops to ρ ≈ 0.35. This is an inherent trade-off for any scheme that lets a key holder compute distances, including ones built on fully homomorphic encryption (FHE).

ZATRON is also not a reversible cipher, and it has not undergone an independent cryptographic audit. The irreversibility is intentional: it is a randomized privacy-preserving encoding, not a mathematically proven encryption standard. The developer stresses that a production-grade system would require rigorous third-party review before adoption.

Finally, the retrieval metric used here is stricter than previous benchmarks. Earlier figures cited top-1-in-top-10 accuracy, while this test enforces full overlap of the top-10 results. It is the same system, evaluated under a tougher standard.
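The difference between the two metrics is easy to see in code. The definitions below are one reasonable interpretation of the article's wording, with illustrative function names, and may differ in detail from the benchmark scripts.

```python
def top_k_overlap(true_ranking, retrieved_ranking, k=10):
    """Fraction of the true top-k documents that appear in the retrieved top-k."""
    true_top = set(true_ranking[:k])
    return len(true_top & set(retrieved_ranking[:k])) / k


def top1_in_top_k(true_ranking, retrieved_ranking, k=10):
    """True if the single best true match appears anywhere in the retrieved top-k."""
    return true_ranking[0] in retrieved_ranking[:k]


true_ranking = list(range(10))                    # document ids, best match first
retrieved = [0, 1, 2, 3, 4, 5, 6, 7, 20, 21]      # two of the true top 10 are missing
```

For this query `top1_in_top_k` reports a hit while `top_k_overlap` reports 0.8, so the same retrieval result looks weaker under the stricter metric.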

Try it yourself—or break it

The experiment’s reproducibility is key to its credibility. All code, benchmarks, and attack scripts are available in the public repository. The developer invites others to attempt the neural attack, train longer, or refine the model’s features. If anyone can make the system leak information, they’re encouraged to submit a pull request or open an issue.

Install ZATRON: pip install zatron
Explore benchmarks and attack scripts in the benchmarks/ directory
Try the live demo on Hugging Face Spaces

Until then, the evidence suggests that ZATRON’s modular barcodes successfully obscure the underlying data—even from machines trained to find hidden patterns.

