iToverDose/Software · 2 MAY 2026 · 16:03

TOON format: a compact alternative to JSON for LLM prompts

Discover how TOON simplifies structured data delivery to large language models by reducing token overhead in prompts while maintaining readability and interoperability.

DEV Community · 4 min read

Large language models often face a hidden bottleneck: the repetitive JSON structure that surrounds data in API calls. Enter TOON, a lightweight format designed to streamline prompt engineering by declaring schema once and streaming values efficiently.

This innovation arrives at a critical moment for AI pipelines. As retrieval-augmented generation (RAG) systems scale, the token cost of formatting retrieved chunks becomes a measurable expense. TOON addresses this by stripping away redundant syntax while preserving the logical structure that models expect.

Why TOON matters in AI workflows ⚡

In most LLM applications, the real constraint isn’t storage capacity—it’s the context window. Every token used for formatting is a token that could carry meaningful context. Traditional JSON forces developers to repeat keys, braces, and quotes across every object in an array, which adds up quickly when retrieving multiple chunks.

Consider a typical RAG scenario: a system fetches five document snippets, each including fields like document ID, section title, relevance score, and text. In JSON, this might require hundreds of tokens just for structural overhead before the actual content appears. TOON eliminates this waste by treating the schema as a one-time declaration and focusing the payload on raw values.

Core concepts behind TOON 🧱

At its heart, TOON combines three familiar concepts into a single compact format:

  • JSON-style data modeling: maintains familiar object relationships and field types
  • YAML-inspired readability: uses indentation and simple syntax for human comprehension
  • CSV-like tabular format: optimizes uniform arrays of objects with primitive values

The result is a format that feels like JSON’s logical cousin but behaves more like an efficient spreadsheet. Rather than repeating structure across every object, TOON declares it once and lets the data flow freely.

The three pillars of TOON syntax 🏗️

A TOON payload follows a straightforward pattern that repeats across different use cases:

  • Array length notation: [N] declares upfront how many objects follow, which helps models verify completeness
  • Field declaration: {field1,field2,...} defines the field names and their order once
  • Data rows: follow the declared field sequence with comma-separated values

This structure creates predictable patterns that both developers and models can parse reliably. The [N] component, in particular, serves a dual purpose: it optimizes token usage while providing built-in validation for truncated outputs.
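The three pillars can be sketched as a small serializer. This is an illustrative Python sketch, not an official TOON library; the quoting rule here (quote only strings that contain commas) is an assumption made for readability:

```python
def to_toon(name: str, rows: list[dict]) -> str:
    """Serialize a uniform list of dicts into a TOON-style block.

    Illustrative sketch only: the quoting rule is an assumption,
    not taken from the official spec.
    """
    fields = list(rows[0].keys())
    # Pillars 1 and 2: array length [N] and field declaration, stated once.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"

    def fmt(value):
        text = str(value)
        # Quote strings that contain commas; leave numbers bare.
        return f'"{text}"' if isinstance(value, str) and "," in text else text

    # Pillar 3: data rows follow the declared field sequence.
    body = ["  " + ",".join(fmt(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)
```

Because the schema lives in the header, adding a tenth object to the array costs only one more value row, not another full set of keys and braces.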

Performance gains in practice 📊

For uniform arrays of similarly structured objects—a common pattern in RAG—TOON typically achieves 30-60% token reduction compared to equivalent JSON. The savings come from eliminating repeated structural elements that add no semantic value:

  • No repeated field names across objects
  • No redundant quotation marks around strings
  • No unnecessary braces or commas
  • Simplified array delimiters

These savings become particularly valuable when working within strict context window limits. A RAG system that previously fit 10 chunks in JSON might suddenly accommodate 15-20 chunks in TOON without exceeding token budgets.
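The scale of the savings is easy to verify with a character count, a rough proxy for tokens. The field names below are invented for illustration:

```python
import json

# Hypothetical retrieval rows; fields invented for illustration.
rows = [
    {"chunk_id": i, "doc": "policy.pdf", "section": "refunds", "score": 0.9}
    for i in range(10)
]

as_json = json.dumps(rows)

# TOON-style: declare the schema once, then emit bare value rows.
fields = list(rows[0].keys())
header = f"chunks[{len(rows)}]{{{','.join(fields)}}}:"
as_toon = "\n".join([header] + [",".join(str(r[f]) for f in fields) for r in rows])

# Compare character counts (a rough proxy for tokens).
print(len(as_json), len(as_toon))
```

The gap widens as the array grows, since JSON pays the key-and-brace tax per object while TOON pays it once.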

When TOON shines—and when it doesn’t ⚖️

TOON excels in specific scenarios where its design assumptions align with requirements:

  • Ideal use cases:
      • Uniform arrays of primitive values in LLM prompts
      • RAG retrieval outputs with repeated metadata fields
      • Agent tool outputs with structured results
      • Any situation where JSON’s verbosity becomes costly
  • Poor fit scenarios:
      • Non-tabular, deeply nested JSON structures
      • Binary data storage requirements
      • High-performance in-memory processing
      • Legacy system integrations requiring strict JSON compatibility

The key distinction lies in TOON’s design purpose: it’s optimized for the last mile of data delivery to language models, not for long-term storage or complex analytical processing.

A concrete RAG comparison 🔍

Consider a real-world example where a RAG system retrieves policy documents:

JSON version (2 chunks):

[
  {"chunk_id": 101, "doc": "policy.pdf", "section": "refunds", "score": 0.93, "text": "Customers can request refunds within 30 days..."},
  {"chunk_id": 205, "doc": "policy.pdf", "section": "cancellations", "score": 0.90, "text": "Cancellation fees apply after processing..."}
]

TOON version:

chunks[2]{chunk_id,doc,section,score,text}:
  101,policy.pdf,refunds,0.93,"Customers can request refunds within 30 days..."
  205,policy.pdf,cancellations,0.90,"Cancellation fees apply after processing..."

The information remains identical, but the structural overhead drops dramatically. This isn’t about changing underlying data—it’s about presenting the same data more efficiently to the model.
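Because the row layout is positional, the values can be recovered mechanically. A minimal decoding sketch, assuming one value row per line and CSV-style quoting as in the example above (not an official parser):

```python
import csv
import io

def parse_toon_rows(block: str) -> list[dict]:
    """Recover field/value pairs from a TOON-style block.

    Assumes one value row per line and CSV-style quoting, matching
    the example above; a sketch, not an official parser.
    """
    header, *lines = block.strip().splitlines()
    # Field names sit between the braces in the header.
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    data = "\n".join(line.strip() for line in lines)
    # csv handles the quoted text fields that may contain commas.
    return [dict(zip(fields, row)) for row in csv.reader(io.StringIO(data))]
```

Round-tripping like this is mostly useful for tests; in production the TOON block is consumed directly by the model.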

Integration strategy for existing pipelines 🔧

Implementing TOON doesn’t require rebuilding entire architectures. Most teams can adopt it gradually:

  • Storage layer: Continue using Parquet, SQLite, or your preferred database for raw data
  • Retrieval phase: Convert retrieved chunks to TOON format before LLM input
  • Validation layer: Use the [N] declaration to verify complete data transmission
  • Fallback mechanism: Maintain JSON compatibility for edge cases requiring strict standards

This separation of concerns allows teams to optimize their most critical bottleneck—the prompt context window—without disrupting existing workflows.
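The validation layer above can be a few lines: read the declared [N] from the header and compare it against the rows that actually arrived. The header pattern is taken from the examples in this article, not from a formal grammar:

```python
import re

def validate_toon_count(block: str) -> bool:
    """Check the declared [N] against the number of data rows.

    Assumes one value row per line after the header -- a
    simplification based on the examples above.
    """
    header, *rows = block.strip().splitlines()
    match = re.match(r"\w+\[(\d+)\]\{[^}]*\}:", header)
    if not match:
        raise ValueError("malformed TOON header")
    return int(match.group(1)) == len(rows)
```

A mismatch is a cheap signal that a response was truncated mid-stream and should be retried rather than passed downstream.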

Looking ahead at TOON’s role 🔮

As language models grow more capable, the formats we use to communicate with them must evolve. TOON represents a pragmatic step forward by acknowledging that not all JSON’s features are necessary for prompt engineering. The format’s simplicity and efficiency suggest it could become a standard tool in the RAG developer’s toolkit, particularly as token pricing models continue to influence architectural decisions.

The next frontier may involve even more specialized formats, but TOON’s core insight—declaring structure once and focusing on data values—is likely to endure. For teams struggling with prompt bloat, adopting TOON could mean squeezing more context into existing windows or reducing API costs without sacrificing accuracy.

