
Startup claims 1,000x AI efficiency leap with linear-scaling model

A Miami-based startup has introduced a large language model that it says cuts attention computation on long contexts by roughly 1,000 times compared with today’s leading systems. The claim hinges on a linear-scaling architecture that replaces quadratic attention, but independent validation remains pending.

VentureBeat · 3 min read

Subquadratic, a stealth-mode startup based in Miami, has introduced what it calls the first large language model built on a fully linear-scaling architecture. The company’s inaugural model, SubQ 1M-Preview, processes up to 12 million tokens while reducing attention computation by roughly 1,000 times compared to conventional systems, according to its internal benchmarks. While its claims have not yet undergone third-party verification, the startup’s technical approach challenges a foundational assumption that has shaped the entire AI industry.

The quadratic barrier that shaped the economics of AI

Most transformer-based models today rely on attention mechanisms that compare every token with every other token. This operation grows quadratically with input length, meaning doubling the context size quadruples the compute cost. Since the 2017 debut of the Transformer architecture, researchers and engineers have designed elaborate workarounds—retrieval-augmented generation, prompt chaining, multi-agent orchestration—to route around this fundamental limit.
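
To see where the quadratic cost comes from, consider a minimal NumPy sketch of dense attention (purely illustrative, not code from any of the systems mentioned): the score matrix has one entry for every pair of tokens, so doubling the sequence length quadruples the number of entries.

import numpy as np

def dense_attention(Q, K, V):
    # Every query is scored against every key: an n x n matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 64
for n in (1_000, 2_000, 4_000):
    Q = K = V = np.random.randn(n, d).astype(np.float32)
    dense_attention(Q, K, V)
    print(f"n={n:>5,}: {n * n:,} pairwise scores")
# Doubling n quadruples the pairwise scores: 1,000,000 -> 4,000,000 -> 16,000,000.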

The workaround economy has become standard practice. Developers split long documents into chunks, pre-filter relevant passages, or limit context windows to 128,000 tokens to avoid prohibitive costs. Even frontier models like Anthropic’s Claude Sonnet 4.7 and Google’s Gemini 3.1 Pro cap inputs at 1 million tokens, trading full-context reasoning for affordability.

Subquadratic’s co-founder and CTO Alexander Whedon argues that these workarounds are costly in both compute and human effort. “I used to spend weeks manually curating prompts, retrieval pipelines, and conditional logic,” he said. “That approach wastes both computational resources and human creativity.”

A deceptively simple fix: compute only what matters

The company’s solution, called Subquadratic Sparse Attention (SSA), abandons the assumption that every token needs to be compared to every other token. Instead, SSA learns to identify which comparisons actually contribute value and computes attention only over those positions. The selection is dynamic: the model decides where to look based on semantic relevance, not fixed positional patterns.
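
Subquadratic has not published SSA’s internals, so the sketch below is only a generic illustration of the broader family of learned sparse attention, not the company’s method: keys are grouped into blocks, each query scores cheap block summaries, keeps only the most relevant blocks, and attends over the keys inside them. The function name, block size, and keep count are all assumptions made for the example.

import numpy as np

def block_sparse_attention(Q, K, V, block=128, keep=4):
    # Group keys/values into blocks and summarize each block with its mean.
    n, d = K.shape
    n_blocks = n // block
    Kb = K[: n_blocks * block].reshape(n_blocks, block, d)
    Vb = V[: n_blocks * block].reshape(n_blocks, block, d)
    summaries = Kb.mean(axis=1)

    # Each query scores n_blocks summaries instead of n keys (the cheap step),
    # then attends only over the keys inside its top "keep" blocks.
    block_scores = Q @ summaries.T
    chosen = np.argpartition(-block_scores, keep, axis=-1)[:, :keep]

    out = np.zeros_like(Q)
    for i in range(len(Q)):
        keys = Kb[chosen[i]].reshape(-1, d)
        vals = Vb[chosen[i]].reshape(-1, d)
        s = keys @ Q[i] / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ vals
    return out

n, d = 4_096, 64
Q = K = V = np.random.randn(n, d).astype(np.float32)
print(block_sparse_attention(Q, K, V).shape)  # (4096, 64)

In this toy version, each query does 32 block scores plus attention over 512 selected keys instead of 4,096 full comparisons. SSA’s actual selection rule is unknown, but the shape of the saving is the idea the company describes.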

This approach flips the scaling equation on its head. According to Subquadratic’s technical blog, SSA delivers a 7.2x speedup over dense attention at 128,000 tokens and a 52.2x speedup at 1 million tokens. “If input size doubles with quadratic scaling, you need four times the compute,” Whedon explained. “With linear scaling, doubling input size requires only twice the compute.”
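
As a back-of-envelope illustration of that arithmetic (not the company’s measured figures, which include real-world overheads), idealized quadratic and linear cost models diverge quickly as context grows:

# Idealized cost models: dense attention ~ n^2, linear attention ~ n.
base = 128_000
for n in (128_000, 256_000, 512_000, 1_024_000):
    quad_growth = (n * n) / (base * base)
    lin_growth = n / base
    print(f"n={n:>9,}: quadratic cost x{quad_growth:.0f}, linear cost x{lin_growth:.0f}")
# Each doubling of context quadruples the quadratic cost but only doubles the linear one.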

The model was trained in three stages: pretraining on large text corpora, supervised fine-tuning for instruction following, and reinforcement learning focused on long-context retrieval. This final stage specifically targeted a subtle failure mode in existing systems—over-reliance on nearby context—which often degrades performance on tasks requiring distant information.

Benchmarks look impressive, but validation remains pending

Subquadratic’s published results show competitive or superior performance on standard benchmarks. On SWE-Bench Verified, a benchmark of real-world software engineering tasks drawn from GitHub issues, the model scored 81.8%, edging out Anthropic’s Opus 4.6 at 80.8% and outperforming DeepSeek 4.0. On other benchmarks such as GPQA Diamond and LongBench, the model reportedly matches or exceeds larger, better-funded systems.

However, the absence of independent audits has fueled skepticism in the research community. Critics point to the lack of peer-reviewed architecture papers, unreleased training data, and undisclosed hardware configurations. Without transparent replication, industry observers caution against treating the claims as settled science.

From stealth to product: three early-beta tools

To demonstrate practical value, Subquadratic is launching three private-beta tools built on SubQ 1M-Preview. The first is an API that exposes the full 12-million-token context window, enabling developers to feed entire documents or repositories without chunking. The second is SubQ Code, a command-line coding agent designed for large-scale codebase analysis and generation. The third is SubQ Search, a long-context retrieval tool aimed at enterprise knowledge bases and legal or medical document analysis.
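
Subquadratic has not published API documentation, so the request below is purely hypothetical: the endpoint URL, model identifier, and environment variable are invented placeholders, and the snippet only illustrates the no-chunking workflow described above using a generic OpenAI-style chat payload.

import os
from pathlib import Path
import requests

# Hypothetical endpoint and credentials; Subquadratic's real API is not public.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ.get("SUBQ_API_KEY", "")

# Concatenate an entire repository into a single prompt -- no chunking, no retrieval.
repo_text = "\n\n".join(
    f"# {p}\n{p.read_text(errors='ignore')}"
    for p in Path("my_repo").rglob("*.py")
)

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "subq-1m-preview",  # hypothetical model identifier
        "messages": [
            {"role": "user",
             "content": f"Summarize the architecture of this codebase:\n{repo_text}"},
        ],
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])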

The company has raised $29 million in seed funding, valuing it at $500 million. Investors include Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early backers of Anthropic, OpenAI, Stripe, and Brex. Despite the capital and ambition, the startup’s long-term credibility hinges on one question: Can its linear-scaling claim withstand rigorous, independent scrutiny? If so, it may redefine the economics of AI at scale.

AI summary

Miami-based Subquadratic claims to be revolutionizing AI with its SubQ model, which it says cuts attention computation costs by up to 1,000 times while handling 12 million tokens. Details and researchers’ reactions are covered here.
