Enterprises have long sought AI that can interpret video in real time—whether to monitor facilities, edit marketing content, or analyze body language in interviews. While such capabilities exist today, they remain out of reach for most due to prohibitive costs.
That’s changing with the launch of Perceptron Mk1, a proprietary video reasoning model from Perceptron Inc. that delivers advanced physical-world understanding at a fraction of the price of industry leaders. The model is priced at $0.15 per million input tokens and $1.50 per million output tokens via its API, roughly 80-90% cheaper than Anthropic’s Claude Sonnet 4.5, OpenAI’s GPT-5, and Google’s Gemini 3.1 Pro.
The breakthrough comes after 16 months of development led by CEO Armen Aghajanyan, a former Meta FAIR and Microsoft researcher. Unlike traditional vision-language models that process video as a series of static images, Mk1 is engineered for temporal continuity, enabling it to track objects through occlusions and interpret physical interactions with precision.
A new benchmark in spatial and video reasoning
Perceptron’s claims are backed by rigorous third-party evaluations. On spatial reasoning benchmarks, Mk1 outperformed competitors in critical tests:
- EmbSpatialBench (85.1): Surpassed Google’s Robotics-ER 1.5 (78.4) and Alibaba’s Q3.5-27B (~84.5).
- RefSpatialBench (72.4): Dramatically exceeded GPT-5m (9.0) and Sonnet 4.5 (2.2), proving superior capability in referring expression comprehension.
In video analysis, Mk1 matched top-tier models in challenging scenarios where simple frame comparisons fail:
- EgoSchema “Hard Subset” (41.4): Tied Alibaba’s Q3.5-27B and outperformed Gemini 3.1 Flash-Lite (25.0), showcasing deep temporal reasoning.
- VSI-Bench (88.5): Achieved the highest score among tested models, validating its real-world applicability.
The results underscore Mk1’s ability to interpret cause-and-effect relationships—such as determining whether a basketball was released before a shot clock expired—using joint reasoning over motion and environmental context.
Efficiency frontier: performance without premium pricing
Perceptron positions Mk1 on the “efficiency frontier,” a framing that plots reasoning accuracy against token cost. While models like GPT-5 and Gemini 3.1 Pro command blended rates near $2.00 and $3.00 per million tokens respectively, Mk1 delivers comparable or superior performance at just $0.30 per million tokens.
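A blended rate depends on the input/output token mix of a workload. A minimal sketch of the arithmetic, using the article’s published prices ($0.15/M input, $1.50/M output); the input/output split chosen below is an illustrative assumption, not a figure Perceptron has disclosed:

```python
# Cost arithmetic for Mk1 API pricing (prices from the announcement).
INPUT_PRICE = 0.15 / 1_000_000   # USD per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # USD per output token

def blended_rate(input_share: float) -> float:
    """Blended USD cost per 1M tokens for a given input-token fraction."""
    output_share = 1.0 - input_share
    return input_share * 0.15 + output_share * 1.50

def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of a single job."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A roughly 8:1 input-to-output mix (common for video-in, text-out
# workloads) reproduces the quoted $0.30/M blended figure:
print(blended_rate(8 / 9))              # 0.30 per 1M tokens
print(job_cost(900_000, 100_000))       # 0.285 USD for this job
```

Video understanding is heavily input-weighted, which is why a model with cheap input tokens can undercut competitors on blended cost even with a higher output price.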
This pricing strategy aims to democratize high-end physical AI, making it viable for industrial deployment rather than confining it to academic research. Perceptron’s goal is clear: enable large-scale applications where real-time video understanding was previously cost-prohibitive.
From pixels to physics: how Mk1 deciphers the physical world
At its core, Mk1 processes native video at up to 2 frames per second within a 32K token context window, ensuring continuity across extended streams. This allows the model to:
- Maintain object identity even under occlusion—a critical feature for robotics and surveillance.
- Return structured time codes for precise event detection and video clipping.
- Interpret analog instruments like gauges and clocks with pixel-level accuracy.
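Structured time codes make downstream tasks like clipping mechanical. A minimal sketch of that post-processing step, assuming the model returns detected events as start/end offsets in seconds with a text label (this response shape is an assumption for illustration, not Perceptron’s documented schema):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A hypothetical detected event: offsets in seconds plus a label."""
    start: float
    end: float
    label: str

def to_clip_ranges(events, padding: float = 1.0):
    """Turn detected events into (label, start, end) clip ranges,
    padding each side and clamping the start at zero."""
    clips = []
    for e in events:
        ss = max(0.0, e.start - padding)
        to = e.end + padding
        clips.append((e.label, round(ss, 2), round(to, 2)))
    return clips

# Hypothetical Mk1 output for a monitored video feed:
events = [
    Event(12.5, 14.0, "person enters frame"),
    Event(47.2, 52.8, "package left unattended"),
]
print(to_clip_ranges(events))
# [('person enters frame', 11.5, 15.0),
#  ('package left unattended', 46.2, 53.8)]
```

The resulting ranges can be passed directly to a cutting tool such as ffmpeg’s `-ss`/`-to` options to extract the flagged segments.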
The model’s physical reasoning extends to counting objects in dense scenes (up to hundreds) and analyzing historical footage. In testing, Mk1 correctly identified an early 1900s New York City skyscraper construction film from the Library of Congress, describing atypical construction methods—such as workers suspended on ropes—and inferring the era from visual cues alone.
A scalable developer platform for real-world applications
Perceptron isn’t just releasing a model—it’s launching an end-to-end platform to simplify integration. Developers can tap into Mk1’s capabilities with minimal code, enabling rapid prototyping for use cases like:
- Automated security monitoring with contextual alerts.
- Social media video repurposing via event detection.
- Behavioral analysis in controlled studies or hiring processes.
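The contextual-alerting use case above can be sketched as a thin rule layer over model-produced event labels. Everything here (the rule table, the event format) is an illustrative assumption about how a team might build on Mk1’s output, not part of Perceptron’s published API:

```python
# Map detected-event labels to alert severities via keyword rules.
# ALERT_RULES is an illustrative assumption; real deployments would
# tune these keywords to their own camera feeds and risk policies.
ALERT_RULES = {
    "high": ("unattended", "intrusion", "fall"),
    "low": ("person enters", "vehicle"),
}

def triage(label: str) -> str:
    """Return the severity of a detected-event label ('info' if no rule matches)."""
    text = label.lower()
    for severity, keywords in ALERT_RULES.items():
        if any(k in text for k in keywords):
            return severity
    return "info"

print(triage("Package left unattended near exit"))  # high
print(triage("Person enters loading dock"))         # low
print(triage("Lighting change"))                    # info
```

Keeping the alert logic outside the model means severity policies can change without re-prompting or re-deploying anything on the inference side.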
The platform emphasizes usability, ensuring that even teams without deep AI expertise can deploy advanced video reasoning. Perceptron also provides a public demo for hands-on evaluation before committing to enterprise contracts.
The future of accessible physical AI
With Mk1, Perceptron is challenging the assumption that cutting-edge video reasoning must come at a premium. By combining temporal continuity, physical reasoning, and cost efficiency, the model bridges the gap between research and real-world deployment.
As industries from manufacturing to healthcare seek smarter ways to interpret visual data, Perceptron’s approach could redefine the accessibility of AI that understands not just what it sees—but why it happens.
AI summary
Perceptron’s new Mk1 model delivers physical-world understanding for video analysis at prices 80-90% lower than leading AI models. Explore its performance and pricing advantages.
