The way we interact with artificial intelligence is about to change fundamentally. For years, AI systems have operated in a rigid turn-based format: you speak or type, wait for the model to process your input, and then receive a response. But this approach creates friction in real-world scenarios where seamless, real-time interaction matters. Thinking Machines, a cutting-edge AI startup co-founded by former OpenAI CTO Mira Murati, is challenging this status quo with a new class of models designed to engage in natural, fluid conversations—almost as if you were talking to another person.
AI moves beyond the 'turn-based' conversation trap
Traditional AI models function in a linear sequence: they wait for your input to complete before processing it and generating a response. This creates a bottleneck where users must adapt their communication style to the AI's limitations, often phrasing questions as if writing emails or batching thoughts into single prompts. It’s a system that forces humans to contort their natural way of interacting.
Thinking Machines aims to eliminate this friction with its interaction models, a breakthrough architecture that treats real-time dialogue as a core feature rather than an afterthought. The company describes this shift as moving from "turn-based" to "full-duplex" interaction—where the AI listens, processes, and responds simultaneously, much like a human interlocutor. This approach isn’t just about speed; it’s about enabling AI to participate in conversations dynamically, picking up on visual cues, interjecting at the right moment, and maintaining a continuous presence.
How the technology works: A dual-model system for real-time collaboration
At the core of this innovation is a dual-model architecture that separates immediate interaction from deep reasoning. The system consists of two components:
- Interaction Model: This model handles real-time dialogue, recognizing speech, video, and text inputs in 200-millisecond micro-turns. It processes raw audio as dMel signals and visual data as 40x40 image patches, fusing these inputs early in the pipeline rather than relying on separate encoders. The result is an AI that can backchannel while you speak, notice visual cues like errors in code, or react proactively when someone enters a video frame.
- Background Model: This asynchronous agent focuses on complex tasks such as web browsing, tool calls, or sustained reasoning. It operates independently, streaming results back to the interaction model, which then integrates them naturally into the conversation. For example, the AI could translate a conversation in real time while simultaneously generating a bar chart based on the discussion—all without interrupting the flow of dialogue.
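The division of labor described above can be sketched as two cooperating loops: a low-latency loop that responds every micro-turn, and an asynchronous task that streams its results back into the conversation. This is a minimal illustrative sketch, not Thinking Machines' actual implementation; the class names, queue wiring, and timings are all assumptions:

```python
import asyncio

MICRO_TURN_S = 0.2  # the ~200 ms micro-turn cadence described for the interaction model

async def background_model(task: str, results: asyncio.Queue) -> None:
    """Asynchronous agent: slow work such as browsing, tool calls, or chart generation."""
    await asyncio.sleep(0.5)  # stand-in for a long-running tool call
    await results.put(f"[background] finished: {task}")

async def interaction_model(inputs: list[str], results: asyncio.Queue) -> list[str]:
    """Real-time loop: reacts every micro-turn, weaving in background results as they arrive."""
    transcript = []
    for chunk in inputs:
        # Non-blocking check: integrate any background results without stalling the dialogue.
        while not results.empty():
            transcript.append(results.get_nowait())
        transcript.append(f"[interaction] heard: {chunk}")
        await asyncio.sleep(MICRO_TURN_S)  # wait for the next micro-turn
    # Drain anything that finished after the user stopped speaking.
    while not results.empty():
        transcript.append(results.get_nowait())
    return transcript

async def main() -> list[str]:
    results: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(background_model("make a bar chart", results))
    return await interaction_model(["hello", "can you chart this?", "thanks"], results)

transcript = asyncio.run(main())
print("\n".join(transcript))
```

The key design point the sketch captures is that the interaction loop never blocks on the background task: slow results are merged into the transcript whenever they happen to land, so the conversational flow is uninterrupted.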
This separation allows the system to achieve human-like response times while maintaining the depth of reasoning required for sophisticated tasks. In demonstrations, the model reacted to visual and auditory cues with roughly human latency, a level of responsiveness previously unseen in AI systems.

Benchmark breakthroughs: Outperforming rivals in real-time interaction
To validate its approach, Thinking Machines introduced FD-bench, a specialized benchmark designed to measure the quality of real-time interactions rather than just raw computational power. The results are striking:
- Turn-taking latency: The company’s TML-Interaction-Small model achieved a turn-taking latency of 0.40 seconds, significantly faster than competitors like Gemini-3.1-flash-live (0.57s) and GPT-realtime-2.0 (1.18s).
- Interaction Quality: On FD-bench V1.5, TML-Interaction-Small scored 77.8, well ahead of Gemini-3.1-flash-live (54.3) and roughly two-thirds higher than GPT-realtime-2.0 (46.8).
- Visual Proactivity: In tests like RepCount-A (counting physical repetitions in video) and ProactiveVideoQA, Thinking Machines’ model engaged with visual inputs far more effectively than leading frontier models, which often remained silent or provided incorrect responses.
The model also excelled on IFEval (VoiceBench) with a score of 82.1, edging out GPT-realtime-2.0 (81.7) and clearly outperforming Gemini-3.1-flash-live (67.6). On the refusal test, a critical usability metric, it scored 99.0, indicating that the model rarely refuses or ignores valid inputs.
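Taken at face value, the quoted figures work out to roughly a 3x latency advantage and a two-thirds quality lead over GPT-realtime-2.0. A quick back-of-the-envelope check, using only the numbers reported above:

```python
# Benchmark figures as quoted above (turn-taking latency in seconds, FD-bench V1.5 scores).
latency = {"TML-Interaction-Small": 0.40, "Gemini-3.1-flash-live": 0.57, "GPT-realtime-2.0": 1.18}
quality = {"TML-Interaction-Small": 77.8, "Gemini-3.1-flash-live": 54.3, "GPT-realtime-2.0": 46.8}

# Relative advantage over GPT-realtime-2.0 on each axis.
speedup_vs_gpt = latency["GPT-realtime-2.0"] / latency["TML-Interaction-Small"]
quality_lead_vs_gpt = quality["TML-Interaction-Small"] / quality["GPT-realtime-2.0"]

print(f"{speedup_vs_gpt:.2f}x lower turn-taking latency than GPT-realtime-2.0")   # ~2.95x
print(f"{quality_lead_vs_gpt:.2f}x higher FD-bench score than GPT-realtime-2.0")  # ~1.66x
```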
The road ahead: From research preview to enterprise transformation
While the current release is a research preview, Thinking Machines plans to open limited research access in the coming months, with a broader commercial release expected later this year. The implications for enterprises are substantial. A native interaction model like TML-Interaction-Small could revolutionize customer service, where real-time, natural dialogue is essential. It could also enhance collaborative tools, enabling AI to assist in meetings by summarizing discussions, generating visualizations, or translating conversations on the fly.
The potential extends beyond text-based interactions. In industries like healthcare, real-time AI could assist in patient consultations, interpreting visual and auditory cues to provide timely responses. In coding, the model could flag errors in real time as developers write, offering corrections without waiting for a full input submission. The shift from turn-based to full-duplex interaction isn’t just an incremental improvement—it’s a paradigm change that could redefine how humans and AI collaborate.
The next phase for Thinking Machines will be critical: refining the model based on feedback from researchers and early adopters, and ensuring it scales effectively for real-world applications. If successful, this technology could mark the beginning of a new era in AI interaction—one where machines don’t just respond, but truly converse.
AI summary
Thinking Machines is previewing its new 'interaction models' for real-time AI voice and video conversations.
