Mira Murati's Thinking Machines Unveils TML-Interaction-Small, a Full-Duplex Voice Model That Beats GPT and Gemini
Models·2 min read·TechCrunch

Mira Murati's Thinking Machines Lab released a research preview of TML-Interaction-Small, a 276B mixture-of-experts model that responds in under 0.4 seconds and listens while it speaks — outpacing GPT-Realtime-2 and Gemini 3.1 Flash Live on the FD-bench latency test.

Thinking Machines Lab, the AI startup founded last year by former OpenAI CTO Mira Murati, on Monday pulled back the curtain on its first frontier model: TML-Interaction-Small, a 276-billion-parameter mixture-of-experts system built explicitly for real-time, human-like conversation. Unlike conventional large language models that process input and then generate a reply in distinct turns, TML-Interaction-Small uses a so-called full-duplex architecture, slicing every exchange into 200-millisecond micro-turns so the model can listen, watch, think and speak simultaneously.
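To make the micro-turn idea concrete, here is a minimal sketch of a full-duplex loop: the conversation is sliced into fixed 200-millisecond slots, and in each slot the agent both ingests the latest audio frame and may emit speech, rather than waiting for the user's turn to end. All class and method names are illustrative assumptions, not Thinking Machines' actual API, and the speaking policy is a toy stand-in.

```python
# Hypothetical sketch of full-duplex micro-turns. Each step() call is one
# 200 ms slot in which the agent listens AND may speak simultaneously.
from dataclasses import dataclass, field

MICRO_TURN_MS = 200  # per the article: 200-millisecond micro-turns

@dataclass
class DuplexAgent:
    heard: list = field(default_factory=list)
    spoken: list = field(default_factory=list)

    def step(self, incoming_frame, t_ms):
        """One micro-turn: listen and (possibly) speak in the same slot."""
        self.heard.append(incoming_frame)        # listening never pauses
        if self._should_speak():                 # speaking can overlap it
            self.spoken.append((t_ms, self._next_chunk()))

    def _should_speak(self):
        # Toy policy: start responding once two frames of context exist.
        return len(self.heard) >= 2

    def _next_chunk(self):
        return f"ack:{len(self.heard)}"

agent = DuplexAgent()
for i, frame in enumerate(["hi", "can", "you", "hear", "me"]):
    agent.step(frame, t_ms=i * MICRO_TURN_MS)

print(agent.spoken)  # speech chunks emitted while audio was still arriving
```

The point of the structure is that there is no "user turn" object anywhere: input and output interleave at the slot level, which is what distinguishes this from the listen-then-reply loop of a conventional model.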

The lab is publishing the research as a preview rather than a general release. According to Thinking Machines, the model achieves a turn-taking latency of under 0.40 seconds on the company's internal FD-bench evaluation — roughly the cadence of a natural phone call. By comparison, Google's Gemini 3.1 Flash Live registered 0.57 seconds and OpenAI's recently launched GPT-Realtime-2 came in at 1.18 seconds. The architecture pairs a small, always-on Interaction Model that maintains the live dialogue with a heavier Background Model that takes over for sustained reasoning, web browsing or tool calls, then hands results back to the front-end model mid-conversation.
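The two-model split described above can be sketched as a light dialogue loop that keeps talking while a heavier task runs elsewhere, folding the result back in mid-conversation. This is a minimal threading-based illustration under assumed names; the real handoff mechanism is not described in the preview.

```python
# Hypothetical sketch: a fast "interaction" loop stays responsive while a
# slow "background" worker reasons, then its answer is picked up mid-dialogue.
import threading, queue, time

results: "queue.Queue[str]" = queue.Queue()

def background_model(task: str):
    time.sleep(0.05)                 # stand-in for slow reasoning / tool use
    results.put(f"answer to {task!r}")

def interaction_loop(max_turns: int) -> list:
    transcript = []
    worker = threading.Thread(target=background_model, args=("user question",))
    worker.start()                   # hand the hard task off immediately
    for _ in range(max_turns):
        try:                         # non-blocking check keeps latency low
            transcript.append(results.get_nowait())
            break
        except queue.Empty:
            transcript.append("filler: still looking into that...")
            time.sleep(0.02)
    worker.join()
    return transcript

log = interaction_loop(max_turns=10)
print(log[-1])  # the background result arrives mid-conversation
```

The design choice being illustrated is that the front-end never blocks on the heavy model: it polls, fills the silence, and splices the result in whenever it lands.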

To shave milliseconds, Thinking Machines abandoned the traditional approach of routing raw audio and video through bulky external encoders. The system instead uses what the company calls encoder-free early fusion, processing raw signals through a lightweight embedding layer that is jointly trained with the rest of the network. The technique also lets the model dispense with the standard voice-activity-detection module that older real-time stacks rely on to decide when a user has finished speaking, a piece of plumbing that frequently truncates utterances or leaves awkward pauses.
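The encoder-free idea can be illustrated with a toy example: raw signal frames are projected straight into the same embedding space as text tokens by a single jointly trained layer, and the two streams are fused into one sequence before the transformer. The shapes, vocabulary size, and parameter names below are illustrative assumptions, not the published architecture.

```python
# Hypothetical sketch of "encoder-free early fusion": no separate pretrained
# audio encoder, just a linear map from raw frames into the token space.
import numpy as np

D_MODEL = 16                       # shared embedding width (assumed)
rng = np.random.default_rng(0)

# Jointly trained parameters (here: random stand-ins).
text_embedding = rng.normal(size=(100, D_MODEL))   # vocab of 100 tokens
audio_proj = rng.normal(size=(80, D_MODEL))        # 80-sample frame -> D_MODEL

def embed_text(token_ids):
    return text_embedding[token_ids]               # (T, D_MODEL)

def embed_audio(frames):
    # The lightweight "embedding layer" standing in for a bulky encoder:
    # one linear projection applied to raw 80-sample frames.
    return frames @ audio_proj                     # (A, D_MODEL)

tokens = embed_text([3, 41, 7])                    # 3 text tokens
audio = embed_audio(rng.normal(size=(2, 80)))      # 2 raw audio frames

# Early fusion: one combined sequence for a single transformer stack.
fused = np.concatenate([audio, tokens], axis=0)
print(fused.shape)  # (5, 16)
```

Because the projection is trained end to end with the rest of the network, there is no hand-tuned boundary detector in front of it — which is also why a separate voice-activity-detection stage becomes unnecessary in such a design.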

For now, TML-Interaction-Small is available only to a small set of research partners during the preview phase, with a broader public rollout slated for later this year. Thinking Machines has not disclosed pricing, partner names, or which products will ship with the model first. But the launch represents Murati's most concrete bet yet on a thesis she has telegraphed since founding the lab — that the next leap in AI capability will come not from bigger pre-training runs, but from systems that can perceive and respond on the same timescale as the humans they are meant to collaborate with.
