Akool Unveils Video Inference Engine That Cuts Generation to Seconds and Streams AI Frames in Under 30ms
Palo Alto-based Akool rebuilt its full stack for a 10-20x speedup, pushing generative video clips to 1-3 seconds and live streaming below 30ms per frame.
Akool, the Palo Alto-based generative video startup behind Akool Live Camera, on May 11 unveiled what it is calling a production-grade AI video inference engine that delivers a 10 to 20 times speedup over conventional approaches and, for the first time, makes interactive AI video viable in real time. The company says the rewrite touches every layer of its system, from algorithm design through to GPU execution, and is already powering the products that customers use today.
The headline numbers are striking for a domain that, until recently, was measured in tens of seconds per clip. Akool says its new engine generates a full video in just one to three seconds, and for live use it streams frames at sub-30-millisecond per-frame latency, the threshold at which an AI-driven feed begins to feel indistinguishable from a normal webcam. Behind that performance is a combination of reduced computational steps, far more aggressive parallelization across GPUs, the elimination of runtime overhead between stages, and tight optimization for next-generation accelerators.
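The sub-30-millisecond figure maps directly onto ordinary video frame rates, which is why Akool frames it as the webcam threshold. A minimal back-of-envelope sketch makes the arithmetic concrete; the 30 ms ceiling is the company's reported number, but the per-stage breakdown below is purely illustrative, since Akool has not disclosed its pipeline stages or their costs.

```python
# Back-of-envelope: what a sub-30 ms per-frame budget implies.
# Only the 30 ms ceiling comes from Akool's announcement; everything
# else here is an illustrative assumption.

FRAME_BUDGET_MS = 30.0  # reported per-frame latency ceiling

# Throughput implied by the ceiling, assuming a fully pipelined
# stream that emits one frame per budget interval.
fps = 1000.0 / FRAME_BUDGET_MS
print(f"{fps:.1f} fps")  # ~33 fps, at or above a typical 30 fps webcam

# Hypothetical stage costs for a single frame. For live streaming,
# every stage must fit inside the budget together, which is why the
# announcement stresses removing runtime overhead between stages.
stages_ms = {
    "encode_inputs": 4.0,    # prepare audio/video conditioning
    "model_inference": 18.0, # generative model forward pass
    "postprocess": 3.0,      # decode / composite the frame
    "stream_out": 4.0,       # encode and push to the client
}
total = sum(stages_ms.values())
assert total <= FRAME_BUDGET_MS, "stage costs exceed the frame budget"
print(f"total {total:.0f} ms of {FRAME_BUDGET_MS:.0f} ms budget")
```

At roughly 33 frames per second, a sub-30 ms pipeline can keep pace with a standard webcam feed, which is the basis for the "indistinguishable from a normal webcam" claim.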
"This is the moment AI video becomes truly usable," said Akool co-founder and CEO Jiajun Lu in the company's announcement. "We rebuilt the entire stack." Lu argues that batch-style generative video, while impressive in demos, has never been latency-tight enough to support the experiences enterprise customers actually want, things like real-time digital avatars on a sales call, live translation that does not break a conversation, or interactive marketing video that responds to each viewer.
The engine is already integrated into Akool Live Camera, which uses it to drive real-time digital avatars, live translation between languages, and interactive video experiences for enterprise customers. Akool says the same infrastructure runs across cloud deployments, real-time streaming systems, and on-device targets, with production features such as automated quality controls, staged deployments, real-time monitoring, and per-model cost tracking baked in. That mix of latency and operational tooling is aimed squarely at the marketing, customer-experience, and live-events buyers that have been the early commercial market for generative video.
The launch arrives at a moment when the rest of the generative video field, including OpenAI's Sora, Runway, Pika and Google's Veo, is also racing toward lower-latency, longer-context output. Akool's pitch is that infrastructure, not model quality alone, is the binding constraint on real-time use cases, and that being first with second-scale generation and sub-30-millisecond streaming is enough to define a new category of "live AI video" before competitors close the gap.