Voice AI's "demo to deployment" gap collapsed this week. Every layer of the stack shipped something material: TTS models claimed new leaderboards, agent platforms went self-serve, OpenAI and Google rolled out new realtime models, and NVIDIA shipped the first CPU built for agents.

Here's what stood out:

Cartesia dropped Sonic-3.5 and took #1 on Artificial Analysis's Speech Arena leaderboard in both global and open-weight rankings.

PolyAI opened its Agentic Dialog Platform to every builder. The platform behind Marriott, FedEx, UniCredit, and Caesars is now free for the first two months, with Raven (trained on 1B+ enterprise conversations) as the default model.

Rime shipped Coda, the next iteration of its TTS flagship. Dual-decoder architecture, on-prem deployable, and a serving stack (TIGERSTRIPE) that outperforms off-the-shelf implementations.

Google I/O announced Gemini 3.5 Flash, Gemini Spark (24/7 agentic assistant), and voice features in Docs, Keep, and Gmail. Voice is now a default Workspace input, not a feature add-on.

AssemblyAI released its Voice Agent API at $4.50/hr flat and announced Universal-3 Pro upgrades: 19% better WER on multilingual, 30% faster median latency.

Modulate.ai released Velma Deepfake Detect at $0.25/hr, over 100x cheaper than competing solutions. #1 on Hugging Face's Speech Deepfake Arena. Deepfakes drove $1.6B in losses in 2025.

NVIDIA delivered the first Vera CPUs (their first custom CPU, built for agentic workloads) to Anthropic, OpenAI, SpaceXAI, and Oracle. OCI plans to deploy hundreds of thousands this year.

The pattern is clear: voice is no longer a single product category. It's a layered stack, and every layer just raised its bar.