ZONOS2: The First Open-Source MoE TTS Model and the Production Problem It Creates

Zyphra's ZONOS2 is the first open-source mixture-of-experts (MoE) text-to-speech model released under the Apache 2.0 license, with 8 billion parameters and two distinct generation modes: Stable and Expressive. It is a genuine step forward for open-source TTS, and it introduces a routing decision that most production teams are not prepared to make at scale.
The Two-Mode Routing Problem
ZONOS2 ships two modes. Stable mode produces consistent, low-variance output, which suits narration, e-learning, and IVR. Expressive mode produces higher emotional variance, which suits drama and character dialogue but is harder to validate at scale. Most production content pipelines need both. That means routing each content type to a mode, validating output against mode-specific quality criteria, and managing retries differently per mode: a three-layer orchestration problem, not a one-time model selection decision.
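The three layers can be sketched in a few lines of Python. This is a minimal illustration, not ZONOS2's or Onepin's actual API: the mode strings, the content-type keys, the retry counts, the quality thresholds, and the synthesize and score callables are all hypothetical placeholders that you would replace with your own model call and quality metric.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical mode identifiers; not taken from any ZONOS2 API.
STABLE = "stable"
EXPRESSIVE = "expressive"


@dataclass(frozen=True)
class ModePolicy:
    mode: str          # which generation mode to request
    max_retries: int   # retries allowed after the first attempt
    min_score: float   # assumed quality threshold in [0, 1]


# Layer 1: routing table from content type to mode policy.
# All numbers below are illustrative assumptions, not recommendations.
ROUTES = {
    "narration": ModePolicy(STABLE, max_retries=1, min_score=0.90),
    "e-learning": ModePolicy(STABLE, max_retries=1, min_score=0.90),
    "ivr": ModePolicy(STABLE, max_retries=1, min_score=0.90),
    "drama": ModePolicy(EXPRESSIVE, max_retries=3, min_score=0.75),
    "character_dialogue": ModePolicy(EXPRESSIVE, max_retries=3, min_score=0.75),
}


def route(content_type: str) -> ModePolicy:
    """Return the policy for a content type, failing loudly if unrouted."""
    try:
        return ROUTES[content_type]
    except KeyError:
        raise ValueError(f"no route for content type {content_type!r}") from None


def generate(
    text: str,
    content_type: str,
    synthesize: Callable[[str, str], bytes],  # (text, mode) -> audio bytes
    score: Callable[[bytes, str], float],     # (audio, mode) -> quality score
) -> Tuple[bytes, int]:
    """Route, synthesize, validate, and retry according to the mode policy.

    Returns the accepted audio and the attempt number that produced it.
    """
    policy = route(content_type)
    attempts = policy.max_retries + 1
    for attempt in range(1, attempts + 1):
        audio = synthesize(text, policy.mode)
        # Layer 2: validate against the mode-specific threshold.
        if score(audio, policy.mode) >= policy.min_score:
            return audio, attempt
        # Layer 3: otherwise loop again, up to the mode's retry budget.
    raise RuntimeError(
        f"{content_type!r} failed validation after {attempts} attempts "
        f"in {policy.mode} mode"
    )
```

Keeping the policy in a table rather than in branching code means adding a content type, or moving one from one mode to the other, is a one-line change. The sketch gives Expressive routes a larger retry budget and a looser threshold on the assumption that higher-variance output fails validation more often; the right values depend on your own metric.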
How Onepin Addresses It
Onepin connects to ZONOS2 alongside 100+ other TTS models, including ElevenLabs, Cartesia, Deepgram Aura-2, and Rime AI. You define which content types route to ZONOS2 Stable, which route to Expressive, and what validation thresholds apply. Onepin runs the jobs, validates the output, retries failures, and delivers publish-ready audio. When a better open-source model is released, you add a routing rule instead of rebuilding your integration. Get started with the Python SDK via pip install onepin, or read the Onepin documentation at onepin.ai.