pith. machine review for the scientific record.

arxiv: 2602.11298 · v3 · submitted 2026-02-11 · 💻 cs.AI

Recognition: unknown

Voxtral Realtime

Mistral-AI: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen,
Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Sandeep Subramanian, Soham Ghosh, Srijan Mishra, Abhinav Rastogi, Adrien Sadé, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, Amélie Héliou, Amos You, Andrew Bai, Angele Lenglemetz, Anmol Agarwal, Anton Eliseev, Antonia Calvi, Arjun Majumdar, Avi Sooriyarachchi, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Benjamin Tibi, Charlotte Cronjäger, Clémence Lanfranchi, Connor Chen, Corentin Barreau, Corentin Sautier, Cyprien Courtot, Darius Dabert, Diego de las Casas, Elizaveta Demyanenko, Elliot Chane-Sane, Enguerrand Paquin, Etienne Goffinet, Fabien Niel, Faruk Ahmed, Federico Baldassarre, Gabrielle Berrada, Gaëtan Ecrepont, Gauthier Guinet, Genevieve Hayes, Georgii Novikov, Giada Pistilli, Guillaume Kunsch, Guillaume Martin, Guillaume Raille, Gunjan Dhanuka, Gunshi Gupta, Han Zhou, Harshil Shah, Hope McGovern, Hugo Thimonier, Indraneel Mukherjee, Irene Zhang, Jaeyoung Kim, Jan Ludziejewski, Jason Rute, Joachim Studnia, John Harvill, Jonas Amar, Joséphine Delas, Josselin Somerville Roberts, Julien Tauran, Karmesh Yadav, Kartik Khandelwal, Kilian Tep, Kush Jain, Laurence Aitchison, Laurent Fainsin, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Maarten Buyl, Manan Sharma, Margaret Jennings, Marie Pellat, Mark Prins, Martin Alexandre, Mathieu Poirée, Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin, Mert Unsal, Mia Chiquier, Minh-Quang Pham, Nathan Grinsztajn, Neha Gupta, Olivier Bousquet, Olivier Duchenne, Patricia Wang, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philippe Pinel, Philomène Chagniot, Pierre Stock, Piotr Miłoś, Prateek Gupta, Pravesh Agrawal, Quentin Torroba, Ram Ramrakhya, Rishi Shah, Romain Sauvestre, Roman Soletskyi, Rosalie Millner, Rupert Menneer, Sagar Vaze, Samuel Barry, Samuel Humeau, Sean Cha, Shashwat Verma, Siddhant Waghjale, Siddharth Gandhi, Simon Lepage, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Théo Cachet, Theo Simon Sorg, Thibaut Lavril, Thomas Chabal, Thomas Foubert, Thomas Robert, Thomas Wang, Tim Lawson, Tom Bewley, Tom Edwards, Tyler Wang, Umar Jamil, Umberto Tomasini, Valeriia Nemychnikova, Van Phung, Vedant Nanda, Victor Jouault, Vincent Maladière, Virgile Richard, Vladislav Bataev, Wassim Bouaziz, Wen-Ding Li, William Havard, William Marshall, Xinghui Li, Xingran Guo, Xinyu Yang, Yannic Neuhaus, Yassine El Ouahidi, Yassir Bendou, Yihan Wang, Yimu Pan, Zaccharie Ramzi, Zhenlin Xu
Authors on Pith: no claims yet
classification: 💻 cs.AI
keywords: realtime, voxtral, offline, audio, delay, model, streaming, streams
read the original abstract

We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling framework, introducing a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. We scale pretraining to a large-scale dataset spanning 13 languages. At a delay of 480ms, Voxtral Realtime achieves performance on par with Whisper, the most widely deployed offline transcription system. We release the model weights under the Apache 2.0 license.
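The abstract credits two architectural pieces: a causal audio encoder and Ada RMS-Norm for delay conditioning. Below is a minimal, hypothetical sketch of what a delay-conditioned adaptive RMS normalization could look like, assuming the target transcription delay is embedded and used to modulate the per-channel gain of a standard RMSNorm. The class and parameter names (AdaRMSNorm, delay_embedding, the number of delay buckets) are illustrative assumptions, not taken from the paper or the released weights.

```python
# Minimal sketch (not the released implementation) of a delay-conditioned
# adaptive RMS normalization, as one plausible reading of "Ada RMS-Norm".
import torch
import torch.nn as nn


class AdaRMSNorm(nn.Module):
    """RMSNorm whose gain is modulated by a conditioning vector (e.g. a delay embedding)."""

    def __init__(self, dim: int, cond_dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))        # base per-channel gain
        self.to_gain = nn.Linear(cond_dim, dim, bias=True)  # condition -> gain offset
        nn.init.zeros_(self.to_gain.weight)                  # start out as a plain RMSNorm
        nn.init.zeros_(self.to_gain.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); cond: (batch, cond_dim)
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        gain = (self.weight * (1.0 + self.to_gain(cond))).unsqueeze(1)
        return x * rms * gain


# Hypothetical usage: condition the normalization on an embedding of the
# target delay (e.g. a bucket corresponding to 480 ms), so one model can be
# trained and served at several latency settings.
if __name__ == "__main__":
    batch, time, dim, cond_dim = 2, 50, 512, 64
    delay_embedding = nn.Embedding(8, cond_dim)  # 8 discrete delay buckets (assumed)
    norm = AdaRMSNorm(dim, cond_dim)
    frames = torch.randn(batch, time, dim)       # stand-in for encoder activations
    delay_ids = torch.tensor([3, 3])             # e.g. the 480 ms bucket
    out = norm(frames, delay_embedding(delay_ids))
    print(out.shape)                             # torch.Size([2, 50, 512])
```

In this sketch, zero-initializing the conditioning projection means the layer behaves as an ordinary RMSNorm at the start of training, so the delay signal is learned as a perturbation of the gain rather than a disruptive rescaling.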

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Tadabur: A Large-Scale Quran Audio Dataset

cs.SD · 2026-04 · unverdicted · novelty 7.0

Tadabur is a large-scale Quran audio dataset with over 1,400 hours of audio from 600+ reciters, supporting speech research and benchmarks.