
Interfaze, a young YC’s startup, has open-sourced a new speech recognition model. It is called diffusion-gemma-asr-small. The model transcribes audio through a diffusion decoder, not an autoregressive one. It is described as the first multilingual audio diffusion ASR model. One adapter handles six languages. The research team trained only about 42M parameters on top of a frozen 26B backbone. That is roughly 0.16% of the model’s weights. Here two terms matter up front. Autoregress
Will diffusion-gemma-asr-small appear on Hugging Face by July 7, 2026?
Resolves by Jul 7, 2026
An open-source speech-to-text model called diffusion-gemma-asr-small converts audio to text using a diffusion decoder rather than the more common autoregressive approach, meaning it refines all output tokens in parallel instead of generating them one at a time. The model handles six languages through a single adapter of about 42 million parameters attached to a much larger frozen backbone, making it efficient for multilingual transcription without loading separate models per language. Unlike autoregressive models that generate output length-dependent costs, this diffusion approach scales cost by denoising steps instead, making it well-suited for batch transcription where longer and shorter clips require similar processing time. While it performs competitively against other diffusion-based speech recognition systems, it trails the autoregressive Whisper model on benchmarks, a gap the research team attributes to training data rather than architectural differences.

Anthropic is redeploying Claude Fable 5, its most capable generally available model. On June 30, it announced that US export controls had lifted. The controls had covered Claude Fable 5 and Claude Mythos 5. Fable 5 returned to users globally on Wednesday, July 1. Mythos 5 access is restored to a set of US organizations. The models were pulled on June 12. A US government directive restricted them to non-foreign-nationals. Anthropic could not verify nationality in real time. So it suspended bo

Google Research introduced TabFM, a foundation model built for tabular data. TabFM performs classification and regression without dataset-specific training. Every prediction comes from a single forward pass. The model reframes tabular prediction as an in-context learning problem. It is available now on Hugging Face and GitHub. TL;DR TabFM predicts on unseen tables with no training, tuning, or feature engineering. It reads the full dataset as one prompt, then predicts via in-context le
Want to go deeper than the news? Explore live, cohort-based AI courses taught by practitioners.
Browse AI courses on Maven