DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off
TL;DR
- DiffusionGemma writes a whole chunk of text in one go and then keeps polishing it rather than building it word by word.
- Google says it can be up to 4x faster, hitting 1,000+ tokens per second on an NVIDIA H100 and around 700 on an RTX 5090, thanks to parallel processing.
- Output quality is still inferior to Gemma 4, so it’s more of an experimental tool than a finished product.
Google has released DiffusionGemma, an experimental AI model that takes a very different approach to how most chatbots generate text today. Instead of writing one word after another in a strict sequence, it generates a whole block of text at once and then keeps refining it until it becomes readable. The idea is to push for speed and hardware efficiency, even if it means giving up some polish in the final output.
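To make the difference concrete, here is a toy Python sketch of the two generation styles described above. It is not Google's implementation and does not call any real DiffusionGemma API. The function names, the `_` mask token, and the `per_pass` parameter are all made up for illustration, and the "model" is a stand-in that already knows the final text. The only thing it shows is the step count: one token per step versus several positions filled in per refinement pass.

```python
import random

MASK = "_"  # hypothetical placeholder for a not-yet-decided position


def autoregressive(target):
    """Word-by-word style: emit exactly one token per step, left to right."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)
        steps += 1
    return out, steps


def block_refine(target, per_pass, seed=0):
    """Block style: start fully masked, then fill `per_pass` positions per pass.

    The stand-in "model" simply copies from `target`; a real model would
    predict every position in parallel and keep its most confident guesses.
    """
    rng = random.Random(seed)
    order = list(range(len(target)))
    rng.shuffle(order)  # positions get resolved in no particular order

    draft = [MASK] * len(target)
    passes, history = 0, []
    while order:
        batch, order = order[:per_pass], order[per_pass:]
        for i in batch:
            draft[i] = target[i]
        passes += 1
        history.append(" ".join(draft))
    return draft, passes, history


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    _, ar_steps = autoregressive(words)
    final, passes, history = block_refine(words, per_pass=3)
    for line in history:
        print(line)  # the draft becomes more readable with each pass
    print(f"word by word: {ar_steps} steps, block refinement: {passes} passes")
```

Fewer passes does not automatically mean less total work, since each pass touches the whole block. The speed-up Google describes comes from doing that work in parallel on the GPU.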
Source: www.androidauthority.com