Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
Up to 3x the speed with no loss of quality—is it too good to be true?
Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI may already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess upcoming tokens, which can speed up generation compared with a model producing every token on its own, one at a time.
The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini, by contrast, is optimized for Google's custom TPU chips, which operate in enormous clusters with very fast interconnects and memory. The largest Gemma 4 model can run at full precision on a single high-end AI accelerator, and a quantized version can run on a consumer GPU.
Gemma lets users tinker with AI on their own hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also switched Gemma 4 to the Apache 2.0 license, which is much more permissive than the custom Gemma license Google used for previous releases.
However, the hardware most people have for running local AI models has inherent limits. That's where MTP comes in.
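The article doesn't describe how Google's MTP drafters work internally, so what follows is a minimal sketch of generic greedy speculative decoding (draft-and-verify), not Google's code. The functions `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical toy stand-ins: a cheap drafter proposes `k` tokens, the larger target model checks them, and generation keeps the agreeing prefix plus one token chosen by the target. Because every kept token is the one the target would have picked anyway, the output matches plain greedy decoding, which is how a speedup "with no loss of quality" can hold in principle.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: given the tokens so far,
# return the token it would pick greedily next.
Model = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    """Toy 'large' model: always predicts (last token + 1) mod 10."""
    return (seq[-1] + 1) % 10


def draft_model(seq: List[int]) -> int:
    """Toy drafter: agrees with the target except right after token 5."""
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one sequential model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding. Returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter guesses k tokens ahead, one after another.
        guesses: List[int] = []
        ctx = list(out)
        for _ in range(k):
            guess = draft(ctx)
            guesses.append(guess)
            ctx.append(guess)

        # 2. The target checks each guess. Real systems typically score all
        #    k positions in one batched forward pass; here we loop for clarity.
        base = list(out)
        for i, guess in enumerate(guesses):
            expected = target(base + guesses[:i])
            out.append(expected)  # always keep the target's own choice
            if expected != guess:
                break  # first disagreement: discard the remaining guesses
        else:
            # Every guess matched; the verification pass also predicts the
            # next position, so keep that as a bonus token.
            out.append(target(base + guesses))
        rounds += 1

    # The last round may overshoot; trim to exactly max_new new tokens.
    return out[: len(prompt) + max_new], rounds


tokens, rounds = speculative_decode(target_model, draft_model, [0], max_new=12)
print(tokens, rounds)  # prints "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 4"
```

In this toy setup the 12 new tokens arrive in 4 verification rounds instead of 12 sequential target steps, and they are identical to what `greedy_decode(target_model, [0], 12)` produces. The real-world gain depends on how often the drafter guesses correctly; sampling-based decoding replaces the exact-match check with a probabilistic acceptance rule that preserves the target model's output distribution.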
Source: Ars Technica