News Grower

Independent coverage of AI, startups, and technology.

Ars Technica, May 6, 2026 at 15:44

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Up to 3x the speed with no loss of quality—is it too good to be true?

By Ryan Whitwam

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI could be getting even faster already with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models leverage a form of speculative decoding to take a guess at future tokens, which can speed up generation compared to the way models generate tokens on their own.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized to run on Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. A single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing will let it run on a consumer GPU.

Gemma allows users to tinker with AI on their hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license Google employed for previous releases.

However, there are inherent limitations in the hardware most people have to run local AI models. That's where MTP comes in.
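To make the idea of speculative decoding concrete, here is a minimal toy sketch in Python. It is not Google's MTP implementation: `target_next` and `draft_next` are hypothetical stand-ins for a large model and a small drafter, and the token rule is invented for illustration. The sketch only shows the general greedy variant of the technique, in which a cheap drafter guesses several tokens ahead and the larger model keeps the guesses that match what it would have produced anyway.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical 'small' drafter: agrees with the target except when
    the context length is a multiple of 4, where it guesses wrong."""
    guess = target_next(ctx)
    return (guess + 1) % 50 if len(ctx) % 4 == 0 else guess


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification passes (each would be one batched target call in practice)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The drafter cheaply proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks the proposal. A real system scores all k+1
        #    positions in one forward pass; this toy loops for clarity.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k + 1):
            expected = target(out + accepted)
            accepted.append(expected)  # always the target's own choice
            if i == k or proposal[i] != expected:
                break  # bonus token after full acceptance, or first mismatch
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 24)
    spec, passes = speculative_decode(target_next, draft_next, prompt, 24, k=4)
    print("same output as baseline:", spec == baseline)
    print("verification passes:", passes, "vs 24 baseline calls")
```

Because every token that is kept is the one the target model would have chosen itself, the output matches plain greedy decoding exactly. Any speedup comes from needing fewer passes of the large model, and it depends on how often the drafter guesses correctly.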



