Ars Technica, May 6, 2026 at 15:44, Big Tech

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Up to 3x the speed with no loss of quality—is it too good to be true?

By Ryan Whitwam

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI could be getting even faster already with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models leverage a form of speculative decoding to take a guess at future tokens, which can speed up generation compared to the way models generate tokens on their own.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized to run on Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. A single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing will let it run on a consumer GPU. Gemma allows users to tinker with AI on their hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license Google employed for previous releases.

However, there are inherent limitations in the hardware most people have to run local AI models. That's where MTP comes in.
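To make the idea of speculative decoding concrete, here is a minimal sketch of the generic greedy variant. It is not Google's MTP implementation, whose details the article excerpt does not give. The names `target_next`, `draft_next`, `greedy_decode`, and `speculative_decode` are hypothetical, and both "models" are toy arithmetic functions standing in for a large model and a cheap drafter. The point it illustrates is that the drafter guesses several tokens ahead, the large model checks those guesses, and only guesses the large model agrees with are kept, so the final text matches what the large model would have produced alone.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic next token."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for a drafter: agrees with the target most of the time,
    but is deliberately wrong whenever the context length is a multiple of 4."""
    guess = target_next(context)
    return (guess + 1) % 50 if len(context) % 4 == 0 else guess


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds, each of which stands for one target forward pass."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # Step 1: the drafter proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # Step 2: the target checks every proposed position. A real system
        # scores all positions in a single batched forward pass; this toy
        # loops over them but counts the round as one pass.
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First wrong guess: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every guess matched, so the same pass yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 20)
    tokens, passes = speculative_decode(target_next, draft_next, prompt, 20)
    print("identical output:", tokens == baseline)
    print("verification rounds:", passes, "vs baseline calls:", 20)
```

In this toy, the speedup comes entirely from needing fewer verification rounds than generated tokens. In practice the gain depends on how often the drafter's guesses are accepted and how cheap the drafter is relative to the main model, which is why a headline figure such as "up to 3x" is best read as an upper bound.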
