Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x
TurboQuant makes AI models more efficient without reducing output quality the way other methods do.
Even if you don't know much about the inner workings of generative AI models, you probably know they need a lot of memory. That demand is why it is currently almost impossible to buy a measly stick of RAM without getting fleeced. Google Research recently revealed TurboQuant, a compression algorithm that reduces the memory footprint of large language models (LLMs) while also boosting speed and maintaining accuracy.

TurboQuant is aimed at reducing the size of the key-value cache, which Google likens to a "digital cheat sheet" that stores important information so it doesn't have to be recomputed. This cheat sheet is necessary because, as we say all the time, LLMs don't actually know anything; they can do a good impression of knowing things through the use of vectors, which map the semantic meaning of tokenized text. When two vectors are similar, the text they represent is conceptually similar.

High-dimensional vectors, which can have hundreds or thousands of dimensions, may describe complex information like the pixels in an image or a large data set. They also occupy a lot of memory and inflate the size of the key-value cache, bottlenecking performance. To make models smaller and more efficient, developers employ quantization techniques to run them at lower precision. The drawback is that the outputs get worse: the quality of token estimation goes down. With TurboQuant, Google's early results show an 8x performance increase and a 6x reduction in memory usage in some tests without a loss of quality.
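To make the quantization trade-off concrete, here is a minimal sketch of plain uniform scalar quantization applied to a single made-up vector. This is not TurboQuant, whose internals the article does not describe; it is the generic kind of lower-precision storage that costs some accuracy. The function names, the 128-dimension vector, and the 16-bit baseline used for the compression ratio are all assumptions chosen for illustration.

```python
import math
import random


def quantize(vec, bits):
    """Map each float to an integer code in [0, 2**bits - 1] (uniform scalar quantization)."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stand-in for one cached vector; real caches hold many of these.
random.seed(0)
vector = [random.gauss(0.0, 1.0) for _ in range(128)]

for bits in (8, 4, 2):
    codes, lo, scale = quantize(vector, bits)
    approx = dequantize(codes, lo, scale)
    # Ratio versus an assumed 16-bit baseline, ignoring the small cost of storing lo and scale.
    ratio = 16 / bits
    similarity = cosine_similarity(vector, approx)
    print(f"{bits}-bit: {ratio:.0f}x smaller, cosine similarity {similarity:.4f}")
```

Running it shows the usual pattern: fewer bits per value means a smaller cache but a reconstructed vector that drifts further from the original. That drift is the quality loss Google says TurboQuant avoids in its early tests.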