Google launches Gemma 4 12B, an open-licensed multimodal AI that runs on 16GB laptops
Google on Wednesday introduced Gemma 4 12B, a new open-licensed multimodal artificial intelligence model that the company says can run locally on laptops with 16GB of VRAM or unified memory while processing text, images and audio without separate encoders.
The release is notable for how Google says the model handles different kinds of input. In a June 3 blog post, Olivier Lacombe, director of product management at Google DeepMind, and Gus Martins, a product manager at Google DeepMind, described Gemma 4 12B as a “unified, encoder-free” model, meaning it does not use separate systems to process images or audio before handing them off to the core language model. Instead, Google says vision and audio inputs flow directly into the model backbone. In plain terms, that could reduce complexity for developers building on-device AI applications. Google also says this is its first mid-sized Gemma model with native audio input.
Google is positioning the model as a lighter-weight option that still delivers performance close to larger systems. The company said Gemma 4 12B is a mid-sized addition to the Gemma 4 family, branded as the “Gemma 4 12B Unified Transformer,” and said its results on standard benchmarks approach those of Google’s larger 26 billion-parameter mixture-of-experts, or MoE, model. That comparison is Google’s own characterization, and the announcement cited no independent verification.
The model is being released under the Apache 2.0 license, an open-source license that generally allows broad commercial use, modification and redistribution. Google said both pre-trained and instruction-tuned checkpoints are available through Hugging Face and Kaggle. Developers can also run or test the model through tools including LM Studio, Ollama and Google’s AI Edge tools, according to the company. Alongside the model launch, Google announced an official Skills Repository on GitHub for building agents with Gemma models; that repository is also licensed under Apache 2.0.
Google framed the practical pitch around local use rather than cloud deployment. In the blog post, the company said Gemma 4 12B is “small enough to run locally with just 16GB of VRAM or unified memory” and described it as designed to bring multimodal, agent-style capabilities directly to laptops. For developers, that matters because many multimodal systems require larger hardware setups or separate model components for image and audio processing. Google’s approach, as described in the post, is meant to make a single model easier to deploy on consumer-grade machines.
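For a rough sense of what a 16GB budget means for a model of this size, the back-of-envelope arithmetic can be sketched in a few lines of Python. This is an illustration, not Google's method: it assumes exactly 12 billion parameters, counts only the weights (no cache, activations or runtime overhead), uses decimal gigabytes, and the three precisions shown are hypothetical choices, since the post as described here does not say what precision the 16GB figure assumes.

```python
# Back-of-envelope estimate of weight memory for a 12-billion-parameter model.
# Assumptions (not from Google's announcement): exactly 12e9 parameters,
# decimal gigabytes (1 GB = 1e9 bytes), and weights only, with no KV cache,
# activations, or runtime overhead included.

PARAMS = 12_000_000_000
BUDGET_GB = 16  # the memory figure quoted in the announcement

# Hypothetical storage precisions, in bits per parameter.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_gb(params: int, bits_per_param: int) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for name, bits in PRECISIONS.items():
    size = weight_gb(PARAMS, bits)
    verdict = "fits within" if size <= BUDGET_GB else "exceeds"
    print(f"{name}: {size:.0f} GB of weights, {verdict} {BUDGET_GB} GB")
```

Under those assumptions, 16-bit weights alone would come to about 24 GB, while 8-bit and 4-bit weights would come to about 12 GB and 6 GB. Real memory use also depends on context length and the runtime, so these figures are a floor on weight storage rather than a prediction of actual usage.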
Gemma is Google’s family of open models, first introduced in February 2024. The broader Gemma 4 family was announced on April 2, 2026. In Wednesday’s post, Google said Gemma 4 models have now crossed 150 million downloads. That figure appears to cover only the Gemma 4 generation; in its April announcement, Google said the Gemma family as a whole, across all generations, had been downloaded more than 400 million times since the first release.