Google Splits TPU Line: New TPU 8t for Training, TPU 8i for Low-Latency Inference
Google on Tuesday unveiled two eighth-generation Tensor Processing Units, or TPUs, splitting its custom AI hardware into separate chips for training and inference as it pushes deeper into the data-center AI infrastructure race. In a blog post published during Google Cloud Next, Amin Vahdat, Google’s senior vice president and chief technologist for AI and infrastructure, said the new TPU 8t and TPU 8i were designed with Google DeepMind and that “both chips will be generally available later this year.”
The split is the central shift in Google’s latest TPU generation. Rather than building the line around a single general-purpose accelerator, Google is explicitly tailoring one chip for training large AI models and another for serving those models in low-latency inference and reasoning tasks. Google said the approach is built for the “agentic” era of AI, where systems handle multistep tasks and require different balances of memory, speed and interconnect depending on the workload.
TPU 8t is Google’s larger training system. According to Google, a single TPU 8t superpod can scale to 9,600 chips, pool 2 petabytes of shared high-bandwidth memory and deliver 121 exaflops of compute. Google said that amounts to nearly three times the compute performance per pod of the previous generation. The company is positioning the chip as part of its broader AI Hypercomputer stack, which combines compute, storage, networking, software and orchestration for AI workloads.
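For scale, Google’s pod-level totals imply per-chip figures the company did not state. The back-of-envelope sketch below is reader arithmetic, not numbers from Google’s announcement; it assumes decimal units (1 petabyte = 10^15 bytes) and an even split of memory and compute across all 9,600 chips.

    # Back-of-envelope check on Google's pod-level TPU 8t claims.
    # Assumes decimal units and an even split across chips; Google
    # did not publish per-chip figures, so these are derived estimates.
    chips_per_pod = 9_600
    pod_hbm_bytes = 2e15      # 2 petabytes of shared HBM (Google's claim)
    pod_flops = 121e18        # 121 exaflops (Google's claim; precision unstated)

    hbm_per_chip_gb = pod_hbm_bytes / chips_per_pod / 1e9
    flops_per_chip_pf = pod_flops / chips_per_pod / 1e15

    print(f"Implied HBM per chip: ~{hbm_per_chip_gb:.0f} GB")        # ~208 GB
    print(f"Implied compute per chip: ~{flops_per_chip_pf:.1f} PF")  # ~12.6 PF

By that math, each training chip would carry roughly 208 GB of high-bandwidth memory and about 12.6 petaflops of compute, in the same ballpark as the 288 GB Google quotes for the inference-focused TPU 8i below.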
TPU 8i, by contrast, is aimed at inference and low-latency reasoning, the work of running trained models in production and generating responses quickly. Google said TPU 8i combines 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM, or fast memory built directly into the processor, which it said is three times the SRAM of the prior generation. Google also said TPU 8i doubles inter-chip interconnect bandwidth to 19.2 Tb/s and delivers 80% better performance per dollar than the previous generation.
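Those memory and bandwidth figures are easier to compare after a unit conversion. The short sketch below is again reader arithmetic under decimal-unit assumptions, not derived figures from Google’s post: it restates the interconnect claim in bytes per second and shows how small the on-chip SRAM is relative to the HBM pool.

    # Unit conversion on Google's TPU 8i claims (reader arithmetic;
    # decimal units assumed, and none of the derived figures are Google's).
    ici_terabits = 19.2                 # claimed inter-chip interconnect, Tb/s
    ici_terabytes = ici_terabits / 8    # 8 bits per byte -> ~2.4 TB/s

    hbm_gb = 288                        # claimed HBM per chip
    sram_mb = 384                       # claimed on-chip SRAM per chip
    sram_share = (sram_mb / 1_000) / hbm_gb  # SRAM as a fraction of HBM

    print(f"Interconnect: ~{ici_terabytes:.1f} TB/s")    # ~2.4 TB/s
    print(f"SRAM is ~{sram_share:.2%} of HBM capacity")  # ~0.13%

Although the SRAM amounts to about a tenth of a percent of the HBM capacity, on-chip memory can be read far faster than external HBM, which is presumably why Google highlights it for the latency-sensitive serving work TPU 8i targets.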
Google said both TPU 8t and TPU 8i use its Arm-based Axion CPUs as host processors and deliver up to twice the performance per watt of Ironwood, the company’s seventh-generation TPU announced in April 2025. Google also said both chips use fourth-generation liquid cooling, underscoring the company’s argument that power efficiency is becoming a key constraint in AI data centers. “The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale,” Vahdat said in the announcement.
The announcement comes as cloud providers and chipmakers compete to build the systems behind large AI models, with Nvidia’s Blackwell family among the most prominent rivals in training and inference infrastructure. Ironwood, Google’s prior TPU generation, was presented mainly as an inference-focused design; this year’s move to separate training and inference into two specialized chips marks a clearer strategy of matching silicon to workload.
All of the performance, efficiency, scaling and cost figures cited for TPU 8t and TPU 8i are Google’s claims from its announcement. No independent benchmark validation was available in the sourced material at the time of publication.