Our eighth generation TPUs: two chips for the agentic era

xnx · 427 points · 211 comments · April 22, 2026
blog.google · View on Hacker News

https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

Discussion Highlights (20 comments)

TheMrZZ

> A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.

This seems impressive. I don't know much about the space, so maybe it's not actually that great, but from my POV it looks like a competitive advantage for Google.

NoiseBert69

That cooling system looks crazy. What an unbelievable density.

Keyframe

While others have been capturing the news cycle's attention, it seems to me Google has quietly been going from strength to strength in the background, capturing consumer market share without much (any?) infrastructure trouble, considering they've been so vertically integrated in AI since day one. At one point they even seemed like a lost cause, but they're like a tide... just growing all around.

amazingamazing

If AI ends up having a winner, I struggle to see how it doesn't end with Google winning, because they own the entire stack, or Apple, because they will have deployed the most potentially AI-capable edge sites.

aliljet

The real problem is that scientists doing this sort of early work more often than not want to burn hardware under their desks. Renting infrastructure in Google Cloud isn't the only way...

nickandbro

I am curious what workloads Citadel Securities is running on these TPUs? Are you telling me they need the latest TPUs for market insights?

pmb

At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient. (disclosure: I am long GOOG, for this and a few other reasons)

paulmist

At $15/GB of HBM4, the 331.8 TB of HBM4 per pod is about $5 million...
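Spelled out, that back-of-the-envelope figure works out to roughly $5M; a minimal sketch (the $15/GB price is the comment's assumption, not a quoted rate):

    # paulmist's estimate, spelled out: 331.8 TB of HBM4 at an assumed $15/GB.
    hbm_tb = 331.8
    assumed_price_per_gb = 15.0                     # assumption from the comment, not a quoted price
    cost = hbm_tb * 1_000 * assumed_price_per_gb    # TB -> GB (decimal units)
    print(f"~${cost / 1e6:.1f}M of HBM4 per pod")   # ~$5.0M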

vibe42

The pics of the cooling system are pretty good sci-fi / cyberpunk / steampunk inspo. If the whole AI bubble spectacularly collapses, at least we got a lot of cool pics of custom hardware!

nsteel

This link has more on the architecture: https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

fulafel

"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025). Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

cmptrnerd6

Which company is building the silicon for Google? Is it TSMC? What node size? I didn't see it with a quick search; sorry if it was in the post.

varispeed

I can't help but think we will be "laughing" at this in 10 years' time, the way we laugh at steam engines or the abacus.

iandanforth

Anyone know if these are already powering all of the Gemini services, some of them, or none yet? It's hard to tell if this will result in improvements in speed, lower costs, etc., or if those will be invisible, or have already happened.

kamranjon

It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move on to their next generation of models. I had assumed, because they are using their own silicon, that they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI's, for example. I wonder how much of this is related to these TPUs, vs. just strange policy decisions.

WarmWash

What's interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like Gemini is where it is because it has a much smaller thinking budget. It's hard to reconcile this, because Google likely has the most compute and at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they are trying to be more training-heavy? I don't know, but it's interesting to see.

zshn25

It would be interesting to benchmark a short training/inference run on the latest TPU vs. the latest NVIDIA GPU on a per-cost basis.
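A minimal sketch of what the accelerator side of such a comparison might look like in JAX, assuming a placeholder hourly price (ASSUMED_PRICE_PER_HOUR below is hypothetical, not a published rate): time a large bf16 matmul on whatever accelerator is attached, then divide sustained throughput by the assumed cost per hour.

    # Hypothetical micro-benchmark: sustained matmul throughput per assumed $/hr.
    import time
    import jax
    import jax.numpy as jnp

    ASSUMED_PRICE_PER_HOUR = 10.0   # placeholder $/hr; swap in the real TPU/GPU rate

    n = 8192
    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (n, n), dtype=jnp.bfloat16)
    b = jax.random.normal(key, (n, n), dtype=jnp.bfloat16)

    matmul = jax.jit(lambda x, y: x @ y)
    matmul(a, b).block_until_ready()          # warm up / compile once

    iters = 50
    start = time.perf_counter()
    for _ in range(iters):
        out = matmul(a, b)
    out.block_until_ready()
    elapsed = time.perf_counter() - start

    tflops = (2 * n**3 * iters) / elapsed / 1e12          # 2*n^3 FLOPs per matmul
    print(f"{tflops:.1f} TFLOP/s sustained, "
          f"{tflops / ASSUMED_PRICE_PER_HOUR:.1f} TFLOP/s per $/hr")

A real per-cost comparison would need matched precision, batch sizes, and an actual end-to-end training or inference step rather than a bare matmul, but the shape of the calculation is the same.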

jmyeet

In recent discussions about Tim Apple [sic] moving on, there was a debate about whether Apple flopped on AI, which is my opinion. Of course you had the false dichotomy of doing nothing or burning money faster than the US military like OpenAI does. IMHO the happy medium is Google. Not having to pay the NVidia tax will likely be a huge competitive advantage. And nobody builds data centers as cost-effectively as Google.

It's kind of crazy to be talking ExaFLOPS and Tb/s here. From some quick Googling:

- The first MegaFLOPS CPU was in 1964;
- A Cray supercomputer hit GigaFLOPS in 1988, with workstations hitting it in the 1990s. Consumer CPUs I think hit this around 1999 with the Pentium 3 at 1GHz+;
- It was the 2010s before we saw off-the-shelf TFLOPS;
- It was only last year that a single chip hit PetaFLOPS. I see the IBM Roadrunner hit this in 2008, but that was ~13,000 CPUs, so...

Obviously this is nearly 10,000 TPUs to get to ~121 EFLOPS (FP4, admittedly), but that's still an astounding number. It means each one is doing ~12 PFLOPS (FP4).

I saw a claim that Claude Mythos cost ~$10B to train. I personally believe Google can (or soon will be able to) do this for at least an order of magnitude less. I would love to know the true cost/token of Claude, ChatGPT and Gemini. I think you'll find Google has a massive cost advantage here.
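The per-chip figure is consistent with the pod numbers quoted at the top of the thread; a quick sanity check using the announced 9,600-chip / 121 ExaFLOPS / 2 PB figures:

    # Sanity-checking the ~12 PFLOPS-per-chip arithmetic against the quoted pod figures.
    chips = 9_600
    pod_exaflops = 121                  # quoted low-precision (FP4) pod compute
    pod_hbm_pb = 2                      # quoted shared HBM per superpod

    pflops_per_chip = pod_exaflops * 1_000 / chips    # ~12.6 PFLOPS per chip
    hbm_gb_per_chip = pod_hbm_pb * 1e6 / chips        # ~208 GB per chip
    print(f"~{pflops_per_chip:.1f} PFLOPS and ~{hbm_gb_per_chip:.0f} GB of HBM per chip")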

himata4113

I already felt that Gemini 3 proved what is possible if you train a model for efficiency. If I had to guess, the Pro and Flash variants are 5x to 10x smaller than Opus and GPT-5 class models. They produce a drastically lower number of tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution, as they produce broken tool calls and generally struggle with 'agentic' tasks. For raw problem solving without tools or search, though, they match Opus and GPT while presumably being a fraction of the size. I feel like Google will surprise everyone with a model that is an entire generation beyond SOTA at some point, once they go from prototyping to making a model that's not a preview model anymore. All the models up till now feel like prototypes that were pushed to GA just so they have something to show investors and to integrate into their suite as a proof of concept.

SecretDreams

They are missing a header to show the transition in the discussion from TPU 8t to 8i! Thanks for posting otherwise. Edit: actually, it looks like the header got captured as a figure caption by accident.
