The eighth-generation TPU: An architecture deep dive

meetpateltech 67 points 10 comments April 22, 2026
cloud.google.com · View on Hacker News

Discussion Highlights (5 comments)

zshn25

Splitting TPUs into dedicated training vs. inference chips feels like an admission that the bottleneck has shifted from FLOPs to memory bandwidth + latency. Will future gains come more from memory/system design than from raw compute scaling? And what does that say about scaling laws?
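
For a sense of why low-batch inference (decoding) tends to be bandwidth-bound rather than FLOP-bound, here is a rough roofline-style sketch. The accelerator and model numbers are assumptions for illustration only, not published TPU specs.

```python
# Rough roofline sketch: why batch-1 LLM decoding is memory-bandwidth-bound.
# All numbers below are HYPOTHETICAL placeholders, not real chip specs.
peak_flops = 1.0e15          # assumed peak compute: 1000 TFLOP/s (bf16)
hbm_bandwidth = 3.0e12       # assumed HBM bandwidth: 3 TB/s

# FLOPs per byte needed for compute to be the bottleneck ("machine balance").
ridge_point = peak_flops / hbm_bandwidth
print(f"machine balance: {ridge_point:.0f} FLOPs/byte")   # ~333

# Batch-1 decode: every generated token streams all weights from HBM once.
params = 70e9                        # assumed 70B-parameter model
bytes_per_param = 2                  # bf16 weights
flops_per_token = 2 * params         # one multiply-add per weight
bytes_per_token = bytes_per_param * params

intensity = flops_per_token / bytes_per_token   # ~1 FLOP/byte
print(f"decode arithmetic intensity: {intensity:.1f} FLOPs/byte")

# ~1 FLOP/byte is far below the ~333 FLOPs/byte balance point, so per-token
# latency is set by memory traffic, not by available FLOPs.
time_memory = bytes_per_token / hbm_bandwidth
time_compute = flops_per_token / peak_flops
print(f"memory-bound by roughly {time_memory / time_compute:.0f}x")
```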

ricardo81

dupe https://news.ycombinator.com/item?id=47862497

ttul

No matter how smart your large language model is, if you can’t find the energy to power it, it won’t run. I could imagine Google winning merely because their chips are more efficient. Of course, the other labs are capable of making chips, but Google has been doing it for years.

speedping

2.764 petabytes of HBM per 8i? So that's where all the RAM went.
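
As a rough sanity check on the scale: a pod-level HBM figure is just per-chip HBM times chip count. The numbers below are placeholder assumptions chosen to land near the quoted 2.764 PB; neither is a spec from the article.

```python
# HYPOTHETICAL decomposition of a pod-level HBM total (placeholder values).
hbm_per_chip_gb = 288        # assumed HBM per chip
chips_per_pod = 9600         # assumed chips per pod
total_pb = hbm_per_chip_gb * chips_per_pod / 1e6
print(f"{total_pb:.3f} PB of HBM per pod")   # ~2.765 PB
```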

juancn

Super interesting, but it's so damn hard to find any detail. I would love to see an instruction set reference for one of these; all you get are hardware architecture diagrams or high-level APIs.
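
To illustrate the "high-level APIs only" point: TPUs are normally programmed through JAX/XLA, where the lowest level you can inspect is compiler IR, not the instruction set. A minimal sketch (exact inspection methods vary by JAX version):

```python
import jax
import jax.numpy as jnp

def matmul(a, b):
    return a @ b

a = jnp.ones((128, 128), dtype=jnp.bfloat16)
b = jnp.ones((128, 128), dtype=jnp.bfloat16)

lowered = jax.jit(matmul).lower(a, b)   # trace + lower to StableHLO
print(lowered.as_text())                # compiler IR: the lowest level normally visible
compiled = lowered.compile()            # XLA-optimized program for the target backend
print(compiled.as_text())               # still HLO text, not TPU machine code
```

Everything below that (VLIW scheduling, the actual instruction stream) stays behind the XLA TPU backend, which is why public ISA references don't exist.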
