Google Discloses TPUv4 Details
Google’s TPUv4 excels at AI models employing embeddings owing to its sea of SparseCores that supplement its two main cores. Targeting inference, the TPUv4i has only a single larger core to reduce power.
Joseph Byrne
Flowers are a sign of spring, and Google’s TPUv4 disclosure is a sign that it’s replacing the chip with its successor. A recent paper, to be presented in June at the International Symposium on Computer Architecture (ISCA), sheds light on how the company developed the AI processor and the supercomputer based on it.
For the fourth generation, Google developed two 7 nm chips: the TPUv4i for inference and the TPUv4 for training. The big VLIW TensorCores these chips employ have more matrix units than those in the TPUv3, and the new AI accelerators add a large common memory. The main difference between the two chips is that the TPUv4 integrates two TensorCores, like the TPUv2 and TPUv3, whereas the TPUv4i implements only one to enable air cooling. In contrast to competing inference-focused accelerators that emphasize INT8 throughput, Google sees accuracy benefits from eschewing quantization and sticking with the same floating-point formats for inference as for training.
Google’s ISCA paper also discusses the TPU’s SparseCores. First included in the TPUv2, these engines have proven more useful as Google has used them to process recommendation and language models that employ embeddings (vectors representing items such as words in a text block or videos watched). They’re simpler than the main VLIW core, enabling a TPU to instantiate many of them in a sea of parallel cores. The company reports that SparseCores accelerate models that employ them by 5x–7x but use only 5% of a chip’s area and power.
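To make the embedding workload concrete, here is a minimal sketch of a pooled embedding lookup, the gather-and-sum pattern typical of recommendation models. It is illustrative only and doesn't represent Google's SparseCore design or any TPU API; the function name `embedding_lookup`, the toy `TABLE`, and the choice of sum pooling are all assumptions made for this example.

```python
# Illustrative sketch only: a toy embedding lookup with sum pooling.
# Names and data here are hypothetical, not taken from Google's paper.
from typing import List


def embedding_lookup(table: List[List[float]], ids: List[int]) -> List[float]:
    """Gather the rows named by `ids` and sum them into one dense vector."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for i in ids:              # data-dependent row access (the "sparse" part)
        row = table[i]
        for d in range(dim):   # small dense accumulate per gathered row
            pooled[d] += row[d]
    return pooled


# Hypothetical table: 4 items (e.g., videos), each a 3-dimensional embedding.
TABLE = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 1.0, 1.0],
]

# A user who watched item 0 once and item 3 twice.
pooled = embedding_lookup(TABLE, [0, 3, 3])
print(pooled)
```

The work is dominated by scattered memory reads rather than large matrix multiplies, which suggests why many simple parallel cores could suit it better than a big VLIW matrix engine.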
Google hasn’t yet disclosed information about the TPUv5. The recent paper alludes to it and hints that it’s a 4 nm chip first deployed in 2023, three years after the TPUv4. By contrast, the TPUv4, TPUv3, and TPUv2 each followed its predecessor by only one year. The long life of the TPUv4 demonstrates that it was flexible enough to adapt to the company’s evolving workload and that it left little room to improve performance, efficiency, and scalability.