CPUs, GPUs, and Now TPUs

Robert Keith Rippetoe
Nov 28, 2023


A Processor Meant for Machine Learning

In recent years GPUs, or Graphical Processing Units, have improved massively, letting them handle everything from video editing to games to machine learning models at increasingly blistering speeds. With how popular GPUs have become, especially Nvidia’s RTX series, it is easy to assume they are the uncontested apex of computational performance. Not true! More recently, an even more specialized tool has been heralded as the way to handle the machine learning workloads of the future with ever greater efficiency: the TPU, or Tensor Processing Unit.

CPUs (Central Processing Unit)

Found in every consumer computer, CPUs are required to run the BIOS (Basic Input Output System), the firmware that acts as an intermediary between your OS and hardware. This is fine for a normal computer’s essential processes, but graphics or Machine Learning (ML) workloads will quickly overwhelm a CPU, which must handle every possible operation and therefore lacks specialization.

GPUs (Graphics Processing Unit)

Increasingly found in gaming computers, GPUs speed up workloads not only by including a large amount of integrated VRAM (Video RAM) but also by handling a much narrower variety of tasks. Because the VRAM is integrated into the GPU, operations can be accelerated in much the same way as with a very large CPU cache. Startups and small companies with demanding graphics or ML applications often run multiple GPUs in a single larger case; a popular setup is to put several powerful GPUs together in one workstation to handle heavy ML workloads without relying on external cloud providers. The drawbacks to this setup are the immense amount of power it draws and the cost of multiple powerful GPUs, which, depending on how much performance you need, can quickly become more expensive than most developers can afford. Now, an even more specialized form of processor can be used to address some of these issues.
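
As a rough illustration of what such a multi-GPU workstation looks like from the software side (a sketch assuming PyTorch and one or more CUDA-capable cards; the library choice is just one common option), you can enumerate the available GPUs and their VRAM like this:

```python
import torch

# Enumerate the CUDA-capable GPUs PyTorch can see and report their VRAM.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024 ** 3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU found; workloads will fall back to the CPU.")
```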

TPUs (Tensor Processing Unit)

TPUs are application-specific integrated circuits (ASICs) that Google introduced in 2016 specifically to handle Machine Learning tasks. Tensors themselves describe multilinear relationships between sets of algebraic objects related to a vector space. While this sounds very complicated, you can imagine a tensor simply as a mathematical container describing values across different dimensions. A vector is a tensor that describes a list of values in one dimension, a matrix is a tensor that describes values in two dimensions, and so on. Thus the “Tensor” in TPU emphasizes their ability to handle the intensive matrix math common in many ML applications.
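
A quick sketch in Python, using NumPy arrays as stand-ins for tensors, makes the rank-per-dimension idea concrete:

```python
import numpy as np

scalar = np.array(5.0)                      # rank-0 tensor: a single value
vector = np.array([1.0, 2.0, 3.0])          # rank-1 tensor: values along one dimension
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])             # rank-2 tensor: values along two dimensions
cube = np.zeros((2, 3, 4))                  # rank-3 tensor: values along three dimensions

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: rank {t.ndim}, shape {t.shape}")
```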

“The primary task for TPUs is matrix processing, which is a combination of multiply and accumulate operations. TPUs contain thousands of multiply-accumulators that are directly connected to each other to form a large physical matrix. This is called a systolic array architecture. Cloud TPU v3, contains two systolic arrays of 128 x 128 ALUs, on a single processor.

The TPU host streams data into an infeed queue. The TPU loads data from the infeed queue and stores them in HBM memory. When the computation is completed, the TPU loads the results into the outfeed queue. The TPU host then reads the results from the outfeed queue and stores them in the host’s memory.” [https://cloud.google.com/tpu/docs/intro-to-tpu]
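
To make “multiply and accumulate” concrete, here is a deliberately naive Python sketch of a matrix product written as explicit multiply-accumulate steps. A systolic array performs the same arithmetic, but in hardware, with thousands of multiply-accumulators running in parallel instead of nested loops:

```python
import numpy as np

def matmul_mac(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Matrix product expressed as explicit multiply-accumulate operations."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]  # one multiply-accumulate (MAC) step
            out[i, j] = acc
    return out

# Small matrices keep the pure-Python loops quick; a TPU's 128 x 128 systolic
# array performs this same pattern for much larger tiles in a single hardware pass.
a = np.random.rand(16, 16)
b = np.random.rand(16, 16)
assert np.allclose(matmul_mac(a, b), a @ b)
```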

Google has produced multiple iterations of TPUs since 2016 and is currently on its 5th-generation architecture. Google now uses Deep Reinforcement Learning to generate new chip layouts, much as Nvidia has done. Beyond being a striking demonstration of how quickly computational power has grown in recent years, this also makes it easy to imagine the many possible uses of these specialized chips in the future if they can already handle their own self-improvement!

If you are tempted to go pick up a TPU and add it to your own rig, much like a GPU, you may be disappointed. Currently there are only two ways to access and work with TPUs: through the API enabled in Google Cloud, or through the Coral USB Accelerator. Google Cloud is a great option for developers who are already familiar with deploying to cloud environments or are eager to abstract away hardware requirements (and compare prices between AWS and GCP). The Coral USB Accelerator, on the other hand, can handle 4 trillion operations per second (TOPS), which sounds awe-inspiring until you compare it to the 191 TFLOPs an RTX 4090 GPU is capable of. The advantage of Coral, then, is that it requires only 2W of power compared to the monstrous 450W a 4090 draws, letting you upgrade a workstation to handle much heavier ML workloads without worrying about your power supply. In the future we may see much more powerful TPUs compete with powerful GPUs in sheer computational power, but for now these are the two best options.
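
If you go the Google Cloud route, the typical TensorFlow workflow is to resolve the TPU, initialize it, and build your model inside a TPUStrategy scope. A minimal sketch, assuming you are already running on a Cloud TPU VM or a Colab TPU runtime with TensorFlow installed (the model itself is just a placeholder):

```python
import tensorflow as tf

# Locate and initialize the Cloud TPU. On a TPU VM or Colab TPU runtime,
# the resolver can usually find the device without an explicit address.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model across all available TPU cores.
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU cores available:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder model; any Keras model built inside the scope runs on the TPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```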

TLDR: TPUs are the best architecture currently available for certain Machine Learning workloads, specifically Convolutional Neural Networks and Deep Reinforcement Learning. Until new offerings arrive, TPUs can only handle relatively light local computations or must be accessed through Google Cloud. If you are a local developer who can’t afford a larger GPU and power supply, or a company looking for the most efficient cloud servers for your ML workloads, TPUs might just be the key to getting more bang for your buck from hardware. If this sounds like something you may benefit from, I highly recommend reading Google’s own publication explaining the benefits and drawbacks of the TPU architecture.
