
  • Over the last year, everyone has been talking about:

  • Generative AI.

  • Generative AI.

  • Generative AI.

  • Generative AI.

  • I'm like, "Wait, why am I doing this? I just wait for the AI to do it."

  • Driving the boom are AI chips.

  • Some are no bigger than your palm, and demand for them has skyrocketed.

  • We originally thought the total market for data-center AI accelerators would be about 150 billion, and now we think it's gonna be over 400 billion.

  • As AI gains popularity, some of the world's tech titans are racing to design chips that run better and faster.

  • Here's how they work and why tech companies are betting they're the future.

  • This is "The Tech Behind AI Chips."

  • This is Amazon's chip lab in Austin, Texas, where the company designs AI chips to use in AWS's servers.

  • Right out of manufacturing, we get something that is called the wafer.

  • Ron Diamant is the chief architect of Inferentia and Trainium, the company's custom AI chips.

  • These are the compute elements or the components that actually perform the computation.

  • Each of these rectangles, called dice, is a chip.

  • Each die contains tens of billions of microscopic semiconductor devices called transistors that switch inputs and outputs.

  • Think about one millionth of a centimeter; that's roughly the size of each one of these transistors.
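
For scale, that figure works out to 10⁻⁶ cm = 10⁻⁸ m = 10 nanometers, which is on the order of the feature sizes quoted for modern chip manufacturing processes.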

  • All chips use semiconductors like this.

  • What makes AI chips different from CPUs, the kind of chip that powers your computer or phone, is how they're packaged.

  • Say, for example, you want to generate a new image of a cat.

  • CPUs have a smaller number of powerful cores, the units that make up the chip, which are good at doing a lot of different things.

  • These cores process information sequentially, one calculation after another.

  • So to create a brand new image of a cat, it would only produce a couple pixels at a time.

  • But an AI chip has more cores that run in parallel, so it can process hundreds or even thousands of those cat pixels all at once.

  • These cores are smaller and typically do less than CPU cores, but are specially designed for running AI calculations.
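
To make the sequential-versus-parallel contrast above concrete, here is a minimal sketch in plain Python with NumPy. The image size and the pixel formula are invented for the illustration; real accelerators parallelize in hardware, not in Python.

    # Illustration only: contrasts the sequential style of a CPU core with
    # the data-parallel style of an AI accelerator.
    import numpy as np

    HEIGHT, WIDTH = 64, 64

    def pixel_value(y, x):
        """Toy stand-in for 'compute one pixel of the cat image'."""
        return (y * WIDTH + x) % 256

    # CPU-style: one calculation after another.
    image_sequential = np.zeros((HEIGHT, WIDTH))
    for y in range(HEIGHT):
        for x in range(WIDTH):
            image_sequential[y, x] = pixel_value(y, x)

    # Accelerator-style: one vectorized operation touches every pixel "at once".
    ys, xs = np.mgrid[0:HEIGHT, 0:WIDTH]
    image_parallel = (ys * WIDTH + xs) % 256

    assert np.array_equal(image_sequential, image_parallel)

The vectorized version computes every pixel in a single operation, which is the programming model that the many small cores of an AI chip are built to exploit.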

  • But those chips can't operate on their own.

  • That compute die then gets integrated into a package, and that's what people typically think about when they think about the chip.

  • Amazon makes two different AI chips, named for the two essential functions of AI models: training and inference.

  • Training is where an AI model is fed millions of examples of something, images of cats, for instance, to teach it what a cat is and what it looks like.

  • Inference is when it uses that training to actually generate an original image of a cat.

  • Training is the most difficult part of this process.

  • We typically train not on one chip, but rather on tens of thousands of chips.

  • In contrast, inference is typically done on 1 to 16 chips.
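
A toy contrast between the two workloads, sketched in plain NumPy rather than any real accelerator SDK: training loops over many examples and repeatedly updates the model's weights, while inference is a single forward pass with the weights frozen. The model and learning rate here are invented for the sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "model": a single linear layer y = x @ w.
    w = rng.normal(size=(4, 1))

    # Training: loop over many (input, target) batches, nudging the weights.
    # Real systems shard this loop across tens of thousands of chips.
    for _ in range(1000):
        x = rng.normal(size=(8, 4))            # a batch of examples
        target = x @ np.ones((4, 1))           # what we want the model to learn
        pred = x @ w                           # forward pass
        grad = x.T @ (pred - target) / len(x)  # gradient of mean squared error
        w -= 0.1 * grad                        # gradient-descent update

    # Inference: no loop, no weight updates; just apply the trained weights.
    x_new = rng.normal(size=(1, 4))
    print(x_new @ w)  # should be close to x_new @ np.ones((4, 1))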

  • Processing all of that information demands a lot of energy, which generates heat.

  • And we're able to use this device here to force a certain temperature onto the chip,

  • and that's how we're able to test that the chip is reliable at very low temperatures and very high temperatures.

  • To help keep chips cool, they're attached to heat sinks, pieces of metal with fins that help dissipate heat.

  • Once they're packaged, the chips are integrated into servers for Amazon's AWS cloud.

  • So the training cards will be mounted on this baseboard, eight of them in total, and they are interconnected at very high bandwidth and low latency.

  • So this allows the different training devices inside the server to work together on the same training job.

  • So if you are interacting with an AI chatbot, your text, your question, will hit the CPUs, and the CPUs will move the data into the Inferentia2 devices, which will collectively perform a gigantic computation.

  • Basically, after running the AI model, the devices respond to the CPU with the result, and the CPU sends the result back to you.
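
As a rough sketch of that request flow, assuming an invented run_model_shard function and an eight-device pool standing in for the baseboard described above (this is not the actual AWS Neuron API):

    from concurrent.futures import ThreadPoolExecutor

    NUM_DEVICES = 8  # e.g., the eight cards on one baseboard

    def run_model_shard(device_id: int, question: str) -> str:
        """Stand-in for one device's share of the gigantic computation."""
        return f"partial answer from device {device_id}"

    def handle_request(question: str) -> str:
        # The CPU's role: fan the work out to the accelerator devices...
        with ThreadPoolExecutor(max_workers=NUM_DEVICES) as pool:
            partials = list(pool.map(lambda d: run_model_shard(d, question),
                                     range(NUM_DEVICES)))
        # ...collect the collective result, and send it back to the user.
        return " | ".join(partials)

    print(handle_request("What is an AI chip?"))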

  • Amazon's chips are just one type competing in this emerging market, which is currently dominated by the biggest chip designer, Nvidia.

  • Nvidia is still a chip provider to all different types of customers who have to run different workloads.

  • And then the next category of competitor that you have is the major cloud providers.

  • Microsoft, Amazon AWS, and Google are all designing their own chips because they can optimize their computing workloads for the software that runs on their cloud to get a performance edge,

  • and they don't have to give Nvidia its very juicy profit margin on the sale of every chip.

  • But right now, generative AI is still a young technology.

  • It's mostly used in consumer-facing products like chatbots and image generators,

  • but experts say that hype cycles around technology can pay off in the end.

  • While there might be something like a dot-com bubble for the current AI hype cycle, at the end of the dot-com bubble there was still the internet.

  • And I think we're in a similar situation with generative AI.

  • The technology's rapid advance means that chips and the software to use them are going to have to keep up.

  • Amazon says it uses a mixture of its own chips and Nvidia's chips to give customers multiple options.

  • Microsoft says it's following a similar model.

  • For those cloud providers, the question is, how much of their computing workloads for AI is gonna be offered through Nvidia versus their own custom AI chips?

  • And that's the battle that's playing out in corporate boardrooms all over the world.

  • Amazon released a new version of Trainium in November.

  • Diamant says he doesn't see the AI boom slowing down anytime soon.

  • We've been investing in machine learning and artificial intelligence for almost two decades now,

  • and we're just seeing a step-up in the pace of innovation and in the capabilities that these models are enabling.

  • So our investment in AI chips is here to stay, with a significant step-up in capabilities from generation to generation.
