
  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high-quality educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • JULIAN SHUN: Today, we're going to talk

  • about multicore programming.

  • And as I was just informed by Charles, it's 2018.

  • I had 2017 on the slide.

  • So first, congratulations to all of you.

  • You turned in the first project's beta.

  • Here's a plot showing the tiers that different groups reached

  • for the beta.

  • And this is in sorted order.

  • And we set the beta cutoff to be tier 45.

  • The final cutoff is tier 48.

  • So the final cutoff we did set a little bit aggressively,

  • but keep in mind that you don't necessarily

  • have to get to the final cutoff in order

  • to get an A on this project.

  • So we're going to talk about multicore processing today.

  • That's going to be the topic of the next project

  • after you finish the first project.

  • So in a multicore processor, we have a whole bunch

  • of cores that are all placed on the same chip,

  • and they have access to shared memory.

  • They usually also have some sort of private cache, and then

  • a shared last level cache, so L3, in this case.

  • And then they all have access to the same memory controller,

  • which goes out to main memory.

  • And then they also have access to I/O.

  • But for a very long time, chips only had a single core on them.

  • So why do we have multicore processors nowadays?

  • Why did semiconductor vendors start

  • producing chips that had multiple processor

  • cores on them?

  • So the answer is because of two things.

  • So first, there's Moore's Law, which

  • says that we get more transistors every year.

  • So the number of transistors that you can fit on a chip

  • doubles approximately every two years.

  • And secondly, there's the end of scaling of clock frequency.

  • So for a very long time, we could just

  • keep increasing the frequency of the single core on the chip.

  • But at around 2004 to 2005, that was no longer the case.

  • We couldn't scale the clock frequency anymore.

  • So here's a plot showing both the number of transistors

  • you could fit on the chip over time,

  • as well as the clock frequency of the processors over time.

  • And notice that the y-axis is in log scale here.

  • And the blue line is basically Moore's Law,

  • which says that the number of transistors

  • you can fit on a chip doubles approximately every two years.

  • And that's been growing pretty steadily.

  • So this plot goes up to 2010, but in fact, it's

  • been growing even up until the present.

  • And it will continue to grow for a couple

  • more years before Moore's Law ends.

  • However, if you look at the clock frequency line,

  • you see that it was growing quite

  • steadily until about the early 2000s, and then at that point,

  • it flattened out.

  • So at that point, we couldn't increase the clock frequencies

  • anymore, and the clock speed was bounded

  • at about four gigahertz.

  • So nowadays, if you go buy a processor,

  • it's usually still bounded by around 4 gigahertz.

  • It's usually a little bit less than 4 gigahertz,

  • because it doesn't really make sense to push it all the way.

  • But you might find some processors

  • that are around 4 gigahertz nowadays.

  • So what happened at around 2004 to 2005?

  • Does anyone know?

  • So Moore's Law basically says that we

  • can fit more transistors on a chip

  • because the transistors become smaller.

  • And when the transistors become smaller,

  • you can reduce the voltage that's

  • needed to operate the transistors.

  • And as a result, you can increase the clock frequency

  • while maintaining the same power density.

  • And that's what manufacturers did until about 2004 to 2005.

  • They just kept increasing the clock frequency

  • to take advantage of Moore's Law.

  • But it turns out that once transistors become

  • small enough, and the voltage used

  • to operate them becomes small enough,

  • there's something called leakage current.

  • So there's current that leaks, and we're

  • unable to keep reducing the voltage while still having

  • reliable switching.

  • And if you can't reduce the voltage anymore,

  • then you can't increase the clock frequency

  • if you want to keep the same power density.
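
A standard way to make this argument precise (not on the slide; this is the classic Dennard-scaling relation, stated here as a sketch) is the dynamic-power formula for switching transistors:

    P_{\mathrm{dynamic}} \;\propto\; C \, V^{2} \, f

Under ideal scaling, shrinking feature sizes by a factor of 1/k lets the capacitance C and the supply voltage V each drop by roughly 1/k, so the frequency f can grow by k while the power per unit area stays constant. Once leakage current stops V from dropping further, any additional increase in f shows up directly as higher power density.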

  • So here's a plot from Intel back in 2004

  • when they first started producing multicore processors.

  • And this is plotting the power density versus time.

  • And again, the y-axis is in log scale here.

  • So the green data points are actual data points,

  • and the orange ones are projected.

  • And they projected what the power density

  • would be if we kept increasing the clock

  • frequency at a trend of about 25% to 30% per year,

  • which is what happened up until around 2004.

  • And because we couldn't reduce the voltage anymore,

  • the power density would go up.

  • And you can see that eventually, it

  • reaches the power density of a nuclear reactor, which

  • is pretty hot.

  • And then it reaches the power density of a rocket nozzle,

  • and eventually you get to the power

  • density of the sun's surface.

  • So if you have a chip that has a power density

  • equal to the sun's surface--

  • well, you don't actually really have a chip anymore.

  • So basically if you get into this orange region,

  • you basically have a fire, and you can't really

  • do anything interesting, in terms of performance

  • engineering, at that point.

  • So to solve this problem, semiconductor vendors

  • didn't increase the clock frequency anymore,

  • but we still had Moore's Law giving us

  • more and more transistors every year.

  • So what they decided to do with these extra transistors

  • was to put them into multiple cores,

  • and then put multiple cores on the same chip.

  • So we can see that, starting at around 2004,

  • the number of cores per chip becomes more than one.

  • And each generation of Moore's Law

  • will potentially double the number of cores

  • that you can fit on a chip, because it's doubling

  • the number of transistors.

  • And we've seen this trend up until about today.

  • And again, it's going to continue for a couple

  • more years before Moore's Law ends.

  • So that's why we have chips with multiple cores today.

  • So today, we're going to look at multicore processing.

  • So I first want to introduce the abstract multicore

  • architecture.

  • So this is a very simplified version,

  • but I can fit it on this slide, and it's a good example

  • for illustration.

  • So here, we have a whole bunch of processors.

  • They each have a cache, so that's

  • indicated with the dollar sign.

  • And usually they have a private cache as well as

  • a shared cache, so a shared last level cache, like the L3 cache.

  • And then they're all connected to the network.

  • And then, through the network, they

  • can connect to the main memory.

  • They can all access the same shared memory.

  • And then usually there's a separate network for the I/O

  • as well, even though I've drawn them as a single network here,

  • so they can access the I/O interface.

  • And potentially, the network will also

  • connect to other multiprocessors on the same system.

  • And this abstract multicore architecture

  • is known as a chip multiprocessor, or CMP.

  • So that's the architecture that we'll be looking at today.
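
As a small illustration of programming against this kind of chip multiprocessor (my example, not code from the lecture), a C program on a Linux/glibc system can ask how many logical cores the shared-memory machine currently exposes; note that _SC_NPROCESSORS_ONLN is a glibc extension rather than strict POSIX:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Number of logical cores currently online (glibc extension). */
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);
        if (ncores < 0) {
            perror("sysconf");
            return 1;
        }
        printf("online logical cores: %ld\n", ncores);
        return 0;
    }

Compiled with something like cc query_cores.c, this simply reports how many cores the operating system makes available to user programs on the machine.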

  • So here's an outline of today's lecture.

  • So first, I'm going to go over some hardware challenges

  • with shared memory multicore machines.

  • So we're going to look at the cache coherence protocol.

  • And then after looking at hardware,

  • we're going to look at some software solutions

  • to write parallel programs on these multicore machines

  • to take advantage of the extra cores.

  • And we're going to look at several concurrency

  • platforms listed here.

  • We're going to look at Pthreads.

  • This is basically a low-level API

  • for running your code in parallel.
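
As a rough sketch of what that low-level API looks like (a hand-written example, not code shown in the lecture), here is roughly the smallest useful Pthreads program: it creates one extra thread, waits for it, and relies on the fact that both threads see the same shared memory:

    #include <pthread.h>
    #include <stdio.h>

    static int shared = 0;            /* visible to every thread */

    /* Function executed by the spawned thread. */
    void *worker(void *arg) {
        (void)arg;                    /* unused */
        shared = 42;                  /* write that main() will read */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        /* Spawn a thread running worker(), then wait for it to finish. */
        if (pthread_create(&t, NULL, worker, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
        pthread_join(t, NULL);
        printf("shared = %d\n", shared);   /* prints 42; the join orders the write before this read */
        return 0;
    }

Compile with something like cc -pthread example.c. The join is what makes the worker's write safely visible to main here; unsynchronized sharing is exactly where the cache coherence and concurrency issues discussed in this lecture come in.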

  • And if you program