The following content is provided under a Creative
Commons license.
Your support will help MIT OpenCourseWare
continue to offer high-quality educational resources for free.
To make a donation or to view additional materials
from hundreds of MIT courses, visit MIT OpenCourseWare
at ocw.mit.edu.
JULIAN SHUN: Today, we're going to talk
about multicore programming.
And as I was just informed by Charles, it's 2018.
I had 2017 on the slide.
So first, congratulations to all of you.
You turned in the first project's beta.
Here's a plot showing the tiers that different groups reached
for the beta.
And this is in sorted order.
And we set the beta cutoff to be tier 45.
The final cutoff is tier 48.
So the final cutoff we did set a little bit aggressively,
but keep in mind that you don't necessarily
have to get to the final cutoff in order
to get an A on this project.
So we're going to talk about multicore processing today.
That's going to be the topic of the next project
after you finish the first project.
So in a multicore processor, we have a whole bunch
of cores that are all placed on the same chip,
and they have access to shared memory.
They usually also have some sort of private cache, and then
a shared last level cache, so L3, in this case.
And then they all have access to the same memory controller,
which goes out to main memory.
And then they also have access to I/O.
But for a very long time, chips only had a single core on them.
So why do we have multicore processors nowadays?
Why did semiconductor vendors start
producing chips that had multiple processor
cores on them?
So the answer is because of two things.
So first, there's Moore's Law, which
says that we get more transistors every year.
So the number of transistors that you can fit on a chip
doubles approximately every two years.
And secondly, there's the end of scaling of clock frequency.
So for a very long time, we could just
keep increasing the frequency of the single core on the chip.
But at around 2004 to 2005, that was no longer the case.
We couldn't scale the clock frequency anymore.
So here's a plot showing both the number of transistors
you could fit on the chip over time,
as well as the clock frequency of the processors over time.
And notice that the y-axis is in log scale here.
And the blue line is basically Moore's Law,
which says that the number of transistors
you can fit on a chip doubles approximately every two years.
And that's been growing pretty steadily.
So this plot goes up to 2010, but in fact, it's
been growing even up until the present.
And it will continue to grow for a couple
more years before Moore's Law ends.
However, if you look at the clock frequency line,
you see that it was growing quite
steadily until about the early 2000s, and then at that point,
it flattened out.
So at that point, we couldn't increase the clock frequencies
anymore, and the clock speed was bounded
at about four gigahertz.
So nowadays, if you go buy a processor,
it's usually still bounded by around 4 gigahertz.
It's usually a little bit less than 4 gigahertz,
because it doesn't really make sense to push it all the way.
But you might find some processors
that are around 4 gigahertz nowadays.
So what happened at around 2004 to 2005?
Does anyone know?
So Moore's Law basically says that we
can fit more transistors on a chip
because the transistors become smaller.
And when the transistors become smaller,
you can reduce the voltage that's
needed to operate the transistors.
And as a result, you can increase the clock frequency
while maintaining the same power density.
And that's what manufacturers did until about 2004 to 2005.
They just kept increasing the clock frequency
to take advantage of Moore's law.
But it turns out that once transistors become
small enough, and the voltage used
to operate them becomes small enough,
there's something called leakage current.
So there's current that leaks, and we're
unable to keep reducing the voltage while still having
reliable switching.
And if you can't reduce the voltage anymore,
then you can't increase the clock frequency
if you want to keep the same power density.
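To make that reasoning a little more concrete, the standard dynamic-power model is what's usually used to explain this; the exact constants aren't on the slide, so take this as a rough sketch:

$$
P_{\text{dynamic}} \approx \alpha \, C \, V^2 f,
\qquad
\text{power density} \approx \frac{P_{\text{dynamic}}}{\text{chip area}}
$$

So as long as shrinking the transistors also let you shrink the capacitance C and the supply voltage V, you could keep raising the frequency f while the power per unit area stayed roughly constant. Once leakage keeps you from lowering V any further, any additional increase in f shows up directly as an increase in power density.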
So here's a plot from Intel back in 2004
when they first started producing multicore processors.
And this is plotting the power density versus time.
And again, the y-axis is in log scale here.
So the green data points are actual data points,
and the orange ones are projected.
And they projected what the power density
would be if we kept increasing the clock
frequency at a trend of about 25% to 30% per year,
which is what happened up until around 2004.
And because we couldn't reduce the voltage anymore,
the power density would go up.
And you can see that eventually, it
reaches the power density of a nuclear reactor, which
is pretty hot.
And then it reaches the power density of a rocket nozzle,
and eventually you get to the power
density of the sun's surface.
So if you have a chip that has a power density
equal to the sun's surface--
well, you don't actually really have a chip anymore.
So basically if you get into this orange region,
you basically have a fire, and you can't really
do anything interesting, in terms of performance
engineering, at that point.
So to solve this problem, semiconductor vendors
didn't increase the clock frequency anymore,
but we still had Moore's Law giving us
more and more transistors every year.
So what they decided to do with these extra transistors
was to put them into multiple cores,
and then put multiple cores on the same chip.
So we can see that, starting at around 2004,
the number of cores per chip becomes more than one.
And each generation of Moore's Law
will potentially double the number of cores
that you can fit on a chip, because it's doubling
the number of transistors.
And we've seen this trend up until about today.
And again, it's going to continue for a couple
more years before Moore's Law ends.
So that's why we have chips with multiple cores today.
So today, we're going to look at multicore processing.
So I first want to introduce the abstract multicore
architecture.
So this is a very simplified version,
but I can fit it on this slide, and it's a good example
for illustration.
So here, we have a whole bunch of processors.
They each have a cache, so that's
indicated with the dollar sign.
And usually they have a private cache as well as
a shared cache, so a shared last level cache, like the L3 cache.
And then they're all connected to the network.
And then, through the network, they
can connect to the main memory.
They can all access the same shared memory.
And then usually there's a separate network for the I/O
as well, even though I've drawn them as a single network here,
so they can access the I/O interface.
And potentially, the network will also
connect to other multiprocessors on the same system.
And this abstract multicore architecture
is known as a chip multiprocessor, or CMP.
So that's the architecture that we'll be looking at today.
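As a quick sanity check on your own machine, here's a minimal C sketch, not from the slides, that asks the operating system how many of these processors are online, using the POSIX sysconf call (available on Linux and most Unix systems):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Ask the OS how many processors are currently online.
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 0) {
        perror("sysconf");
        return 1;
    }
    printf("This machine has %ld processors online\n", ncores);
    return 0;
}

Note that this reports logical processors, so it may be twice the number of physical cores if hyper-threading is enabled.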
So here's an outline of today's lecture.
So first, I'm going to go over some hardware challenges
with shared memory multicore machines.
So we're going to look at the cache coherence protocol.
And then after looking at hardware,
we're going to look at some software solutions
to write parallel programs on these multicore machines
to take advantage of the extra cores.
And we're going to look at several concurrency
platforms listed here.
We're going to look at Pthreads.
This is basically a low-level API
for running your code in parallel.
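Just to preview what that looks like, here's a minimal sketch of a Pthreads program; the hello routine and the choice of four threads are just made up for illustration:

#include <pthread.h>
#include <stdio.h>

// Each thread runs this routine; the argument is a pointer to its ID.
void *hello(void *arg) {
    long id = *(long *)arg;
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    long ids[4];
    // Spawn 4 threads, each running hello with its own ID.
    for (long i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, hello, &ids[i]);
    }
    // Wait for all of the threads to finish before exiting.
    for (long i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

You'd compile something like this with gcc -pthread, and each thread would print its own message, possibly in a different order on each run.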
And if you program