
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JULIAN SHUN: Good afternoon, everyone. So let's get started. So today, we're going to be talking about races and parallelism. And you'll be doing a lot of parallel programming for the next homework assignment and project. One thing I want to point out is that it's important to meet with your MITPOSSE as soon as possible, if you haven't done so already, since that's going to be part of the evaluation for the Project 1 grade. And if you have trouble reaching your MITPOSSE members, please contact your TA and also make a post on Piazza as soon as possible.

So as a reminder, let's look at the basics of Cilk. So we have cilk_spawn and cilk_sync statements. In Cilk, this was the code that we saw in the last lecture, which computes the nth Fibonacci number. So when we say cilk_spawn, it means that the named child function, the function right after the cilk_spawn keyword, can execute in parallel with the parent caller. So it says that fib of n minus 1 can execute in parallel with the fib function that called it. And then cilk_sync says that control cannot pass this point until all of the spawned children have returned. So this is going to wait for fib of n minus 1 to finish before it goes on and returns the sum of x and y. And recall that the Cilk keywords grant permission for parallel execution, but they don't actually force parallel execution. So this code here says that we can execute fib of n minus 1 in parallel with its parent caller, but it doesn't say that we necessarily have to execute them in parallel. And it's up to the runtime system to decide whether these different functions will be executed in parallel. We'll talk more about the runtime system today.
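
For reference, here is a minimal sketch of the Fibonacci code being discussed, written with the cilk_spawn and cilk_sync keywords (the variable names x and y mirror the narration; the slide's exact code may differ slightly):

```c
#include <cilk/cilk.h>
#include <stdint.h>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1);  // the spawned child may run in parallel
                                      // with the rest of this function
  int64_t y = fib(n - 2);             // the parent continues with fib(n - 2)
  cilk_sync;                          // wait for all spawned children to return
  return x + y;                       // x is only safe to use after the sync
}
```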

And also, we talked about this example, where we wanted to do an in-place matrix transpose. And this used the cilk_for keyword. And this says that we can execute the iterations of this cilk_for loop in parallel. And again, this says that the runtime system is allowed to schedule these iterations in parallel, but it doesn't necessarily say that they have to execute in parallel.
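
A rough sketch of the in-place transpose being referred to, assuming a square n-by-n matrix A of doubles (the slide's exact code may differ):

```c
#include <cilk/cilk.h>

// In-place transpose of a square n-by-n matrix A.
// The outer iterations may run in parallel; each pair (i, j) with j < i
// is swapped with its mirror (j, i) exactly once.
void transpose(double **A, int n) {
  cilk_for (int i = 1; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
      double temp = A[i][j];
      A[i][j] = A[j][i];
      A[j][i] = temp;
    }
  }
}
```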

And under the hood, cilk_for statements are translated into nested cilk_spawn and cilk_sync calls. So the compiler is going to divide the iteration space in half, do a cilk_spawn on one of the two halves, call the other half, and then this is done recursively until we reach a certain size for the number of iterations in the loop, at which point it just creates a single task for that.
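
Conceptually, the divide-and-conquer translation looks something like the following sketch (not the compiler's literal output; loop_helper, body, and GRAIN_SIZE are illustrative names):

```c
#include <cilk/cilk.h>
#include <stdint.h>

#define GRAIN_SIZE 2048  // illustrative coarsening threshold

// Run the loop body for iterations [lo, hi) by recursively splitting the
// range, spawning one half and running the other half in the caller.
static void loop_helper(int64_t lo, int64_t hi, void (*body)(int64_t)) {
  if (hi - lo <= GRAIN_SIZE) {
    for (int64_t i = lo; i < hi; ++i) {
      body(i);  // range is small enough: run this chunk as a single serial task
    }
    return;
  }
  int64_t mid = lo + (hi - lo) / 2;
  cilk_spawn loop_helper(lo, mid, body);  // left half may run in parallel
  loop_helper(mid, hi, body);             // right half runs in the caller
  cilk_sync;                              // wait for the spawned half to finish
}
```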

So any questions on the Cilk constructs? Yes?

AUDIENCE: Is Cilk smart enough to recognize issues with reading and writing for matrix transpose?

JULIAN SHUN: So it's actually not going to figure out whether the iterations are independent for you. The programmer actually has to reason about that. But Cilk does have a nice tool, which we'll talk about, that will tell you at which places your code might possibly be reading and writing the same memory location, and that allows you to localize any possible race bugs in your code. So we'll actually talk about races. But if you just compile this code, Cilk isn't going to know whether the iterations are independent.

So, determinacy races. Race conditions are the bane of concurrency, so you don't want to have race conditions in your code. And there are two famous race bugs that caused disasters. There was the Therac-25 radiation therapy machine, which had a race condition in its software, and this led to three people being killed and many more being seriously injured. The North American blackout of 2003 was also caused by a race bug in the software, and this left 50 million people without power. So these are very bad. And they're notoriously difficult to discover by conventional testing. Race bugs aren't going to appear every time you execute your program. And in fact, the hardest ones to find, which caused these events, are actually very rare events. So most of the time when you run your program, you're not going to see the race bug; only very rarely will you see it. So this makes it very hard to find these race bugs. And furthermore, when you see a race bug, it doesn't necessarily always happen in the same place in your code, so that makes it even harder.

So what is a race? So a determinacy race is one of the most basic forms of races. And a determinacy race occurs when two logically parallel instructions access the same memory location, and at least one of these instructions performs a write to that location. So let's look at a simple example.

So in this code here, I'm first setting x equal to 0. And then I have a cilk_for loop with two iterations, and each of the two iterations is incrementing this variable x. And then at the end, I'm going to assert that x is equal to 2. So there's actually a race in this program here.
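
The program being walked through looks roughly like this (a sketch reconstructed from the narration; the slide's exact code may differ):

```c
#include <assert.h>
#include <cilk/cilk.h>

int main(void) {
  int x = 0;
  cilk_for (int i = 0; i < 2; ++i) {
    x++;  // both logically parallel iterations read and write x
  }
  assert(x == 2);  // can fail: the two increments may interleave badly
  return 0;
}
```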

So in order to understand where the race occurs, let's look at the execution graph here. So I'm going to label each of these statements with a letter. The first statement, A, is just setting x equal to 0. And then after that, we're actually going to have two parallel paths, because we have two iterations of this cilk_for loop, which can execute in parallel. And each of these paths is going to increment x by 1. And then finally, we're going to assert that x is equal to 2 at the end.

And this sort of graph is known as a dependency graph. It tells you which instructions have to finish before you execute the next instruction. So here it says that B and C must wait for A to execute before they proceed, but B and C can actually happen in parallel, because there is no dependency between them. And then D has to happen after B and C finish. So to understand why there's a race bug here, we actually need to take a closer look at this dependency graph. So let's take a closer look.

So when you run this code, x++ is actually going to be translated into three steps. So first, we're going to load the value of x into some processor's register, r1. And then we're going to increment r1, and then we're going to set x equal to the result of r1. And the same thing for r2: we're going to load x into register r2, increment r2, and then set x equal to r2. So here, we have a race, because both of these stores, x equal to r1 and x equal to r2, are actually writing to the same memory location.
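
In other words, each x++ in the loop body behaves like this three-step sequence (a sketch; r stands for the per-processor register, r1 on one worker and r2 on the other):

```c
// How each x++ is effectively lowered on its processor.
void increment_x(int *x) {
  int r = *x;   // step 1: load x into a register (r1 or r2)
  r = r + 1;    // step 2: increment the register
  *x = r;       // step 3: store the register back into x
}
// If both workers load x (value 0) before either one stores, both stores
// write 1, and x ends up as 1 rather than 2.
```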

So let's look at one possible execution of this computation graph. And we're going to keep track of the values of x, r1, and r2. So the first instruction we're going to execute is x equal to 0. So we just set x equal to 0, and everything's good so far. And then next, we can actually pick one of two instructions to execute, because both of these instructions have their predecessors satisfied already; their predecessors have already executed. So let's say I pick r1 equal to x to execute. And this is going to place the value 0 into register r1. Now I'm going to increment r1, so this changes the value in r1 to 1. And now, let's say I execute r2 equal to x. So that's going to read x, which has a value of 0, and it's going to place the value 0 into r2.