The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JULIAN SHUN: Good afternoon, everyone. So let's get started. So today, we're going to be talking about races and parallelism. And you'll be doing a lot of parallel programming for the next homework assignment and project. One thing I want to point out is that it's important to meet with your MITPOSSE as soon as possible, if you haven't done so already, since that's going to be part of the evaluation for the Project 1 grade. And if you have trouble reaching your MITPOSSE members, please contact your TA and also make a post on Piazza as soon as possible.

So as a reminder, let's look at the basics of Cilk. So we have cilk_spawn and cilk_sync statements. In Cilk, this was the code that we saw in the last lecture, which computes the nth Fibonacci number. So when we say cilk_spawn, it means that the named child function, the function right after the cilk_spawn keyword, can execute in parallel with the parent caller. So it says that fib of n minus 1 can execute in parallel with the fib function that called it. And then cilk_sync says that control cannot pass this point until all of the spawned children have returned. So this is going to wait for fib of n minus 1 to finish before it goes on and returns the sum of x and y.
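
A minimal sketch of the code being described, assuming the OpenCilk spelling of the keywords:

    #include <stdint.h>
    #include <cilk/cilk.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x, y;
      x = cilk_spawn fib(n - 1); // child may run in parallel with the caller
      y = fib(n - 2);            // the parent continues with the other subproblem
      cilk_sync;                 // wait here until the spawned child returns
      return x + y;
    }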

And recall that the Cilk keywords grant permission for parallel execution, but they don't actually force parallel execution. So this code here says that we can execute fib of n minus 1 in parallel with this parent caller, but it doesn't say that we necessarily have to execute them in parallel. And it's up to the runtime system to decide whether these different functions will be executed in parallel. We'll talk more about the runtime system today.

And also, we talked about this example, where we wanted to do an in-place matrix transpose. And this used the cilk_for keyword. And this says that we can execute the iterations of this cilk_for loop in parallel. And again, this says that the runtime system is allowed to schedule these iterations in parallel, but it doesn't necessarily say that they have to execute in parallel.
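
A minimal sketch of the in-place transpose being described; the exact loop bounds are an assumption:

    // Transpose an n x n matrix A in place by swapping across the diagonal.
    // Each (i, j) pair below the diagonal is touched by exactly one iteration
    // of the outer loop, so the iterations are independent.
    void transpose(double *A, int64_t n) {
      cilk_for (int64_t i = 1; i < n; ++i) {
        for (int64_t j = 0; j < i; ++j) {
          double temp = A[i*n + j];
          A[i*n + j]  = A[j*n + i];
          A[j*n + i]  = temp;
        }
      }
    }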

And under the hood, cilk_for statements are translated into nested cilk_spawn and cilk_sync calls. So the compiler is going to divide the iteration space in half, do a cilk_spawn on one of the two halves, call the other half, and then this is done recursively until we reach a certain size for the number of iterations in a loop, at which point it just creates a single task for that.
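
A sketch of what that translation looks like for a loop over [lo, hi); the names recur and loop_body and the grain-size constant are hypothetical:

    #define GRAIN 2048  // hypothetical threshold for a single task

    void loop_body(int64_t i);  // one iteration of the original loop

    void recur(int64_t lo, int64_t hi) {
      if (hi - lo > GRAIN) {
        int64_t mid = lo + (hi - lo) / 2;
        cilk_spawn recur(lo, mid);  // spawn one half of the iteration space
        recur(mid, hi);             // call the other half
        cilk_sync;                  // join before returning
      } else {
        for (int64_t i = lo; i < hi; ++i)
          loop_body(i);             // small enough: run as a single task
      }
    }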

So any questions on the Cilk constructs? Yes?

AUDIENCE: Is Cilk smart enough to recognize issues with reading and writing for matrix transpose?

JULIAN SHUN: So it's actually not going to figure out whether the iterations are independent for you. The programmer actually has to reason about that. But Cilk does have a nice tool, which we'll talk about, that will tell you which places your code might possibly be reading and writing the same memory location, and that allows you to localize any possible race bugs in your code. So we'll actually talk about races. But if you just compile this code, Cilk isn't going to know whether the iterations are independent.
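
For reference, in the OpenCilk toolchain that tool is the Cilksan race detector, which is presumably enabled with a sanitizer flag along these lines (the course's exact toolchain and invocation may differ):

    clang -fopencilk -fsanitize=cilk -Og -g transpose.c -o transpose
    ./transpose   # Cilksan reports pairs of accesses that may race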

So determinacy races-- so race conditions are the bane of concurrency. So you don't want to have race conditions in your code. And there are these two famous race bugs that caused disaster. So there was the Therac-25 radiation therapy machine, and there was a race condition in the software. And this led to three people being killed and many more being seriously injured. The North American blackout of 2003 was also caused by a race bug in the software, and this left 50 million people without power. So these are very bad. And they're notoriously difficult to discover by conventional testing. So race bugs aren't going to appear every time you execute your program. And in fact, the hardest ones to find, which caused these events, are actually very rare events. So most of the time when you run your program, you're not going to see the race bug. Only very rarely will you see it. So this makes it very hard to find these race bugs. And furthermore, when you see a race bug, it doesn't necessarily always happen in the same place in your code. So that makes it even harder.

So what is a race? So a determinacy race is one of the most basic forms of races. And a determinacy race occurs when two logically parallel instructions access the same memory location, and at least one of these instructions performs a write to that location.

So let's look at a simple example. So in this code here, I'm first setting x equal to 0. And then I have a cilk_for loop with two iterations, and each of the two iterations is incrementing this variable x. And then at the end, I'm going to assert that x is equal to 2.
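
A sketch of the example being described:

    #include <assert.h>
    #include <cilk/cilk.h>

    int main(void) {
      int x = 0;
      cilk_for (int i = 0; i < 2; ++i) {
        x++;          // both iterations read and write x: a determinacy race
      }
      assert(x == 2); // usually passes, but can fail under an unlucky schedule
      return 0;
    }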

So there's actually a race in this program here. So in order to understand where the race occurs, let's look at the execution graph here. So I'm going to label each of these statements with a letter. The first statement, A, is just setting x equal to 0. And then after that, we're actually going to have two parallel paths, because we have two iterations of this cilk_for loop, which can execute in parallel. And each of these paths is going to increment x by 1. And then finally, we're going to assert that x is equal to 2 at the end.

And this sort of graph is known as a dependency graph. It tells you which instructions have to finish before you execute the next instruction. So here it says that B and C must wait for A to execute before they proceed, but B and C can actually happen in parallel, because there is no dependency between them. And then D has to happen after B and C finish. So to understand why there's a race bug here, we actually need to take a closer look at this dependency graph. So let's take a closer look.

So when you run this code, x plus plus is actually going to be translated into three steps. So first, we're going to load the value of x into some processor's register, r1. And then we're going to increment r1, and then we're going to set x equal to the result of r1. And the same thing for r2: we're going to load x into register r2, increment r2, and then set x equal to r2. So here, we have a race, because both of these stores, x = r1 and x = r2, are actually writing to the same memory location.
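
A sketch of the expanded computation, numbered consistently with the orderings discussed below (the steps of the two branches may interleave in any order):

    1: x = 0

    left branch:        right branch:
    2: r1 = x           4: r2 = x
    3: r1++             5: r2++
    7: x = r1           6: x = r2

    8: assert(x == 2)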

So let's look at one possible execution of this computation graph. And we're going to keep track of the values of x, r1, and r2. So the first instruction we're going to execute is x equal to 0. So we just set x equal to 0, and everything's good so far. And then next, we can actually pick one of two instructions to execute, because both of these two instructions have their predecessors satisfied already. Their predecessors have already executed. So let's say I pick r1 equal to x to execute. And this is going to place the value 0 into register r1. Now I'm going to increment r1, so this changes the value in r1 to 1. Now, let's say I execute r2 equal to x. So that's going to read x, which has a value of 0, and place the value 0 into r2. Then it's going to increment r2, which changes that value to 1. And now, let's say I write r2 back to x. So I'm going to place a value of 1 into x. Now, when I execute this instruction, x = r1, it's also placing a value of 1 into x. And then finally, when I do the assertion, this value here is not equal to 2, and that's wrong, because if you executed this sequentially, you would get a value of 2 here.
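
The interleaving just walked through, as a compact trace (values shown after each step completes):

    step               x    r1   r2
    1: x = 0           0    -    -
    2: r1 = x          0    0    -
    3: r1++            0    1    -
    4: r2 = x          0    1    0
    5: r2++            0    1    1
    6: x = r2          1    1    1
    7: x = r1          1    1    1
    8: assert(x == 2)  fails, because x is 1, not 2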

And the reason, as I said, why this occurs is because we have multiple writes to the same shared memory location, which could execute in parallel. And one of the nasty things about this example here is that the race bug doesn't necessarily always occur. So does anyone see why this race bug doesn't necessarily always show up? Yes?

AUDIENCE: [INAUDIBLE]

JULIAN SHUN: Right. So the answer is because if one of these two branches executes all three of its instructions before we start the other one, then the final result in x is going to be 2, which is correct. So if I executed these instructions in the order 1, 2, 3, 7, 4, 5, 6, and then, finally, 8, the value in x is going to be 2. So the race bug here doesn't necessarily always occur. And this is one thing that makes these bugs hard to find.

So any questions?

So there are two different types of determinacy races. And they're shown in this table here. So let's suppose that instruction A and instruction B both access some location x, and suppose A is parallel to B. So both of the instructions can execute in parallel. So if A and B are just reading that location, then that's fine. You don't actually have a race here. But if one of the two instructions is writing to that location, whereas the other one is reading from that location, then you have what's called a read race. And the program might have a non-deterministic result when you have a read race, because the final answer might depend on whether the read saw the value before the write updated it or after. So the order of the execution of A and B can affect the final result that you see. And finally, if both A and B write to the same shared location, then you have a write race. And again, this will cause non-deterministic behavior in your program, because the final answer could depend on whether A did the write first or B did the write first.
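
The table being described is presumably the following, where A and B are logically parallel accesses to the same location x:

    A reads x,  B reads x   ->  no race
    A reads x,  B writes x  ->  read race
    A writes x, B reads x   ->  read race
    A writes x, B writes x  ->  write race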

And we say that two sections of code are independent if there are no determinacy races between them. So the two pieces of code can't have a shared location where one computation writes to it and the other computation reads from it, or where both computations write to that location. Any questions on this definition?

So races are really bad, and you should avoid having races in your program. So here are some tips on how to avoid races. So I can tell you not to write races in your program, and you know that races are bad, but sometimes, when you're writing code, you just have races in your program, and you can't help it. But here are some tips on how you can avoid them. So first, the iterations of a cilk_for loop should be independent. So you should make sure that the different iterations of a cilk_for loop aren't writing to the same memory location.

Secondly, between a cilk_spawn statement and the corresponding cilk_sync, the code of the spawned child should be independent of the code of the parent. And this includes code that's executed by additional children spawned or called by the spawned child. So you should make sure that these pieces of code are independent-- there's no read
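
A minimal sketch of the kind of race this second tip rules out, using hypothetical names:

    #include <cilk/cilk.h>

    int total = 0;  // shared state

    void child(int v) {
      total += v;   // the spawned child writes total...
    }

    int bad_sum(void) {
      cilk_spawn child(1);
      total += 2;   // ...while the parent also writes it before the sync:
                    // a determinacy race between child and parent
      cilk_sync;
      return total;
    }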