## Subtitles section

• The following content is provided under a Creative Commons license.

• Your support will help MIT OpenCourseWare

• continue to offer high-quality educational resources for free.

• To make a donation or to view additional materials

• from hundreds of MIT courses, visit MIT OpenCourseWare

• at ocw.mit.edu.

• JULIAN SHUN: Good afternoon, everyone.

• So let's get started.

• So today, we're going to be talking

• about races and parallelism.

• And you'll be doing a lot of parallel programming

• for the next homework assignment and project.

• One thing I want to point out is that it's important

• to meet with your MITPOSSE as soon as possible,

• if you haven't done so already, since that's

• going to be part of the evaluation for Project 1.

• And if you have trouble reaching your MITPOSSE members,

• please let us know as soon as possible.

• So as a reminder, let's look at the basics of Cilk.

• So we have cilk_spawn and cilk_sync statements.

• In Cilk, this was the code that we

• saw in the last lecture, which computes the nth Fibonacci

• number.

• So when we say cilk_spawn, it means

• that the named child function, the function right

• after the cilk_spawn keyword, can execute in parallel

• with the parent caller.

• So it says that fib of n minus 1 can

• execute in parallel with the fib function that called it.

• And then cilk_sync says that control cannot pass this point

• until all of the spawned children have returned.

• So this is going to wait for fib of n minus 1

• to finish before it goes on and returns the sum of x and y.

• And recall that the Cilk keywords grant permission

• for parallel execution, but they don't actually

• force parallel execution.

• So this code here says that we can execute fib of n minus 1

• in parallel with this parent caller,

• but it doesn't say that we necessarily have

• to execute them in parallel.

• And it's up to the runtime system

• to decide whether these different functions will

• be executed in parallel.

• We'll talk more about the runtime system today.

• Recall also the example from the last lecture

• where we wanted to do an in-place matrix transpose.

• And this used the cilk_for keyword.

• And this says that we can execute

• the iterations of this cilk_for loop in parallel.

• And again, this says that the runtime system

• is allowed to schedule these iterations in parallel,

• but doesn't necessarily say that they

• have to execute in parallel.

• And under the hood, cilk_for statements

• are translated into nested cilk_spawn and cilk_sync calls.

• So the compiler is going to divide the iteration

• space in half, do a cilk_spawn on one of the two halves,

• call the other half, and then this

• is done recursively until we reach

• a certain size for the number of iterations

• in a loop, at which point it just

• creates a single task for that.

• So any questions on the Cilk constructs?

• Yes?

• AUDIENCE: Is Cilk smart enough to recognize issues

• with reading and writing for matrix transpose?

• JULIAN SHUN: So it's actually not

• going to figure out whether the iterations are

• independent for you.

• The programmer actually has to reason about that.

• But Cilk does have a nice tool, which we'll talk about,

• that will tell you which places your code might possibly

• be reading and writing the same memory location,

• and that allows you to localize any possible race

• bugs in your code.

• So we'll actually talk about races.

• But if you just compile this code,

• Cilk isn't going to know whether the iterations are independent.

• So determinacy races-- so race conditions

• are the bane of concurrency.

• So you don't want to have race conditions in your code.

• And there are these two famous race bugs that cause disaster.

• So there is this Therac-25 radiation therapy machine,

• and there was a race condition in the software.

• And this led to three people being killed

• and many more being seriously injured.

• The North American blackout of 2003

• was also caused by a race bug in the software,

• and this left 50 million people without power.

• So these are very bad.

• And they're notoriously difficult to discover

• by conventional testing.

• So race bugs aren't going to appear every time

• you execute your program.

• And in fact, the hardest ones to find, which cause these events,

• are actually very rare events.

• So most of the times when you run your program,

• you're not going to see the race bug.

• Only very rarely will you see it.

• So this makes it very hard to find these race bugs.

• And furthermore, when you see a race bug,

• it doesn't necessarily always happen

• in the same place in your code.

• So that makes it even harder.

• So what is a race?

• So a determinacy race is one of the most basic forms of races.

• And a determinacy race occurs when

• two logically parallel instructions

• access the same memory location, and at least one

• of these instructions performs a write to that location.

• So let's look at a simple example.

• So in this code here, I'm first setting x equal to 0.

• And then I have a cilk_for loop with two iterations,

• and each of the two iterations is

• incrementing this variable x.

• And then at the end, I'm going to assert that x is equal to 2.

• So there's actually a race in this program here.

• So in order to understand where the race occurs,

• let's look at the execution graph here.

• So I'm going to label each of these statements with a letter.

• The first statement, a, is just setting x equal to 0.

• And then after that, we're actually

• going to have two parallel paths, because we

• have two iterations of this cilk_for loop, which

• can execute in parallel.

• And each of these paths is going to increment x by 1.

• And then finally, we're going to assert that x is equal to 2

• at the end.

• And this sort of graph is known as a dependency graph.

• It tells you what instructions have

• to finish before you execute the next instruction.

• So here it says that B and C must

• wait for A to execute before they proceed,

• but B and C can actually happen in parallel, because there

• is no dependency among them.

• And then D has to happen after B and C finish.

• So to understand why there's a race bug here,

• we actually need to take a closer look

• at this dependency graph.

• So let's take a closer look.

• So when you run this code, x plus plus

• is actually going to be translated into three steps.

• So first, we're going to load the value

• of x into some processor's register, r1.

• And then we're going to increment r1,

• and then we're going to set x equal to the result of r1.

• And the same thing for r2.

• We're going to load x into register r2, increment r2,

• and then set x equal to r2.

• So here, we have a race, because both of these stores,

• x equal to r1 and x equal to r2,

• are actually writing to the same memory location.

• So let's look at one possible execution of this computation

• graph.

• And we're going to keep track of the values of x, r1 and r2.

• So the first instruction we're going to execute

• is x equal to 0.

• So we just set x equal to 0, and everything's good so far.

• And then next, we can actually pick one of two instructions

• to execute, because both of these two instructions

• have their predecessors satisfied already.

• Their predecessors have already executed.

• So let's say I pick r1 equal to x to execute.

• And this is going to place the value 0 into register r1.

• Now I'm going to increment r1, so this

• changes the value in r1 to 1.

• Then now, let's say I execute r2 equal to x.

• So that's going to read x, which has a value of 0.

• It's going to place the value of 0 into r2.

• It's going to increment r2.

• That's going to change that value to 1.

• And then now, let's say I write r2 back to x.

• So I'm going to place a value of 1 into x.

• Then now, when I execute this instruction, x equal to r1,

• it's also placing a value of 1 into x.

• And then finally, when I do the assertion,

• this value here is not equal to 2, and that's wrong.

• Because if you executed this sequentially,

• you would get a value of 2 here.

• And the reason-- as I said, the reason why this occurs

• is because we have multiple writes

• to the same shared memory location, which

• could execute in parallel.

• And one of the nasty things about this example

• here is that the race bug doesn't necessarily always

• occur.

• So does anyone see why this race bug doesn't necessarily

• always show up?

• Yes?

• AUDIENCE: [INAUDIBLE]

• JULIAN SHUN: Right.

• So the answer is because if one of these two branches

• executes all three of its instructions

• before we start the other one, then the final result in x

• is going to be 2, which is correct.

• So if I executed these instructions

• in order of 1, 2, 3, 7, 4, 5, 6, and then, finally, 8, the value

• is going to be 2 in x.

• So the race bug here doesn't necessarily always occur.

• And this is one thing that makes these bugs hard to find.

• So any questions?

• So there are two different types of determinacy races.

• And they're shown in this table here.

• So let's suppose that instruction A and instruction

• B both access some location x, and suppose A is parallel to B.

• So both of the instructions can execute in parallel.

• So if A and B are just reading that location,

• then that's fine.

• You don't actually have a race here.

• But if one of the two instructions

• is writing to that location, whereas the other one is

• reading from that location, then you

• have what's called a read race.

• And the program might have a non-deterministic result

• when you have a read race, because the final answer might

• depend on whether A read the value before B

• updated it, or whether A read the value

• after B's update.

• So the order of the execution of A and B

• can affect the final result that you see.

• And finally, if both A and B write

• to the same shared location, then you have a write race.

• And again, this will cause non-deterministic behavior

• in your program, because the final answer could depend on

• whether A did the write first or B did the write first.

• And we say that two sections of code

• are independent if there are no determinacy races between them.

• So the two pieces of code can't have a shared location,

• where one computation writes to it

• and another computation reads from it,

• or where both computations write to that location.

• Any questions on the definition?

• So races are really bad, and you should avoid

• having races in your program.

• So here are some tips on how to avoid races.

• So I can tell you not to write races in your program,

• and you know that races are bad, but sometimes,

• when you're writing code, you just

• have races in your program, and you can't help it.

• But here are some tips on how you can avoid races.

• So first, the iterations of a cilk_for loop

• should be independent.

• So you should make sure that the different iterations

• of a cilk_for loop aren't writing to the same memory

• location.

• Secondly, between a cilk_spawn statement

• and a corresponding cilk_sync, the code of the spawn child

• should be independent of the code of the parent.

• And this includes code that's executed by additional children

• spawned or called by the spawned child.

• So you should make sure that these pieces of code

• are independent-- there's no read