## Transcript


• JULIAN SHUN: Good afternoon, everyone.

• So let's get started.

• So today, we're going to be talking about races and parallelism.

• And you'll be doing a lot of parallel programming

• for the next homework assignment and project.

• One thing I want to point out is that it's important

• to meet with your MITPOSSE as soon as possible,

• if you haven't done so already, since that's

• going to be part of the evaluation for the Project 1 grade.

• And if you have trouble reaching your MITPOSSE members,

• please let us know as soon as possible.

• So as a reminder, let's look at the basics of Cilk.

• So we have cilk_spawn and cilk_sync statements.

• In Cilk, this was the code that we

• saw in the last lecture, which computes the nth Fibonacci

• number.

• So when we say cilk_spawn, it means

• that the named child function, the function right

• after the cilk_spawn keyword, can execute in parallel

• with the parent caller.

• So it says that fib of n minus 1 can

• execute in parallel with the fib function that called it.

• And then cilk_sync says that control cannot pass this point

• until all of the spawned children have returned.

• So this is going to wait for fib of n minus 1

• to finish before it goes on and returns the sum of x and y.
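
For reference, here is a minimal sketch of the Fibonacci code being described, in Cilk-style C (the exact code on last lecture's slide may differ slightly):

```c
#include <stdint.h>
#include <cilk/cilk.h>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1);  // child may run in parallel with the caller
  int64_t y = fib(n - 2);             // the parent continues with the other subproblem
  cilk_sync;                          // control waits here until the spawned child returns
  return x + y;
}
```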

• And recall that the Cilk keywords grant permission

• for parallel execution, but they don't actually

• force parallel execution.

• So this code here says that we can execute fib of n minus 1

• in parallel with this parent caller,

• but it doesn't say that we necessarily have

• to execute them in parallel.

• And it's up to the runtime system

• to decide whether these different functions will

• be executed in parallel.

• We'll talk more about the runtime system today.

• We also saw another example last lecture, where we wanted to do an in-place matrix transpose.

• And this used the cilk_for keyword.

• And this says that we can execute

• the iterations of this cilk_for loop in parallel.

• And again, this says that the runtime system

• is allowed to schedule these iterations in parallel,

• but doesn't necessarily say that they

• have to execute in parallel.
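
As a sketch, the in-place transpose being described might look like the following (the exact bounds and names are assumptions, not the slide's code). Each outer iteration swaps a distinct set of (i, j)/(j, i) pairs below the diagonal, which is why the iterations are independent:

```c
#include <cilk/cilk.h>

// Transpose an n x n matrix A in place. Iteration i touches only
// row i's below-diagonal entries and column i's above-diagonal
// entries, so distinct iterations never touch the same element.
void transpose(double *A, int n) {
  cilk_for (int i = 1; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
      double tmp = A[i*n + j];
      A[i*n + j] = A[j*n + i];
      A[j*n + i] = tmp;
    }
  }
}
```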

• And under the hood, cilk_for statements

• are translated into nested cilk_spawn and cilk_sync calls.

• So the compiler is going to divide the iteration

• space in half, do a cilk_spawn on one of the two halves,

• call the other half, and then this

• is done recursively until the number of iterations

• in a range falls below a certain size,

• at which point it just

• creates a single serial task for that range.
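
Conceptually, the lowering looks like the following divide-and-conquer sketch (the helper name and grain-size cutoff here are illustrative, not the compiler's actual output):

```c
#include <cilk/cilk.h>

#define GRAINSIZE 2048  // illustrative cutoff; the real value is chosen by the system

// Hypothetical helper showing how a cilk_for over [lo, hi) is lowered.
static void loop_helper(int lo, int hi, void (*body)(int)) {
  if (hi - lo > GRAINSIZE) {
    int mid = lo + (hi - lo) / 2;
    cilk_spawn loop_helper(lo, mid, body);  // spawn one half...
    loop_helper(mid, hi, body);             // ...and call the other half
    cilk_sync;                              // join both halves before returning
  } else {
    for (int i = lo; i < hi; ++i) {
      body(i);                              // small ranges run as a single serial task
    }
  }
}
```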

• So any questions on the Cilk constructs?

• Yes?

• AUDIENCE: Is Cilk smart enough to recognize issues

• with reading and writing for matrix transpose?

• JULIAN SHUN: So it's actually not

• going to figure out whether the iterations are

• independent for you.

• The programmer actually has to reason about that.

• But Cilk does have a nice tool, which we'll talk about,

• that will tell you which places your code might possibly

• be reading and writing the same memory location,

• and that allows you to localize any possible race bugs.

• So we'll actually talk about races.

• But if you just compile this code,

• Cilk isn't going to know whether the iterations are independent.
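
The tool being referred to is the Cilk race detector (Cilksan). As a rough, assumed invocation using the OpenCilk toolchain (the course's toolchain and flags may differ):

```sh
# Assumed OpenCilk flags: compile with Cilksan instrumentation, then run.
clang -fopencilk -fsanitize=cilk -Og -g transpose.c -o transpose
./transpose   # reports logically parallel accesses where at least one is a write
```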

• So determinacy races-- so race conditions

• are the bane of concurrency.

• So you don't want to have race conditions in your code.

• And there are two famous race bugs that caused disaster.

• So there is this Therac-25 radiation therapy machine,

• and there was a race condition in the software.

• And this led to three people being killed

• and many more being seriously injured.

• The North American blackout of 2003

• was also caused by a race bug in the software,

• and this left 50 million people without power.

• So these are very bad.

• And they're notoriously difficult to discover

• by conventional testing.

• So race bugs aren't going to appear every time you execute your program.

• And in fact, the hardest ones to find, which caused these disasters,

• are actually very rare events.

• So most of the time, when you run your program,

• you're not going to see the race bug.

• Only very rarely will you see it.

• So this makes it very hard to find these race bugs.

• And furthermore, when you see a race bug,

• it doesn't necessarily always happen

• in the same place in your code.

• So that makes it even harder.

• So what is a race?

• So a determinacy race is one of the most basic forms of races.

• And a determinacy race occurs when

• two logically parallel instructions

• access the same memory location, and at least one

• of these instructions performs a write to that location.

• So let's look at a simple example.

• So in this code here, I'm first setting x equal to 0.

• And then I have a cilk_for loop with two iterations,

• and each of the two iterations is

• incrementing this variable x.

• And then at the end, I'm going to assert that x is equal to 2.
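
Written out, the example is just this (a faithful sketch of the code being described):

```c
#include <assert.h>
#include <cilk/cilk.h>

int main(void) {
  int x = 0;
  cilk_for (int i = 0; i < 2; ++i) {
    x++;            // both iterations read and write x: a determinacy race
  }
  assert(x == 2);   // can fail, because one increment may be lost
  return 0;
}
```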

• So there's actually a race in this program here.

• So in order to understand where the race occurs,

• let's look at the execution graph here.

• So I'm going to label each of these statements with a letter.

• The first statement, A, is just setting x equal to 0.

• And then after that, we're actually

• going to have two parallel paths, because we

• have two iterations of this cilk_for loop, which

• can execute in parallel.

• And each of these paths is going to increment x by 1.

• And then finally, we're going to assert that x is equal to 2

• at the end.

• And this sort of graph is known as a dependency graph.

• It tells you what instructions have

• to finish before you execute the next instruction.

• So here it says that B and C must

• wait for A to execute before they proceed,

• but B and C can actually happen in parallel, because there

• is no dependency between them.

• And then D has to happen after B and C finish.

• So to understand why there's a race bug here,

• we actually need to take a closer look

• at this dependency graph.

• So let's take a closer look.

• So when you run this code, x plus plus

• is actually going to be translated into three steps.

• So first, we're going to load the value

• of x into some processor's register, r1.

• And then we're going to increment r1,

• and then we're going to set x equal to the result of r1.

• And the same thing for r2.

• We're going to load x into register r2, increment r2,

• and then set x equal to r2.

• So here, we have a race, because both of these stores,

• x equal to r1 and x equal to r2,

• are actually writing to the same memory location.
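
That is, each x++ is really three instructions, so the two logically parallel iterations look like this (register names follow the lecture's labeling):

```c
//   iteration 1              iteration 2
//   r1 = x;  // load         r2 = x;  // load
//   r1++;    // increment    r2++;    // increment
//   x = r1;  // store        x = r2;  // store
//
// Both stores write the same location x, so any interleaving in which
// both loads happen before either store loses one of the updates.
```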

• So let's look at one possible execution of this computation

• graph.

• And we're going to keep track of the values of x, r1 and r2.

• So the first instruction we're going to execute

• is x equal to 0.

• So we just set x equal to 0, and everything's good so far.

• And then next, we can actually pick one of two instructions

• to execute, because both of these two instructions

• have their predecessors satisfied already.

• Their predecessors have already executed.

• So let's say I pick r1 equal to x to execute.

• And this is going to place the value 0 into register r1.

• Now I'm going to increment r1, so this

• changes the value in r1 to 1.

• Now, let's say I execute r2 equal to x.

• So that's going to read x, which has a value of 0.

• It's going to place the value of 0 into r2.

• It's going to increment r2.

• That's going to change that value to 1.
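
Continuing that interleaving to the end gives one possible execution, shown here as a trace of the values after each step:

```c
// Step             x    r1   r2
// x = 0;           0    -    -
// r1 = x;          0    0    -
// r1++;            0    1    -
// r2 = x;          0    1    0   // r2 reads x before either store
// r2++;            0    1    1
// x = r1;          1    1    1
// x = r2;          1    1    1   // overwrites x with the same value 1
// assert(x == 2);  // fails: one of the two increments was lost
```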