The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JULIAN SHUN: Good afternoon, everyone. So let's get started. So today, we're going to be talking about races and parallelism. And you'll be doing a lot of parallel programming for the next homework assignment and project. One thing I want to point out is that it's important to meet with your MITPOSSE as soon as possible, if you haven't done so already, since that's going to be part of the evaluation for the Project 1 grade. And if you have trouble reaching your MITPOSSE members, please contact your TA and also make a post on Piazza as soon as possible.

So as a reminder, let's look at the basics of Cilk. So we have the cilk_spawn and cilk_sync statements. This was the code that we saw in the last lecture, which computes the nth Fibonacci number. So when we say cilk_spawn, it means that the named child function, the function right after the cilk_spawn keyword, can execute in parallel with the parent caller. So it says that fib of n minus 1 can execute in parallel with the fib function that called it. And then cilk_sync says that control cannot pass this point until all of its spawned children have returned. So this is going to wait for fib of n minus 1 to finish before it goes on and returns the sum of x and y.

And recall that the Cilk keywords grant permission for parallel execution, but they don't actually force parallel execution. So this code here says that we can execute fib of n minus 1 in parallel with its parent caller, but it doesn't say that we necessarily have to execute them in parallel. And it's up to the runtime system to decide whether these different functions will be executed in parallel. We'll talk more about the runtime system today.

And also, we talked about this example, where we wanted to do an in-place matrix transpose. And this used the cilk_for keyword. And this says that we can execute the iterations of this cilk_for loop in parallel. And again, this says that the runtime system is allowed to schedule these iterations in parallel, but it doesn't necessarily say that they have to execute in parallel. And under the hood, cilk_for statements are translated into nested cilk_spawn and cilk_sync calls. So the compiler is going to divide the iteration space in half, do a cilk_spawn on one of the two halves, call the other half, and then this is done recursively until we reach a certain size for the number of iterations in the loop, at which point it just creates a single task for that.

So any questions on the Cilk constructs? Yes?

AUDIENCE: Is Cilk smart enough to recognize issues with reading and writing for matrix transpose?

JULIAN SHUN: So it's actually not going to figure out whether the iterations are independent for you. The programmer actually has to reason about that. But Cilk does have a nice tool, which we'll talk about, that will tell you which places in your code might possibly be reading and writing the same memory location, and that allows you to localize any possible race bugs in your code. So we'll actually talk about races. But if you just compile this code, Cilk isn't going to know whether the iterations are independent.
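For reference, here is a minimal sketch of the two pieces of code the lecture refers to, written in Cilk Plus/OpenCilk style. The header, the integer types, and the grain-size constant GRAIN and placeholder body() in the second function are assumptions for illustration; they are not taken from the lecture slides.

```c
#include <cilk/cilk.h>
#include <stdint.h>

// The Fibonacci example: the spawned child fib(n - 1) may run in
// parallel with the caller, and cilk_sync waits for it before the add.
int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1);  // child may execute in parallel
  int64_t y = fib(n - 2);             // the parent computes this branch itself
  cilk_sync;                          // control cannot pass until children return
  return x + y;
}

// A sketch of how cilk_for is expanded into nested spawns and syncs:
// halve the iteration space, spawn one half, call the other, and recurse
// until the range is small enough to run as one serial task.
enum { GRAIN = 2048 };                // hypothetical cutoff size
void body(int64_t i) { (void)i; }     // placeholder loop body for illustration

void loop_div(int64_t lo, int64_t hi) {
  if (hi - lo <= GRAIN) {
    for (int64_t i = lo; i < hi; ++i) body(i);  // small enough: one serial task
  } else {
    int64_t mid = lo + (hi - lo) / 2;
    cilk_spawn loop_div(lo, mid);     // one half may run in parallel
    loop_div(mid, hi);                // the caller runs the other half
    cilk_sync;                        // wait for the spawned half
  }
}
```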
So, determinacy races. Race conditions are the bane of concurrency. So you don't want to have race conditions in your code. And there are these two famous race bugs that caused disasters. So there was this Therac-25 radiation therapy machine, and there was a race condition in the software. And this led to three people being killed and many more being seriously injured. The North American blackout of 2003 was also caused by a race bug in the software, and this left 50 million people without power. So these are very bad.

And they're notoriously difficult to discover by conventional testing. So race bugs aren't going to appear every time you execute your program. And in fact, the hardest ones to find, the ones that caused these disasters, are actually very rare events. So most of the time when you run your program, you're not going to see the race bug. Only very rarely will you see it. So this makes it very hard to find these race bugs. And furthermore, when you do see a race bug, it doesn't necessarily always happen in the same place in your code. So that makes it even harder.

So what is a race? So a determinacy race is one of the most basic forms of races. And a determinacy race occurs when two logically parallel instructions access the same memory location, and at least one of these instructions performs a write to that location.

So let's look at a simple example. So in this code here, I'm first setting x equal to 0. And then I have a cilk_for loop with two iterations, and each of the two iterations is incrementing this variable x. And then at the end, I'm going to assert that x is equal to 2. So there's actually a race in this program here.

So in order to understand where the race occurs, let's look at the execution graph here. So I'm going to label each of these statements with a letter. The first statement, A, is just setting x equal to 0. And then after that, we're actually going to have two parallel paths, because we have two iterations of this cilk_for loop, which can execute in parallel. And each of these paths is going to increment x by 1. And then finally, we're going to assert that x is equal to 2 at the end. And this sort of graph is known as a dependency graph. It tells you which instructions have to finish before you execute the next instruction. So here it says that B and C must wait for A to execute before they proceed, but B and C can actually happen in parallel, because there is no dependency between them. And then D has to happen after both B and C finish.

So to understand why there's a race bug here, we actually need to take a closer look at this dependency graph. So when you run this code, x plus plus is actually going to be translated into three steps. So first, we're going to load the value of x into some processor's register, r1. And then we're going to increment r1, and then we're going to set x equal to the result of r1. And the same thing for r2: we're going to load x into register r2, increment r2, and then set x equal to r2. So here, we have a race, because both of these stores, x equals r1 and x equals r2, are actually writing to the same memory location.
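Here is a minimal sketch of the racy program being described, again in Cilk Plus/OpenCilk style, with the three-step expansion of x++ spelled out in comments:

```c
#include <cilk/cilk.h>
#include <assert.h>

int main(void) {
  int x = 0;                       // A: set x equal to 0
  cilk_for (int i = 0; i < 2; ++i) {
    x++;                           // B, C: two logically parallel increments
    // Each x++ is really three steps on the processor:
    //   r = x;    load x into a register
    //   r++;      increment the register
    //   x = r;    store the register back into x
    // The two parallel stores to x make this a determinacy race.
  }
  assert(x == 2);                  // D: can fail when the race manifests
  return 0;
}
```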
So let's look at one possible execution of this computation graph. And we're going to keep track of the values of x, r1, and r2. So the first instruction we're going to execute is x equal to 0. So we just set x equal to 0, and everything's good so far. And then next, we can actually pick one of two instructions to execute, because both of these two instructions have their predecessors satisfied already. Their predecessors have already executed.

So let's say I pick r1 equal to x to execute. And this is going to place the value 0 into register r1. Now I'm going to increment r1, so this changes the value in r1 to 1. Now, let's say I execute r2 equal to x. So that's going to read x, which has a value of 0, and place the value 0 into r2.
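One plausible completion of this particular schedule, written out as a trace table, is shown below. This is just one of several possible interleavings, chosen because it is the one that exposes the bug; the remaining steps are an illustration consistent with the walkthrough above.

```
step      x    r1   r2
x = 0     0    -    -
r1 = x    0    0    -
r1++      0    1    -
r2 = x    0    0    -    <- r2 reads x before r1's result is stored
r2++      0    1    1
x = r1    1    1    1
x = r2    1    1    1    <- the second store overwrites with the same stale value
```

Under this schedule, both increments read the old value 0, so x ends up as 1 rather than 2, and the assertion that x equals 2 fails.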