
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Hey, everybody. It's my pleasure once again to welcome TB Schardl, who is the author of your Tapir compiler, to talk about the Cilk runtime system.

TAO SCHARDL: Thanks, Charles. Can everyone hear me in the back? Seems good? OK. Thanks for the introduction. Today I'll be talking about the Cilk runtime system. This is pretty exciting for me: this is a lecture that's not about compilers, so I get to talk about something a little different for once. It should be a fun lecture.

Recently, as I understand it, you've been looking at storage allocation, in both the serial case and the parallel case. And you've been doing Cilk programming for a while at this point. This lecture, honestly, is a bit of a non sequitur in terms of the overall flow of the course, and it's also an advanced topic: the Cilk runtime system is a pretty complicated piece of software. But nevertheless, I believe you have enough background to at least start to understand and appreciate some aspects of the design of the Cilk runtime system. So that's why we're talking about it today.

Just to quickly recall something that you're all, I'm sure, intimately familiar with by this point: what's Cilk programming all about? Well, Cilk is a parallel programming language that allows you to make your software run faster using parallel processors. And using Cilk is pretty straightforward. You may start with some serial code that runs in some running time, which we'll denote Ts for certain parts of the lecture. If you want it to run in parallel using Cilk, you just insert Cilk keywords in choice locations. For example, you can parallelize the outer loop in this matrix-multiply kernel, and that will let your code run in time Tp on P processors. And ideally, Tp should be less than Ts.
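The kernel on the slide is plausibly something like the following sketch, assuming Cilk Plus syntax and a square, row-major matrix multiply; the slide's exact code may differ:

```c
#include <cilk/cilk.h>

/* Square matrix multiply, C = A * B, all n-by-n and row-major.
 * Making just the outer loop a cilk_for exposes parallelism
 * across the rows of C; the inner loops stay serial. */
void mat_mul(double *restrict C, const double *restrict A,
             const double *restrict B, int n) {
  cilk_for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      double sum = 0.0;
      for (int k = 0; k < n; ++k)
        sum += A[i*n + k] * B[k*n + j];
      C[i*n + j] = sum;
    }
  }
}
```

Only the outer loop gets a keyword; the runtime takes care of dividing its iterations among the available workers.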

Now, just adding keywords is all you need to do to tell Cilk to execute the computation in parallel. But what does Cilk do in light of those keywords? At a very high level, Cilk, and specifically its runtime system, takes care of scheduling and load-balancing the computation on the parallel processors, and on the multicore system in general. After you've denoted logical parallelism in the program using cilk_spawn, cilk_sync, and cilk_for, the Cilk scheduler maps that computation onto the processors. It does so dynamically at runtime, based on whatever processing resources happen to be available, and Cilk uses a randomized work-stealing scheduler, which guarantees that the mapping is efficient and the execution runs efficiently.
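As a rough illustration of the randomized work-stealing idea, here is a minimal sketch of a worker's scheduling loop. Everything in it (worker_t, deque_pop_tail, deque_steal_head, and so on) is hypothetical rather than the real libcilkrts API, and the careful synchronization the real deques require is omitted:

```c
#include <stdlib.h>

typedef struct frame frame_t;                        /* one unit of work */
typedef struct { frame_t **items; int head, tail; } deque_t;
typedef struct { deque_t deque; } worker_t;

void execute(frame_t *f);  /* run a frame up to its next spawn or return */

/* Illustrative deque operations; the real runtime synchronizes
 * concurrent access between worker and thieves, omitted here. */
static frame_t *deque_pop_tail(deque_t *d) {
  return (d->tail > d->head) ? d->items[--d->tail] : NULL;
}
static frame_t *deque_steal_head(deque_t *d) {
  return (d->tail > d->head) ? d->items[d->head++] : NULL;
}

/* Each worker works off the tail of its own deque; when that runs
 * dry, it picks a victim uniformly at random and steals the oldest
 * frame from the head of the victim's deque. */
void worker_loop(worker_t *self, worker_t *workers, int nworkers) {
  for (;;) {
    frame_t *f = deque_pop_tail(&self->deque);
    if (f == NULL) {
      worker_t *victim = &workers[rand() % nworkers];
      f = deque_steal_head(&victim->deque);
    }
    if (f != NULL)
      execute(f);
  }
}
```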

Now, you've all been using the Cilk platform for a while. In its basic usage, you write some Cilk code, possibly by parallelizing ordinary serial code; you feed that to a compiler; you get a binary; and you run the binary with some particular input on a multicore system. You get parallel performance. Today, we're going to look at how exactly Cilk works. What's the magic that goes on, hidden by the boxes on this diagram?

And the very first thing to note is that this picture is a little bit simplified. The first simplification we're going to break is that it's not really just Cilk source and the Cilk compiler. There's also a runtime system library, libcilkrts.so, in case you've seen that file, or messages about that file, on your system. And really, it's the compiler and the runtime library working together that implement Cilk's runtime system, that do the work stealing and the efficient scheduling and load balancing.

Now, we might suspect that just taking a look at the code you get when you compile a Cilk program might tell you something about how Cilk works. Here's C pseudocode for the result of compiling a simple piece of Cilk code. It's a bit complicated; I think that's fair to say. There's a lot going on here. There was one function in the original program; now there are two. There are some new variables, and there are some calls to functions that look a little bit strange. This isn't exactly easy to interpret or understand, and it doesn't even bring the runtime system library into the picture. You can find the source code for the runtime system library online; it's a little less than 20,000 lines of code, and it's also kind of complicated.
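To give a flavor of that compiled result, here is a heavily simplified sketch of how a compiler might lower the spawn and sync in fib. The names loosely follow the Cilk Plus runtime ABI (__cilkrts_enter_frame, __cilkrts_detach, and so on), but the details here are assumptions, and the real output is more involved:

```c
/* Original Cilk source:
 *   int x = cilk_spawn fib(n - 1);
 *   int y = fib(n - 2);
 *   cilk_sync;
 *   return x + y;
 */

int fib(int n);

/* The compiler splits the spawned call into a separate "spawn
 * helper" function -- which is why one function in the source
 * program becomes two in the compiled result. */
static void fib_spawn_helper(int *x, int n) {
  struct __cilkrts_stack_frame helper_sf;
  __cilkrts_enter_frame_fast(&helper_sf); /* initialize the helper frame */
  __cilkrts_detach(&helper_sf);           /* make the parent stealable   */
  *x = fib(n - 1);                        /* the actual spawned work     */
  __cilkrts_pop_frame(&helper_sf);
  __cilkrts_leave_frame(&helper_sf);      /* may hand control to a thief */
}

int fib(int n) {
  if (n < 2) return n;
  struct __cilkrts_stack_frame sf;        /* frame the runtime can steal */
  __cilkrts_enter_frame(&sf);
  int x, y;
  /* cilk_spawn: the setjmp saves a continuation, so a thief that
   * steals this frame can resume it just after the spawn. */
  if (!CILK_SETJMP(sf.ctx))
    fib_spawn_helper(&x, n);
  y = fib(n - 2);
  /* cilk_sync: if spawned children may still be running, park this
   * frame in the runtime until they finish. */
  if (sf.flags & CILK_FRAME_UNSYNCHED)
    if (!CILK_SETJMP(sf.ctx))
      __cilkrts_sync(&sf);
  __cilkrts_pop_frame(&sf);
  __cilkrts_leave_frame(&sf);
  return x + y;
}
```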

So rather than dive into the code directly, what we're going to do today is attempt a top-down approach to understanding how the Cilk runtime system works, and some of its design considerations. We're going to start by talking about some of the functionality required of the Cilk runtime system, as well as some performance considerations for how the runtime system should work. Then we'll take a look at how the worker deques in Cilk get implemented, how spawning actually works, how stealing a computation works, and how synchronization works within Cilk. Does that all sound good? Any questions so far? This should all be review, more or less.

OK, so let's talk a little bit about required functionality. You've seen this picture before, I hope. This picture illustrates the execution model of a Cilk program. Here we have everyone's favorite exponential-time Fibonacci routine, parallelized using Cilk. This is not an efficient way to compute Fibonacci numbers, but it's a nice didactic example for understanding parallel computation, especially the Cilk model. And as we saw many lectures ago, when you run this program on a given input, the execution of the program can be modeled as a computation dag, and this computation dag unfolds dynamically as the program executes.
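For reference, the routine on the slide is the standard parallel fib, sketched here assuming Cilk Plus syntax:

```c
#include <cilk/cilk.h>
#include <stdint.h>

/* Exponential-time Fibonacci, parallelized with Cilk keywords.
 * Didactic only: each call spawns fib(n-1) so that it may run in
 * parallel with fib(n-2), then syncs before combining results. */
int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1); /* may execute in parallel... */
  int64_t y = fib(n - 2);            /* ...with this call          */
  cilk_sync;                         /* wait for the spawned child */
  return x + y;
}
```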

But I want to stop and take a hard look at exactly what that dynamic execution looks like when we've got parallel processors and work stealing all coming into play. So we'll stick with this Fibonacci routine, and to start, we'll imagine we've got just one processor on the system. We're just going to use this one processor to execute fib(4), and it's going to take some time to do it, just to make the story interesting. So we start executing this computation, and that one processor is just going to execute the Fibonacci routine from the beginning up to the cilk_spawn statement, as if it's ordinary serial code, because it is ordinary serial code. At this point the processor hits the cilk_spawn statement. What happens now? Anyone remember? What happens to the dag?

AUDIENCE: It branches down [INAUDIBLE]

TAO SCHARDL: It branches downward and spawns another process, more or less. The way we model that: the cilk_spawn is of the routine fib(n - 1), which in this case will be fib(3). And so, like an ordinary function call, we're going to get a brand-new frame for fib(3), and that frame is going to have some strand that's available to execute. But the spawn is not your typical function call; it actually allows some other computation to run in parallel. And so the way we model that in this picture is that we get a new frame for fib(3).