The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

CHARLES E. LEISERSON: Hey, everybody. Let's get going. Who here has heard of the FFT? That's most of you. So I first met Steve Johnson when he worked with one of my graduate students, now a former graduate student, Matteo Frigo. And they came up with a really spectacular piece of performance engineering for the FFT, a system they call FFTW, for the Fastest Fourier Transform in the West. And it has been a staple for years and years; anybody doing signal processing will know FFTW. So anyway, it's a great pleasure to welcome Steve Johnson, who is going to talk about some of the work that he's been doing on dynamic languages, such as Julia and Python.

STEVEN JOHNSON: Yeah. Thanks.

CHARLES E. LEISERSON: Is that pretty accurate?

STEVEN JOHNSON: Yeah. Yeah, so I'm going to talk, as I said, about high-level dynamic languages and how you get performance in them. Most of you have probably used Python, or R, or Matlab. These are really popular for people doing technical computing, statistics, and anything where you want interactive exploration. You'd like to have a dynamically typed language where you can just type x equals 3, and then three lines later say, oh, x is an array. Because you're doing things interactively, you don't have to be stuck with a particular set of types. And there are a lot of choices for these. But they usually hit a wall when it comes to writing performance-critical code in these languages. So traditionally, people doing serious computing in these languages have a two-language solution. They do the high-level exploration, and productivity and so forth, in Python or whatever. But when they need to write performance-critical code, they drop down to a lower-level language: Fortran, or C, or Cython, or one of these things. And you use Python as the glue for these low-level kernels. And this is workable. I've done this myself. Many of you have probably done this. But when you drop down from Python to C, or even to Cython, there's a huge discontinuous jump in the complexity of the coding. And you usually lose a lot of generality. When you write code in C or something like that, it's specific to a very small set of types, whereas the nice thing about high-level languages is you can write generic code that works for a lot of different types.

So at this point, there's often someone who pops up and says, oh, well, I do performance programming in Python, and everyone knows you just need to vectorize your code, right? Basically, what they mean is that you rely on mature external libraries: you pass them a big block of data, they do a huge amount of computation, and they come back. And so you never write your own loops. And this is great. If there's someone who's already written the code that you need, you should try to leverage that as much as possible. But somebody has to write those libraries. And eventually, that person will be you, because if you do scientific computing, you inevitably run into a problem that you just can't express in terms of existing libraries very easily, or at all. So this was the state of affairs for a long time.
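As a minimal illustration of the dynamic-typing and "just vectorize" points above (not from the talk, and written in Julia, the language the talk turns to next), the two styles look roughly like this:

```julia
# Dynamic typing: no declarations; a name can be rebound to a different type.
x = 3                       # x is an Int
x = [1.0, 2.0, 3.0]         # three lines later, x is an array of Float64

# "Vectorized" style: hand the whole block of data to a library routine...
s_vectorized = sum(abs2, x)     # sum of squares via a built-in reduction

# ...versus writing your own loop, which in Julia can be just as legitimate:
function sumsq(v)
    s = zero(eltype(v))
    for vi in v
        s += vi^2           # explicit loop over the data
    end
    return s
end

s_loop = sumsq(x)
```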
And a few years ago, starting in Alan Edelman's group at MIT, there was a proposal for a new language called Julia, which tries to be as high-level and interactive as a dynamically typed language like Matlab or Python, and a general-purpose language like Python, very productive for technical work, really oriented towards scientific and numerical computing. But you can write a loop, and you can write low-level code in it that's as fast as C. So that was the goal. The first release was in 2013, so it's a pretty young language. The 1.0 release was in August of this year. Before that point, every year there was a new release: 0.1, 0.2, 0.3. And every year, it would break all your old code, and you'd have to update everything to keep it working. So now they've said, OK, it's stable. We'll add new features. We'll make it faster. But from this point onwards, at least until 2.0, many years in the future, it will be backwards compatible.

In my experience, this claim pretty much holds up. I haven't found any problem where there was a nice, highly optimized C or Fortran code where I couldn't write equivalently performing code in Julia, given enough time. Obviously, if there's a library with 100,000 lines of code, it takes quite a long time to rewrite that in any other language. So there are lots of benchmarks that illustrate this. The goal of Julia is usually to stay within a factor of 2 of C. In my experience, it's usually within a few percent if you know what you're doing.

So here's a very simple example that I like to use, which is generating a Vandermonde matrix. You're given a vector of values alpha 1, alpha 2, up to alpha n, and you want to make an n by m matrix whose columns are those entries raised element-wise to the zeroth power, the first power, squared, cubed, and so forth. This kind of matrix shows up in a lot of problems, so most matrix and vector libraries have a built-in function to do this. In Python, in NumPy, there is a function called numpy.vander to do this. It's generating a big matrix, so it could be performance-critical, and so they can't just implement it in pure Python. If you look at the NumPy implementation, it's a little Python shim that calls immediately into C. And then if you look at the C code-- I won't scroll through it-- it's several hundred lines of code. It's quite long and complicated. And all that several hundred lines of code is doing is figuring out what types to work with, like what kernels to dispatch to. At the end of that, it dispatches to a kernel that does the actual work. And that kernel is also C code, but that C code was generated by a special-purpose code generator. So it's quite involved to get good performance for this while still being somewhat type-generic. Their goal is to have something that works for basically any NumPy array and any NumPy type, of which there's a handful, maybe a dozen scalar types it should work with, all right? Whereas if you were implementing this in C, it's really trivial to write 20 lines of code that implements this, but only for double precision, for a pointer to a double-precision array, all right? So the difficulty is being type-generic in C.

So in Julia, here is the implementation. It looks at first glance roughly like what a C or Fortran implementation would look like. It's just implemented in the most simple way. It's just two nested loops.
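A minimal sketch of what such a Julia implementation might look like, in the spirit of what is described here (not necessarily the exact code shown in the talk):

```julia
# Generic Vandermonde constructor: column j holds x .^ (j-1).
# Works for any element type T that supports one() and multiplication.
function vander(x::AbstractVector{T}, n::Integer = length(x)) where {T}
    m = length(x)
    V = Matrix{T}(undef, m, n)
    for j in 1:m
        V[j, 1] = one(x[j])              # zeroth power
    end
    for i in 2:n
        for j in 1:m
            V[j, i] = x[j] * V[j, i-1]   # accumulate powers by repeated multiplication
        end
    end
    return V
end
```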
So basically, you loop across, and as you go across, you accumulate powers by multiplying repeatedly by x. That's all it is. It just fills in the array. The graph here shows the time for the NumPy implementation divided by the time for the Julia implementation.
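One way to reproduce the Julia side of such a comparison is with the BenchmarkTools.jl package (a hedged sketch; the package, the array size, and the timing macro are assumptions, and the NumPy side would be timed separately in Python):

```julia
using BenchmarkTools    # assumes BenchmarkTools.jl is installed

x = rand(1000)
@btime vander($x)       # reports the minimum time for the pure-Julia implementation above
```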