The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. JULIAN SHUN: So welcome to the second lecture of 6.172, Performance Engineering of Software Systems. Today, we're going to be talking about Bentley rules for optimizing work. All right, so work. Does anyone know what work means? You're all at MIT, so you should know. So in terms of computer programming, there's actually a formal definition of work. The work of a program on a particular input is defined to be the sum total of all the operations executed by the program. So it's basically a gross measure of how much stuff the program needs to do. And the idea of optimizing work is to reduce the amount of stuff that the program needs to do in order to improve the running time of your program, improve its performance. So algorithm design can produce dramatic reductions in the work of a program. For example, if you want to sort an array of elements, you can use an n log n time quicksort, or you can use an n-squared time sort, like insertion sort. So you've probably seen this before in your algorithms courses. And for large enough values of n, an n log n time sort is going to be much faster than an n-squared sort. So today, I'm not going to be talking about algorithm design. You'll see more of this in other courses here at MIT, and we'll also talk a little bit about algorithm design later on in this semester. We will be talking about many other cool tricks for reducing the work of a program. But I do want to point out that reducing the work of a program doesn't automatically translate to a reduction in running time, and this is because of the complex nature of computer hardware.
So there's a lot of things going on that aren't captured by this definition of work. There's instruction-level parallelism, caching, vectorization, speculation and branch prediction, and so on. And we'll learn about some of these things throughout this semester. But reducing the work of a program does serve as a good heuristic for reducing the overall running time of a program, at least to a first order. So today, we'll be learning about many ways to reduce the work of your program. So the rules we'll be looking at, we call them Bentley optimization rules, in honor of Jon Louis Bentley. So Jon Louis Bentley wrote a nice little book back in 1982 called Writing Efficient Programs. And inside this book are various techniques for reducing the work of a computer program. So if you haven't seen this book before, it's very good, so I highly encourage you to read it. Many of the original rules in Bentley's book dealt with the vagaries of computer architecture three and a half decades ago. So today, we've created a new set of Bentley rules just dealing with the work of a program. We'll be talking about architecture-specific optimizations later on in the semester, but today we won't be talking about those. One cool fact is that Jon Bentley is actually my academic great-grandfather. So Jon Bentley was one of Charles Leiserson's academic advisors, Charles Leiserson was Guy Blelloch's academic advisor, and Guy Blelloch, who's a professor at Carnegie Mellon, was my advisor when I was a graduate student at CMU. So it's a nice little fact. And I had the honor of meeting Jon Bentley a couple of years ago at a conference, and he told me that he was my academic great-grandfather. [LAUGHING] Yeah, and Charles is my academic grandfather, and all of Charles's students are my academic aunts and uncles-- [LAUGHING] --including your TA Helen. OK, so here's a list of all the work optimizations that we'll be looking at today.
So they're grouped into four categories: data structures, logic, loops, and functions. So there's a list of 22 rules on this slide today, and in fact, we'll actually be able to look at all of them today. So today's lecture is going to be structured as a series of mini lectures, and I'm going to be spending one to three slides on each one of these optimizations. All right, so let's start with optimizations for data structures. So the first optimization is packing and encoding your data structure. The idea of packing is to store more than one data value in a machine word, and the related idea of encoding is to convert data values into a representation that requires fewer bits. So does anyone know why this could possibly reduce the running time of a program? Yes? AUDIENCE: It needs fewer memory fetches. JULIAN SHUN: Right, so good answer. The answer was, it might need fewer memory fetches. And it turns out that that's correct, because a computer program spends a lot of time moving stuff around in memory. And if you reduce the number of things that you have to move around in memory, then that's a good heuristic for reducing the running time of your program. So let's look at an example. Let's say we wanted to encode dates. So let's say we wanted to encode this string, September 11, 2018. You can store this using 18 bytes, using one byte per character. And this would require more than two double words, because each double word is eight bytes, or 64 bits, and you have 18 bytes. And you have to move around these words every time you want to manipulate the date. But it turns out that you can actually do better than using 18 bytes. So let's assume that we only want to store years between 4096 BCE and 4096 CE. Then there are about 365.25 times 8,192 dates in this range, which is approximately three million. And you can use log base two of three million bits to represent all the dates within this range. So the notation lg here means log base two.
That's going to be the notation I'll be using in this class, and L-O-G will mean log base 10. So we take the ceiling of log base two of three million, and that gives us 22 bits. A good way to remember how to compute the log base two of something: the log base two of one million is 20, and the log base two of 1,000 is 10. So you can factor out the million and then add in log base two of three, rounded up, which is two. That gives you 22 bits, which easily fits within one 32-bit word. Now you only need one word, instead of the three words you needed in the original representation. But with this modified representation, determining the month of a particular date will take more work, because now you're not explicitly storing the month in your representation, whereas with the string representation, you are explicitly storing it at the beginning of the string. So this does take more work, but it requires less space. Any questions so far? OK, so it turns out that there's another way to store this, which also makes it easy for you to fetch the month, the year, or the day for a particular date. Here, we're going to use the bit-field facilities in C. So we're going to create a struct called date underscore t with three fields: the year, the month, and the day. And the integer after the colon specifies how many bits I want to assign to that particular field in the struct. So this says I need 13 bits for the year, four bits for the month, and five bits for the day. I use 13 bits for the year because I have 8,192 possible years, so I need 13 bits to store that. For the month, I have 12 possible months, so I need log base two of 12 rounded up, which is four. And finally, for the day, I need log base two of 31 rounded up, which is five. So in total, this still takes 22 bits.
But now the individual fields can be accessed much more quickly than if we had just encoded the three million dates using sequential integers, because now you can extract the month just by saying, for whatever you named your struct, that struct dot month. And that gives you the month. Yes? AUDIENCE: Does C actually store it like that? Because I know in C++ it gets aligned, so then you end up taking more space. JULIAN SHUN: Yeah, so this will actually pad the struct a little bit at the end. So you actually do require a little bit more than 22 bits. That's a good question. But this representation is much easier to access than if you had just encoded the dates as sequential integers. Another point is that sometimes unpacking and decoding are the optimization, because sometimes it takes a lot of work to encode the values and to extract them. So sometimes you want to actually unpack the values so that they take more space but are faster to access. So it depends on your particular application; you might want to do one thing or the other, and the way to figure this out is just to experiment with it. OK, so any other questions? All right, so the second optimization is data structure augmentation. The idea here is to add information to a data structure to make common operations do less work, so that they're faster. Let's look at an example. Let's say we had two singly linked lists and we wanted to append them together. And let's say we only stored the head pointer to each list, and then each element in the list has a pointer to the next element in the list. Now, if you want to append one list to another list, well, that's going to require walking down the first list to find the last element, so that you can change the pointer of the last element to point to the beginning of the next list. And this might be very slow if the first list is very long.
So does anyone see a way to augment this data structure so that appending two lists can be done much more efficiently? Yes? AUDIENCE: Store a pointer to the last value. JULIAN SHUN: Yeah, so the answer is to store a pointer to the last value, and we call that the tail pointer. So now we have two pointers, both the head and the tail. The head points to the beginning of the list, and the tail points to the end of the list. And now you can append two lists in constant time, because you can access the last element in the first list by following the tail pointer, and then you just change the successor pointer of that last element to point to the head of the second list. And then you also have to update the tail to point to the end of the second list. OK, so that's the idea of data structure augmentation. We added a little bit of extra information to the data structure, such that now appending two lists is much more efficient than with the original method, where we only had a head pointer. Questions? OK, so the next optimization is precomputation. The idea of precomputation is to perform some calculations in advance so as to avoid doing them at mission-critical times, to avoid doing them at runtime.