Placeholder Image

Subtitles section Play video

  • [MUSIC PLAYING]

  • DAVID MALAN: All right, this is CS50.

  • And this is our look today at data structures.

  • You'll recall last week that we gave ourselves

  • a few new ingredients and some new syntax in C.

  • We introduced pointers, the ability to address chunks of memory

  • by actual addresses--

  • 0, 1, 2, 3 and on up.

  • And we introduced the star notation and a few other functions,

  • malloc among them, free among them, so that you can actually

  • manage your computer's memory.

  • Just to hammer this home, let's take a look

  • at just this small example that from the outset is buggy.

  • This code is buggy.

  • And we'll see in just a moment why.

  • But let's just walk through it step by step.

  • This first line highlighted in yellow in English

  • is doing what as you understand it now?

  • If you're a little rusty since last week, what's that first line of code

  • doing in English?

  • Anything?

  • Yeah.

  • AUDIENCE: It's creating a point to an int named x.

  • DAVID MALAN: Perfect.

  • It's creating a pointer to an integer and calling

  • that pointer or that variable x.

  • And let me propose that the next line is doing the same thing giving us

  • another pointer to an integer but this time calling it y.

  • The third line of code, someone else, what's

  • happening in English with this line here?

  • Yeah.

  • AUDIENCE: It's creates a memory of size of int and assigns it to x,

  • but it's more difficult to execute if it's not a pointer, I don't know.

  • DAVID MALAN: This is not the bug.

  • But the first part is correct.

  • malloc is the function we introduced last week

  • that allocates memory for you.

  • It takes one argument, the number of bytes that you want to allocate.

  • Even if you don't recall how many bytes you need for an integer,

  • you can call this other operator, sizeof,

  • that we saw briefly last week, that will return, in this case, 4 most likely

  • depending on the computer that you're on.

  • So this is saying, hey, computer give me 4 bytes of memory.

  • And it returns that chunk of memory to conventionally

  • by way of the first address, so ox something,

  • wherever those 4 bytes happen to star.

  • And then it stores that address in x, which is in fact OK,

  • because as you noted initially, x is in fact a pointer.

  • That is an address.

  • So all this is doing is it's declaring a variable called

  • x and storing in it ultimately the address

  • of a legitimate chunk of memory.

  • You wouldn't typically allocate an int like this.

  • You would allocate an int with just int and semicolon just like in week 1.

  • But now that we have the ability to allocate addresses and allocate memory,

  • you could achieve the same idea here.

  • This line here now, the fourth line of code, says what in English?

  • Star x equals 42 semicolon.

  • What's going on there?

  • Yeah.

  • AUDIENCE: It goes to the address in the x and then sets it to 42.

  • DAVID MALAN: Good goes to the address in x and sets it to 42.

  • So the star operator is the dereference operator,

  • which is a fancy way of saying go to that address.

  • And what do you want to do?

  • Well, per week 1 when we discussed the assignment operator,

  • it just says put the number 42 there.

  • So wherever malloc found 4 bytes of available memory for me,

  • this fourth line of code says, go there and put the number 42

  • by putting the appropriate zeros and ones.

  • This last line here--

  • and here's the bug, if we finally reveal it here--

  • this line is buggy.

  • Does anyone see why?

  • Yeah, over here.

  • AUDIENCE: You haven't allocated memory for that variable yet.

  • DAVID MALAN: Exactly.

  • I haven't allocated memory for that variable yet.

  • And because up here I've just said int star y semicolon.

  • You can only assume safely that has some garbage value, some unknown value,

  • maybe remnants from some other part of the program,

  • that might not necessarily be true here at the beginning of the program.

  • But for safety's sake assume that if you don't give a variable

  • a value, who knows what it contains.

  • It's got some bogus address, such that if you

  • say star y go to that bogus address something bad is going to happen.

  • And maybe you experienced this already in P Set 4 or prior,

  • some kind of memory problem with your code,

  • or a segmentation fault or seg fault, bad things

  • happen when you go to addresses that don't exist

  • or that you don't even know where they are.

  • So this line of code is bad.

  • But we can do a little better.

  • What if instead I do something like this?

  • I actually assign to y x.

  • So that just says put in y the same address that's in x.

  • And then with this last line of code, what if I now say star y equals 13?

  • What is that-- you're nodding your head.

  • What am I doing correctly now?

  • AUDIENCE: Now there's memory allocated for y.

  • DAVID MALAN: Good.

  • Now, there's memory allocated for y.

  • So you're saying go to that address and put 13 there.

  • However, what did we just need to the 42, just to be clear?

  • We clobbered it.

  • We overwrote it with the 13.

  • Because if x and y are the same address, both this says go to that address

  • and put 42 there, but then two lines later, we say, no, no, no,

  • go there and put 13 there instead.

  • But long story short, bad things happen when you don't necessarily

  • anticipate what is in memory and you don't allocate it yourself,

  • so thanks to one of our friends at Stanford,

  • allow us to take a moment here to hit Play on a short film, a claymation

  • if you will, that paints the same picture

  • in a more memorable way perhaps.

  • If we could dim the lights.

  • [VIDEO PLAYBACK]

  • NARRATOR: Hey, Binky, wake up.

  • It's time for pointer fun.

  • BINKY: What's that?

  • Learn about pointers?

  • Oh, goody.

  • NARRATOR: Well, to get started, I guess we're going to need a couple pointers.

  • BINKY: OK, this code allocates two pointers, which can point to integers.

  • NARRATOR: OK, well, I see the two pointers.

  • But they don't seem to be pointing to anything.

  • BINKY: That's right, initially pointers don't point to anything.

  • The things they point to are call pointees.

  • And setting them up is a separate step.

  • NARRATOR: Oh, right, right.

  • I knew that.

  • The pointees are separate.

  • So how do you allocate a pointee?

  • BINKY: OK, well, this code allocates a new integer pointee.

  • And this part sets x to point to it.

  • NARRATOR: Hey, that looks better.

  • So make it do something.

  • BINKY: OK, I'll dereference the pointer x to store the number

  • 42 into its pointee.

  • For this trick, I'll need my magic wand of y dereferencing.

  • NARRATOR: Your magic wand of dereferencing.

  • That-- that's great.

  • BINKY: This is what the code looks like.

  • I'll just set up the number.

  • And--

  • NARRATOR: Hey, look, there it goes.

  • So doing a dereference on x, follows the arrow

  • to access its pointee, in this case, the store 42 in there.

  • Hey, try using it to store the number 13 through the other pointer, y.

  • BINKY: OK.

  • I'll just go over here to y and get the number 13 set up

  • and then take the wand of dereferencing and just--

  • [BUZZER SOUND] oh.

  • NARRATOR: Oh, hey, that didn't work.

  • Say, Binky, I don't think dereferencing y is a good idea, because, you know,

  • setting up the pointee is a separate step.

  • And I don't think we ever did it.

  • BINKY: Oh, good point.

  • NARRATOR: Yeah, we allocated the pointer y.

  • But we never set it to point to a pointee.

  • BINKY: Mm, very observant.

  • NARRATOR: Hey, you're looking good there, Binky.

  • Can you fix it so that y points to the same pointee as x?

  • BINKY: Sure.

  • I'll use my magic wand of pointer assignment.

  • NARRATOR: Is that going to be a problem like before?

  • BINKY: No, this doesn't touch the pointees.

  • It just changes one pointer to point to the same thing as another.

  • NARRATOR: Oh, I see.

  • Now y points to the same place as x.

  • So, wait, now y is fixed.

  • It has a pointee.

  • So you can try the wand of dereferencing again to send the 13 over.

  • BINKY: Uh-- OK, here goes.

  • NARRATOR: Hey, look at that.

  • Now dereferencing works on y.

  • And because the pointers are sharing that one pointee, they both see the 13.

  • BINKY: Yeah, sharing, whatever.

  • So are we going to switch places now?

  • NARRATOR: Oh, look, we're out of time.

  • BINKY: But--

  • [END PLAYBACK]

  • DAVID MALAN: All right, so now that we do

  • have this power of pointers and addresses

  • where we have low level access to the computer's memory,

  • we can actually solve problems a lot more powerfully

  • and in a lot more interesting ways.

  • But first, let's motivate some of these problems.

  • So back in Week 2, we introduced arrays, which

  • was the first of our data structures, if you will.

  • Before then in Week 1, all we had was variables for things

  • like ints and chars and floats and so forth.

  • In Week 2, we introduced arrays, which meant

  • you could store two ints altogether or three or 10 or 100.

  • So you can kind of encapsulate lots of data together.

  • So unfortunately, though, arrays aren't quite as powerful as might be ideal.

  • So, for instance, if we have an array with size 3 and we actually want to go

  • ahead and store three values in it-- one, two, three--

  • suppose that we actually want to now store a fourth value,

  • but we didn't anticipate that from the get go.

  • Recall after all that with arrays you have to declare their size upfront.

  • So you've got to hard code the number 3 or a variable containing the number 3.

  • But suppose that we want to store the number 4.

  • You might think that, well, just give me another box

  • of memory just to the right of the number 3,

  • so that I can keep all of my numbers together.

  • But unfortunately, per last week, that's not really

  • a reliable assumption, because in the context

  • of the rest of your computer's memory, that 1, 2, 3,

  • might be here surrounded by other bytes.

  • And per last week those bytes might be mostly filled with other data

  • from some other parts of your program.

  • And yet you would think that in seeing that 1, 2, 3

  • is kind of painted into this corner, so to speak,

  • that there's just no room for the number 4,

  • and therefore you can't add the fourth number to your array,

  • is there a solution visibly to this problem nonetheless?

  • Where else could we put it?

  • Yeah.

  • AUDIENCE: Move it to off of other memory.

  • DAVID MALAN: Say that a little louder.

  • AUDIENCE: We can move it off to other memory.

  • DAVID MALAN: Yeah, so maybe we can move it off to other memory.

  • So there's a lot of EMMAs in my memory per last week, but there is still,

  • it would seem, based on this picture, some unused memory.

  • So maybe we could resize our array, grow it, not by just moving all of the EMMAs

  • because frankly that would seem to take a lot of time

  • if we had to shift all of these characters,

  • why don't we just relocate the 1, 2, 3 down here,

  • and that gives us an extra space for at least a number 4.

  • So indeed even if you're using arrays, you

  • can achieve this outcome by actually moving memory around.

  • But consider what's involved in that.

  • So if you've got our old array at top left,

  • and we've got our new array at bottom right, that is of size 4.

  • So we have plenty of room.

  • How do we go about resizing the array?

  • Well, it's kind of an illusion.

  • You can't just resize the array when we have all of these EMMAs surrounding us.

  • Instead, we actually have to move the array or copy it.

  • So the 1 gets moved to the new memory.

  • The 2 gets moved to the new memory.

  • The 3 gets moved to the new memory.

  • And then at that point, we can just throw away or free the previously used

  • memory and now go ahead and add our 4.

  • Unfortunately, this isn't necessarily the best strategy, right,

  • because if these three lockers represent our original memory and these four

  • lockers represents our new memory and they're deliberately far apart,

  • that is to say that if I want to go ahead and move like these same numbers,

  • I really have to do something like this, which involves quite a few steps.

  • Let me go ahead and put the 1 in there now.

  • Now, let me go ahead and get the 2 here.

  • And then I can go ahead and put this in here.

  • So now I've got the 2.

  • And then lastly, I can go grab the 3.

  • And so even though I did this pretty quickly on the screen,

  • the reality is there's a decent amount of work to do.

  • And then I still, of course, have to go ahead and add the 4 to the mix, which

  • is to say that I've taken figuratively and physically quite a few steps

  • in order to resize an array from size 3 to size 4, which

  • is to say if we now consider the efficiency

  • or, if you will, inefficiency of that algorithm, what kind of running time

  • is involved when inserting additional numbers into an array

  • as I've done here?

  • Here's our menu of options from a couple of weeks

  • ago when we focused on algorithms.

  • What's the running time of insertion into an array

  • based even on that simple demonstration would you say?

  • What's the running time?

  • Yeah.

  • AUDIENCE: O n squared.

  • DAVID MALAN: Say it again.

  • AUDIENCE: O n squared.

  • DAVID MALAN: O n squared.

  • So maybe O n squared in that there was a lot of back and forth

  • and we've seen that before.

  • We've seen bubble sort and selection sort add up.

  • It's not quite as bad as that.

  • It's not quite as bad as that.

  • Yeah.

  • AUDIENCE: O of n.

  • DAVID MALAN: O of n.

  • And why do you say O of n?

  • AUDIENCE: Because for as like as many lockers there are in the first one,

  • you have to increment the same amount of processes to insert them.

  • DAVID MALAN: Exactly.

  • Whatever number of lockers you have here-- so that's three specifically--

  • but n more generally, it's going to take me n steps

  • to transfer those numbers over here.

  • Or technically, it's going to take me 3-- maybe if I go back and forth,

  • it's like 6 steps.

  • But it's some multiple of n.

  • So it's not n squared.

  • That's when we kept iterating again and again and again.

  • This time I just have to move 3 numbers to here and then add the fourth number.

  • So it's indeed, Big O of n when you want to go ahead and insert or search

  • equivalently an array that's actually implemented--

  • sorry, insert is going to take us linear time.

  • But search recall-- and this was the powerful thing--

  • what's the running time of search so long as you keep your number sorted?

  • Per two weeks ago, that was logarithmic.

  • So we haven't necessarily sacrificed that.

  • And that's the appeal of storing our data in an array that's sorted.

  • You can use binary search.

  • However, this is expensive and moving things around isn't necessarily

  • the ideal approach.

  • So let's just consider what this might look like in code.

  • Let me go over to CS50 IDE here.

  • And let me go ahead and create a new file called list.c.

  • And let's see if we can't represent in code exactly this idea.

  • So let me go ahead and include for myself standard stdio.h

  • just so that we can print out some values ultimately.

  • Let me go ahead then and declare main--

  • int main void.

  • And then down here, let's just arbitrarily start

  • where we did with three integers, called list and size 3.

  • So I'm just mimicking exactly where we started pictorially

  • by having an array that was fixed at size 3.

  • And then if I went ahead and initialized that list,

  • I could just hard code-- that is type into the program itself--

  • those three values into bracket 0, 1, and 2 the numbers 1, 2, 3 respectively.

  • So I'm just manually initializing that array to three values.

  • And then just so that this program has some purpose in life,

  • let me go ahead and do int i equals 0, i less than 3, i++.

  • And then, let's just print out these elements just for good measure.

  • Each of them is an int.

  • So we'll use %i.

  • And then I'm going to go ahead and print out list bracket i.

  • So kind of a Week 2 style program, where All I'm doing

  • is hard coding an array of size 3, initializing it

  • with three values, 1, 2, 3; 0 indexed, and then printing them out.

  • So if I go ahead and save this and make my list and then go ahead and compile

  • this with ./list, I should see hopefully 1, 2, 3.

  • But there's a problem with this implementation fundamentally because I

  • have hardcoded-- that is typed explicitly--

  • the size of this array, how can I go about adding a fourth element?

  • What would I have to do?

  • Well, I could change the code up here to 4.

  • And then I could add another line here.

  • And then I could change this.

  • But then I have to recompile it.

  • And so it's certainly not dynamic.

  • But we did see a function last week that lets

  • you allocate more memory dynamically.

  • And just to be sure what was that function?

  • So malloc.

  • Right?

  • Now that we have malloc, you don't have to type into your program source code

  • from the get go a fixed number.

  • You can actually allocate some amount of memory dynamically.

  • Now, here just for demonstration's sake, we'll do it to achieve the same goal,

  • but in a way that's going to scale a little more effectively.

  • Recall from last week that if you want to get a chunk of memory from malloc,

  • it's going to return the address of that chunk of memory.

  • So that suggests that I can declare a pointer to an integer called list.

  • And then let me go ahead and allocate, how about, three integers initially

  • times whatever the size is of an integer.

  • So this is a little weird looking, but consider what this is doing.

  • malloc is being asked for 3 times the size of an int.

  • So give me enough memory to fit three integers.

  • By definition that returns a pointer, per last week.

  • So we have to assign it to a pointer on the left.

  • So list is a variable now, just like x and y from our previous example,

  • that's storing the address of that chunk of memory.

  • But what's cool about C is that now that you

  • know that list is a chunk of memory, we can actually

  • borrow that same square bracket notation from Week 2.

  • And this code here doesn't actually need to change.

  • If you use square bracket notation next to the name of a pointer, what's

  • going to happen for you automatically is the computer

  • is going to go to the first byte in that chunk of memory.

  • This index is going to go to the next chunk of memory.

  • This is going to go to the next chunk of memory,

  • all within the scope of what malloc returned for you.

  • And just as an aside, how many bytes are in an integer?

  • AUDIENCE: 4.

  • DAVID MALAN: 4.

  • And recall I briefly mentioned the expression last week

  • pointer arithmetic.

  • What you're also getting sort of magically

  • with this square bracket notation is that bracket 0,

  • it happens to be byte 0.

  • Bracket 1 is not the second byte.

  • It's actually 4 bytes over.

  • And bracket 2 is not the third byte.

  • It's actually 8 bytes over, because you allocated 4 plus 4 plus 4, 12 bytes.

  • And so this square bracket notation just jumps you

  • to the right place in that chunk of memory, so that you can fit int,

  • int, int.

  • Yeah.

  • AUDIENCE: Why do you allocate a pointer to an int rather than an pointer

  • to an int array?

  • DAVID MALAN: Why do you allocate a pointer to an int and not a pointer

  • to an int array?

  • In this context, arrays and pointers are in some sense the same.

  • A pointer is an address of memory.

  • An array is just a chunk of memory.

  • And so even though we used chunks of memory in Week 2

  • by just calling them arrays, they really are just more

  • generally chunks of memory that support square bracket notation.

  • But now that we can allocate as much memory as we want,

  • we can kind of use these two concepts interchangeably.

  • And there are some subtleties in C. But this now has the same effect as Week 2.

  • And this is the only new line from this week.

  • But now if you're using malloc, even though I'm not

  • going to do it in a more complicated program here,

  • you can imagine now the code running in a loop and maybe allocating more memory

  • and more memory and more memory when you need it, because malloc

  • allows you to do just that.

  • And we do need to do a couple of safety checks here.

  • It turns out, per last week, that malloc can sometimes run out of memory.

  • If you're Mac or PC or the cloud runs out of memory for your account,

  • well, you might want to check the return value.

  • And so good practice would be, well, wait a minute,

  • if list equals equals null, let me just go ahead and return 1,

  • something went wrong, because my computer is out of memory

  • for some reason.

  • So best practice would say anytime you allocate memory,

  • always check if you've gotten back null.

  • Now, let me just do something for the sake of demonstration.

  • Let me move my window down here.

  • Let me highlight these lines of code and just make

  • the claim that highlighted here between lines 5 and 13

  • are lines of code that simply allocate a list of size 3

  • and store three values in it.

  • That's the story where we left off a moment ago.

  • Suppose now I change my mind and decide after line 13

  • or maybe elsewhere in this program if it were larger,

  • you know what, I actually want another integer.

  • I want to resize that array.

  • Well, how can I go about doing it?

  • Well, let me go ahead and do this.

  • Let me go ahead and allocate, for instance, another address

  • and say store at that address a chunk of memory corresponding to four integers

  • using the size of operator as before.

  • So temporarily, let me go ahead and give myself

  • a new chunk of memory that is big enough to fit

  • four integers instead of just three.

  • Let me practice best practices and say, you know what, just in case,

  • if temp equals equals null because I'm out of memory, forget it,

  • I'm done with the program.

  • We're not going to proceed anyway.

  • But that's just good practice now.

  • But now what I want to do?

  • If I now have two chunks of memory, this one is a size 3, this one is of size 4,

  • what did we do the last time we wanted to move something around

  • in the computer's memory, what did I physically do with the lockers?

  • I think you're nodding.

  • What did I do?

  • AUDIENCE: You went through each and move 1 to the--

  • DAVID MALAN: Yeah, exactly, I went through each one

  • and copied the value from left to right, from old to new.

  • And so let me go ahead and do exactly that.

  • I think I can do this with a for loop, for int i get 0, i is less than 3,

  • because that's the size of the old array that size 3, i++.

  • And then in this iteration, I can just do something like this--

  • use this new chunk of memory just like an array,

  • because I claimed I can use my square bracket notation and store location

  • i whatever is in the original list at location i.

  • So this code here now, if I were to comment it,

  • copy integers from old array into new array.

  • And that too is just using a for loop from old to new.

  • But now that's not quite everything I want to do.

  • I also want to store at the location 3, 0 index,

  • which means it's the fourth location, another value, number 4.

  • That's why I put the additional number 4 into those lockers.

  • So now with these lines of code, I have implemented the physical notion

  • of copying all of the values from the old array into the new array.

  • So I'm almost done, except what did we learn

  • last week that you should do whenever you're done with a chunk of memory?

  • Do I still need the original chunk of memory?

  • AUDIENCE: No.

  • DAVID MALAN: No.

  • And how do I give it back to the computer?

  • AUDIENCE: Free.

  • DAVID MALAN: Free.

  • So quite simply, I literally just call free,

  • passing in the address of the chunk of memory that I want to free.

  • And even though I'm passing in one address,

  • the computer is going to do the heavy, lifting of remembering

  • how many bytes I asked for originally.

  • You don't have to worry about that.

  • You just say, whatever this is pointing at, go ahead and free it all.

  • So now, you know what, now that I've gotten rid of that list,

  • I'm going to update list equal temp, which is just cleaning up the naming.

  • Temp is kind of a stupid name for a list.

  • Let me just reuse the original pointer and let list equal temp.

  • And now down here if I've done everything correctly,

  • it should suffice to print out that whole list.

  • So let me save this.

  • Let me give myself a bigger terminal window.

  • Do make list again.

  • A lot of mistakes.

  • Let's see, first one is up here.

  • Implicitly declaring library function malloc dot dot dot.

  • What generally is the solution when you see implicitly declaring something?

  • AUDIENCE: Header file.

  • DAVID MALAN: Header file, which one do I want?

  • Do you recall?

  • This is subtle and we might not have used it last time if I used

  • the CS50 library, but it's stdlib.h.

  • That is where malloc is.

  • That is where free is.

  • So stdlib.h is one new header file that contains last week's functions.

  • So let me try this again.

  • Let me make list.

  • Nice, this time it did compile.

  • ./list and-- hm, I seem to be missing that fourth number.

  • But I think this is just a stupid mistake on my part.

  • What did I do wrong if I look at the printing of this array?

  • AUDIENCE: Size of list.

  • DAVID MALAN: Yeah, down here the new list is size 4.

  • So frankly, if you recall a few weeks ago,

  • I encouraged you, don't just hard code numbers, magic numbers,

  • in your programs.

  • We should really be using constants or some other variable.

  • This is exactly why, because you, just like me,

  • might overlook a detail like that.

  • So let me recompile it.

  • And let me do list.

  • And, voila, there is my 1, 2, 3, 4.

  • Now, to be clear, this is kind of a stupid program,

  • because I sort of decided halfway through writing

  • this program, wait a minute, I want four integers, not three.

  • And, of course, at that point, you should just

  • delete the earlier code you wrote.

  • So this is just for demonstration sake.

  • If you imagine this being a bigger program, that just over time

  • the human decides maybe because get int is called that they need more memory,

  • this is how you would do it using malloc.

  • But it turns out there's a function that can actually

  • make our lives a little easier here.

  • So let me go ahead and clean this up just a little bit.

  • It turns out that I don't have to allocate more memory myself,

  • copy all of these things myself, and then also free it.

  • I can consolidate a bunch of these lines as follows.

  • Instead of using malloc, I can actually say realloc,

  • which as the name suggests, reallocate a chunk of memory.

  • What do you want to reallocate?

  • Well, I want to reallocate the list.

  • And this time I want to do 4 times size of an int.

  • I'm going to store this in temp temporarily.

  • I'm going to make sure that nothing went wrong,

  • as by checking for null, which just means, hey, you might be out of memory.

  • And then I'm going to return if so.

  • But down here, if all is well, I'm going to go ahead and do this.

  • And watch this I now have simplified my code by quite a few lines.

  • realloc, by definition-- this is another function in stdlib.h--

  • handles the process of taking an existing chunk of memory that you

  • already asked for, resizes it to be this new size,

  • whether it's bigger or smaller.

  • It handles the copying of the data from old to new to you.

  • And you just need to check that nothing went wrong

  • as by checking for null here and then remembering the new value.

  • So this code now, which is just six lines of code,

  • previously was more than that.

  • And it's just a handy function to use.

  • All right, a question from earlier.

  • AUDIENCE: Why can't we just create this to the temp in the beginning,

  • because if we equate this to the temp, then we equate this to the pointer

  • perhaps, so this to the point to the 4 bytes of memory?

  • DAVID MALAN: Really good question.

  • So let me roll this back by rewinding.

  • And all of the finished versions are on the course's website

  • if you want to play with them later.

  • This was the previous version using just malloc.

  • If you just do this, update a new chunk of memory, as I think you're asking,

  • what's happening is you are effectively orphaning the old chunk of memory.

  • Because if you change what's stored in list

  • and have it store the new chunk of memory, where'd the old chunk of memory

  • go?

  • It's sort of floating there in your computer's memory.

  • But you've lost all pointers to it.

  • There's no arrow anymore pointing to it conceptually.

  • So that's why we have to jump through these hoops of having

  • a temporary variable just so that we don't

  • lose track of things we've allocated.

  • And we'll see this later today with another data structure as well.

  • Yeah.

  • AUDIENCE: Somebody asked this, but I don't

  • understand that if you initialize temp as a pointer toward integer,

  • then does it not create problems that you use it as an array.

  • DAVID MALAN: Good question.

  • If you initialize temp as a pointer to an integer,

  • does it not create problems that you're using it as an array?

  • Short answer, no, because again an array by definition from Week 2

  • is just a chunk of memory.

  • And in C you can use the square bracket notation

  • to jump to random parts of that memory using simple arithmetic, bracket 0, 1,

  • 2, and so forth.

  • Last week when we introduced malloc and free and now realloc,

  • you now have a more low level way of allocating as much memory as you want.

  • So it's a more powerful, general purpose mechanism.

  • But at the end of the day, you're still getting back a chunk of memory,

  • contiguous memory, bytes that are back to back to back.

  • So you can certainly still use the square bracket notation

  • because essentially an array is a chunk of memory.

  • And malloc gives you a chunk of memory, ergo you can treat it as an array.

  • They really are equivalent in that sense.

  • You just don't get as many user friendly features as with arrays,

  • like them being freed for you, as we never until last week

  • do we have to free chunks of memory.

  • Arrays do that for you automatically thanks to the compiler.

  • Yeah.

  • AUDIENCE: Do you not have to recreate the list for temp after line 37.

  • DAVID MALAN: Yes.

  • Thank you.

  • So there is a bug here.

  • And if I ran Valgrind on this code, I would see exactly

  • that, that I'm leaking some number of bytes.

  • So indeed, at the end of this program, I want to free the--

  • let's make sure-- yep, I want to free now

  • the new chunk of memory, which is a size 4, to avoid

  • exactly the problem you identified.

  • Good catch.

  • All right , so just for now, even if that's a lot of code,

  • let's consider now higher level takeaways from this,

  • just so that we can then motivates an alternative approach that allows us

  • to stitch together somewhat fancier data structures instead.

  • So in general, a data structure is just a programming construct in C, C++,

  • Java, Python-- we'll see them in different languages over the remainder

  • of the term--

  • that allow you to store information differently in your computer's memory.

  • In C, everything we're about to do today is thanks to these three features of C.

  • So even though this may feel like a lot of syntax, everything we do

  • today boils down to three pieces of syntax that you have seen before.

  • Struct, this recall was a keyword in C that allows

  • you to create your own structure.

  • For instance, a couple of weeks ago, we created a person structure,

  • who had a name and a number.

  • And that gives us our own data type that is structured to contain two values,

  • like name and number.

  • You use structures as well for the problem

  • set involving bitmaps, the bitmap header and so forth.

  • Those were data structures as well.

  • What do we use the dot notation for just to be clear?

  • And you definitely use this when manipulating

  • red and green and blue pixels recently.

  • Yeah.

  • AUDIENCE: To access a property of a structure.

  • DAVID MALAN: Exactly.

  • To access a property of a structure.

  • So if you want to get at a person's name or get at a person's number,

  • you use the variable's name and then a dot operator

  • to get inside of that data structure.

  • So we've seen that before.

  • Then last week and again today, we see the star operator,

  • which can kind of mean different things in different contexts.

  • But it's always related here to memory as of last week.

  • This is the dereference operator that allows you to go to a chunk of memory

  • by way of this thing called a pointer.

  • So even if today feels like a bit of that fire hose--

  • and again per my email, this is where things now

  • begin to level off-- realize that it all boils down to first principles,

  • or if you will sort of three scratch-like puzzle pieces

  • that we're now going to assemble into more interesting solutions to problems.

  • So allow me to introduce something called a linked list.

  • A linked list, as we'll see is going to allow you to store a list of values.

  • Now an array allows you to store a list of values.

  • But what are some of the downsides with an array?

  • Well, an array is a fixed chunk of memory.

  • And if you want to resize that array to add more

  • values to it, what do you have to do?

  • Well, you minimally have to allocate more memory.

  • You need to copy all of those values from old to new.

  • And then you can go about your business.

  • Now, realloc is a function that makes that a little simpler.

  • But realloc is doing the exact same legwork

  • that I was doing between the lockers of copying value, freeing memory,

  • and so forth.

  • So it needs to be done.

  • And that's why insertion into an array is going to be big O of n,

  • because it might take you that much time to copy

  • the whole array into a new space.

  • So that feels kind of suboptimal, right?

  • Arrays can be slow in that sense.

  • But what was the appeal of an array?

  • Like what's good about arrays?

  • Because we don't want to abandon them entirely.

  • Yeah.

  • AUDIENCE: You can index into them really easily.

  • DAVID MALAN: You can index into them really easily, not only syntactically

  • with the square brackets, but you have constant time access--

  • this is known as random access.

  • And it's not random in the sense that you just end up who knows where.

  • You can just jump to bracket 0 or 1 or 2 instantly.

  • It took me, the human, more time because I had to physically walk.

  • But a computer is going to be able to jump to 0, 1, 2, 3 instantly.

  • And so arrays are super fast.

  • And they lend themselves to things like binary search

  • as we've seen now for some time.

  • But what if we use the canvas that is our computer's

  • memory like a little more cleverly?

  • We don't have to just plop things next to each other, next to each other, next

  • to each other, and then hope for the best hope

  • that there's still more memory back to back to back.

  • What if we instead are a bit more clever about it?

  • And suppose we want to store the number 1.

  • And that happens to be an address 0x123.

  • It's arbitrary.

  • But recall from last week that every byte of memory in your computer

  • is stored somewhere.

  • So let's propose that 1 is stored at 0x123.

  • Suppose now that this represents an array of size 1

  • and you want to add a second value to this array.

  • Or let's start calling things more generally a list.

  • A list like in the real world is just a list of values.

  • This list is of size 1.

  • Now maybe there's a lot of EMMAs in this memory that are getting in the way.

  • But suppose that there is some free space a little lower in your computer's

  • memory over there.

  • So it's not here.

  • It's not here.

  • It's not available here or here or here.

  • There's other stuff there.

  • But suppose the computer does have some available memory over here

  • in which you can store the number two, just because.

  • And that address happens to be 456.

  • Finally, you want to store third value.

  • And it turns out that the nearest possible location is down here,

  • number 3.

  • That happens to be at address 0x789.

  • So this is not an array by definition, because the 1, the 2, the 3

  • are not contiguous back to back to back.

  • You cannot square bracket notation here because square bracket notation

  • requires, per Week 2, that all of your values be next to each other,

  • just like the lockers here.

  • This picture, where 1 is over here, 2 is over here, 3 is over here is more like,

  • oh, maybe this is 0x123.

  • Maybe this is 0x456.

  • Maybe this is 0x789.

  • They're kind of all over the place.

  • And that's just because that's what's available in your computer's memory.

  • But what if I get a little extravagant and I

  • start to use, not just one chunk of memory

  • to store each value, like 1, 2, 3, what if I go ahead

  • and give myself twice as much memory just to give myself some flexibility?

  • So I now conceptual use this chunk of memory to represent one.

  • This junk to represent 2, this chunk to represent 3.

  • But you know what I'm going to use the latter half of each of those chunks

  • for?

  • Any thoughts?

  • AUDIENCE: Address to the next.

  • DAVID MALAN: An address to the next chunk of memory.

  • So, for instance, if my goal is to keep this list sorted,

  • so I want conceptually to have a list that stores 1, 2, 3,

  • why don't I use this as sort of a map or a breadcrumb, if you will,

  • that points to the next chunk of memory?

  • And why don't I use this chunk of memory to point at the next one?

  • And then this chunk of memory, you know what, I just need a special value here.

  • What would be a good arbitrary way to say, mm, mm,

  • there's nothing more in the list?

  • AUDIENCE: Null.

  • DAVID MALAN: It's something called null.

  • And this technically is different from backslash 0, which is a char.

  • This is something called--

  • well, this is in hexadecimal 0.

  • Now, starting today-- and we saw the super briefly last week--

  • this is n-u-l-l with two L's-- this was stupid left hand not really talking

  • to right hand-- n-u-l is backslash 0, which is a char.

  • n-u-l-l is a pointer.

  • But they both equal 0 underneath the hood.

  • So you just store special value that says that's it for the list.

  • Now, we last week I proposed who really cares where things are in memory?

  • So indeed, let's do that again.

  • Let's just use pointers drawn as arrows in this artist's rendition

  • to say this list of numbers, 1 2, 3, is now linked.

  • A linked list is just a data structure containing multiple chunks of memory

  • that are somehow linked together.

  • And if underneath the hood, so to speak, they're just linked together

  • by way of pointers, and the price we pay is

  • that rather than now in a linked list storing just the numbers 1, 2, 3, which

  • we could have in an array, now you have to store twice as much information, 1,

  • 2, 3, as well as three pointers, two of which are in use, the other of which

  • is ready to go if I want to add something to this list.

  • So this is to say we can now create structures

  • that look like this in the computer's memory

  • just by using this new feature of pointers.

  • What might these structures look like individually?

  • Well, any one of these numbers has two fields it seems.

  • One is an integer.

  • We'll call it number.

  • And then there's another field here.

  • That let's call it next by convention, but we could call it anything we want.

  • It's just another chunk of memory that's pointing

  • to the next element in the list.

  • Well a couple of weeks ago, we introduced persons.

  • And a person had a name and a number.

  • That's not relevant today, because we're not dealing with names and numbers.

  • We're just dealing with integers.

  • So let me propose that we back that up and still use the same syntax

  • as a couple of weeks ago.

  • But instead of defining a person, let's call this rectangle a node.

  • So this is a term of art in computer science node-- n-o-d-e--

  • just represents this rectangular concept, a chunk of memory

  • that you're using for interesting purposes.

  • It's sort of a node in a graph if familiar from math.

  • But what do I want this know to store?

  • Well, let me go ahead and store a couple of things in it.

  • One, a number, and that's just going to be an int.

  • And I'm going to go ahead and call it number.

  • And then any guesses as to what the second field should be declared as?

  • I want to call it next just because it's conventional.

  • What should its data type be?

  • Any thoughts?

  • Yeah in back.

  • AUDIENCE: A pointer.

  • DAVID MALAN: A pointer.

  • And a pointer to what would you say?

  • AUDIENCE: The next number.

  • DAVID MALAN: A pointer to the next number, and not quite the next number

  • per se, because now we have not numbers only, we now have nodes.

  • So those three yellow boxes, 1 2, 3, those are now nodes, I would say.

  • So you know what?

  • Let's go ahead and call this node star.

  • But you can't technically quite do this.

  • It turns out that C, recall, takes you super literally.

  • And notice, if you read this code top to bottom,

  • left to right, at which point in the story

  • does the word node come into existence?

  • Like not until the very last line.

  • That's where we mentioned person.

  • This is where we mentioned node.

  • So, unfortunately, you can't use a node inside of a node,

  • because it literally doesn't exist in the computer's mind

  • until two lines later.

  • So there's an alternative here.

  • There is a solution.

  • It's a little more verbose.

  • Instead of just saying typedef struct, you actually say typedef struct node.

  • Just add the word that you want to use.

  • And now down here, you can say struct node star next.

  • It's kind of like a work around.

  • This is the way C works.

  • But this is the exact same idea.

  • This means that any node in the data structure, any yellow rectangular box

  • has a number and a pointer.

  • And that pointer by definition is a pointer to a node structure.

  • We just have to express it more verbosely

  • here, because the shorthand notation node doesn't exist until the bottom.

  • It's just sort of an annoying reality of the syntax.

  • All right, any questions on that definition of struct?

  • Yeah.

  • AUDIENCE: Do you have to put node twice?

  • DAVID MALAN: Do you have to put node twice?

  • You don't have to put node twice.

  • You can actually use any word here you want.

  • You can call this x or y or z.

  • But then you're going to have to make this be struct x or struct y

  • or struct z.

  • By convention, I would just reuse the same term.

  • So this is the formal name for this structure.

  • It is a struct node.

  • This is the nickname for the structure, more succinctly, node.

  • And that's what typedef does.

  • It gives you an alias from struct node to just node,

  • just because it's easier to type elsewhere in the program.

  • Other questions?

  • All right, so what can we do now with this structure?

  • Well, let's go ahead and build something up here.

  • All right, so this is about as scary as the code gets today.

  • We'll focus primarily on pictures and concepts hereafter.

  • But let's take a tour through one implementation

  • of this same idea of a linked list.

  • How would we go about representing a linked list initially?

  • Well, initially the list is empty.

  • And if you want to represent something that's empty,

  • we minimally need something.

  • So let me draw it as this empty box.

  • And this is just a pointer to a node, I claim.

  • So how do I implement the notion of a linked list that has no numbers in it

  • yet?

  • Well, why don't we just use this, which I can implement as follows.

  • node star, and I'm going to call it list, but then set it equal to NULL.

  • Right, if there's no numbers available-- there's no 1, there's no 2,

  • there's no three--

  • I should at least have a variable that connotes there is no list.

  • And the easiest way to do that is in the absence of a value, store 0,

  • which has this new nickname as of last week and this, called null.

  • So this variable here represents this picture here.

  • And notice, there's no numbers, because the list is empty.

  • But we do initialize it to NULL so that we

  • don't think that there's an arrow pointing

  • to some specific chunk of memory yet.

  • Because there isn't yet.

  • Now, suppose I want to go ahead and insert a number into this list.

  • Suppose I want to insert the number 2.

  • I can't just allocate space for 2 now.

  • I have to allocate space for 2 and that pointer,

  • otherwise known together as a node, per the previous slide.

  • So how do I go about doing this?

  • Well, in code, I can borrow the same technique

  • that we've used a couple times now, even though it's uglier

  • than some past approaches, malloc then an integer.

  • How many bytes do you want?

  • I don't know how big a node is.

  • I could probably do the math and add up the integer and then the pointer.

  • But, you know what, size of node is just going to answer that question for me.

  • So this returns that chunk of memory that's big enough to store a node.

  • And I'm going to store that just for temporarily

  • in a variable called n, n for node, and that's going to just

  • be a temporary variable, if you will.

  • So, again, even though there's some new stuff going on here,

  • this is just like before.

  • Previously, I wanted to allocate an integer.

  • Now, I want more than an integer.

  • I want an actual node.

  • And malloc returns an address, which means I must assign it to a variable.

  • That's an address on the left hand side.

  • All right, what should I always do?

  • Slight spoiler because I clicked ahead a moment ago--

  • actually, we're going to forge ahead here.

  • This is the ugliest thing we'll see.

  • What is this second line of code doing here?

  • What's going on here do you think?

  • Yeah, what do you think?

  • AUDIENCE: It's setting the number of that node to 2.

  • DAVID MALAN: It is.

  • It's setting the number of that node to 2.

  • But why this crazy syntax, which we've never used before?

  • Well, star n, we did see last week.

  • That just means go there.

  • The parentheses are just necessary for order of operations

  • so that the compiler knows, OK, first go there.

  • And then once you're there, what do you want to get access to?

  • The number field.

  • So use the same dot notation.

  • So it's super ugly.

  • But it's just doing two different things that we've seen in isolation.

  • Go to the address in n, which is that chunk of memory.

  • And then access the number field and set it equal to 2.

  • Fortunately, C has some syntactic sugar, just an easier, prettier way

  • of doing this.

  • And it happens to look wonderfully like the actual thing we keep drawing--

  • this arrow notation.

  • So if you ever see and you ever write this notation in C--

  • and I'm pretty sure this is the last new syntax we'll see--

  • this arrow, this very sort of hackish era

  • where you hit a hyphen and then a greater than sign,

  • this means the exact same thing as this.

  • This is just annoying to type.

  • It's ugly to look at.

  • This is just slightly more pretty.

  • And frankly, it's reminiscent of the pictures

  • we've been drawing with the arrows pointing left and right.

  • What's the next thing I want to do?

  • After allocating this new node for the number 2,

  • what do I want to put as well in that node?

  • AUDIENCE: Put in the address.

  • DAVID MALAN: Sorry, a little louder.

  • AUDIENCE: The next address.

  • DAVID MALAN: The address of the next node.

  • But there is no next node yet.

  • So what value could I use as a placeholder?

  • AUDIENCE: Null.

  • DAVID MALAN: Null.

  • And so indeed, I'm going to do this arrow notation as well.

  • You never need to do the star and then the dots and the parentheses.

  • Everyone just writes the code like this in the real world.

  • So n arrow next gets null.

  • That now gives me that picture we were drawing.

  • But, again, sanity check, if you ever use malloc,

  • you should always check the return value.

  • So just to be super precise, let me go ahead

  • and add a couple more lines of code that just check if n is not null,

  • go ahead and do the following.

  • Conversely, I could check if n is null and then just exit or return

  • depending on where I'm using this code.

  • But you don't want to touch n and use this arrow notation

  • unless you're sure n is not null.

  • So what have I just done?

  • My picture now looks like this.

  • But this, of course, is not a linked list,

  • because there's no linkage going on.

  • I really need to do the equivalent of pointing an arrow

  • from this pointer to this structure.

  • I need to implement an arrow that looks like this.

  • So how can we go about implementing that in code?

  • Well, let me propose that this is what it ultimately looks like.

  • We just need to draw that arrow.

  • How do I do that?

  • Well, it's as simple as this.

  • If list is a variable, and it's previously initialized to null--

  • it's just a place holder--

  • and n is my temporary variable storing the new node, it suffices to say,

  • well, lists should not be null anymore.

  • It should literally equal the address of that chunk of memory I just allocated.

  • And that's how we get this picture now inside of the computer.

  • Now, let me do a couple of more operations.

  • Suppose I want to add to the list the number 4.

  • How do I add the number 4?

  • Well, the number 4 is inside of its own node.

  • So I have to go back to code like this.

  • I need to allocate another node that installs the number 4 there.

  • But that's not all.

  • You don't want to just create the node, because it's

  • otherwise out there in no man's land, so to speak.

  • We now need to add the arrow.

  • But now it gets a little non-obvious how you update the arrows,

  • right, because I don't want to update list

  • to point at 4, because that's going to sort of orphan, so to speak, number 2.

  • And it just kind of float away conceptually.

  • I really want to update 2's pointer to 4.

  • So how can I do that?

  • Well, you know what I can do is I can kind of follow these breadcrumbs.

  • If I declare a temporary pointer--

  • and I'll do it using a little extravagantly last week

  • like this little pointer notation-- if I'm a variable called temp, TMP,

  • I can go ahead and point at the same thing that list is pointing at.

  • And I'm going to check is this next value null?

  • If it is, I found the end of the list.

  • So really I can follow that arrow.

  • Now, I know that I'm at a null pointer.

  • So now, I just want to draw this number up here.

  • And I accidentally advanced the screen.

  • I want to actually draw this arrow up here.

  • So how do we go about doing that?

  • Well, the code there might look like this.

  • So if all I want to point at a node, as I just did with the big fuzzy hand,

  • I can just initialize this pointer to equal whatever the list itself

  • is pointing at.

  • Then, I can do like a while loop.

  • And it's a little weird looking, because I'm using some of my new syntax.

  • But this is just asking the question, while the next field I'm pointing at

  • is not NULL, go ahead and follow it.

  • So again, this is as complicated as the syntax today will get.

  • But this is just saying, whatever I'm pointing at point specifically

  • at the next field.

  • While it is not NULL, go ahead and update yourself

  • to point at whatever it is pointing at.

  • So if I advance to the next slide here, this

  • is like I'm initially pointing at 2.

  • I see an arrow.

  • I'm going to follow that arrow.

  • I'm going to follow that arrow.

  • So however big the list is I just keep moving my temporary pointer to follow

  • these breadcrumbs until I hit NULL.

  • So here in the story let me propose that we add another number, 5.

  • And 5, of course, if we keep it sorted, it's got to go over here.

  • And again, they're all over my computer's memory.

  • They're not in a perfectly straight line,

  • because who knows where there's available space.

  • But now that I found this, I want to go ahead and create one more pointer using

  • code very similar to what we just saw.

  • But now lastly, let's do one more here at the beginning

  • and then one more in the middle and see what can go wrong.

  • What is worrisome about 1 if we actually want

  • to store this list in sorted order?

  • What might I be mindful of now if the goal

  • is to insert 1 into this linked list?

  • Any thoughts?

  • What do I want to do first?

  • Well, you know what, let me go ahead and just

  • point-- you know what, it's obviously got to go to the start of the list

  • if I want to keep it sorted, so that the arrows eventually

  • go from left to right.

  • So let me go ahead and just use code like this to allocate the new node.

  • And let me go ahead and just move that arrow like this.

  • This is wrong even though we've not seen the code for it.

  • But why is this wrong?

  • Yeah.

  • AUDIENCE: You're orphaning 2, 4, and 5.

  • DAVID MALAN: I'm orphaning 2, 4, 5.

  • In what sense?

  • I mean literally in my program, the only variables and the variables

  • I have are those you see on the board here.

  • So if nothing is pointing at 2 anymore, it

  • doesn't matter that 2 is pointing at 4 and 4 is pointing at 5,

  • we have orphaned 2 and transitively 4 and 5.

  • So those are just lost.

  • That is a memory leak.

  • If you recall using Valgrind and getting yelled at by Valgrind

  • because you're leaking memory, it might be because, yes, you've

  • forgotten to free memory.

  • Or worse, you might have completely forgotten where

  • the memory is that you were using.

  • And by definition of your own code, you can never access that memory again.

  • You've asked the computer for it, but you're never able to give it back

  • because you have no variable remembering where it is.

  • So we don't want to do that.

  • We instead want to do this probably.

  • Let's point 1 to 2 first, which is kind of redundant, right?

  • Now, we have sort of conflicting beginnings of the list.

  • But once 1 is pointing to 2, what can your next update?

  • AUDIENCE: The list.

  • DAVID MALAN: List to point at 1.

  • And you can do this in code if you'd like really with just two steps.

  • You can update the next field of your new node, which

  • is the one representing 1 that I just allocated,

  • and you can initialize it to point at whatever list is pointing at.

  • So if you want this thing to point at the same thing

  • that this thing was pointing at, you literally just

  • say in code n arrow x equals whatever list is pointing at

  • and then you say the list should equal n itself.

  • And again, you'll see in section this week

  • and in the upcoming problem set actual opportunities

  • to apply these kinds of lines of code.

  • But those are the kinds of thought processes

  • that you should be mindful of.

  • Now, 3 is the only one that's particularly annoying.

  • And we won't look at the code for this.

  • If we actually want to put something in sorted order in the middle of the list,

  • let's just consider conceptually what's got to happen.

  • We've got to allocate memory for the node.

  • We then need to update what?

  • We probably don't want to point 2 at 3 for the exact same reason

  • you identified.

  • We then orphan 4 and 5.

  • So what should we update first conceptually?

  • AUDIENCE: 3 to 4.

  • DAVID MALAN: Update 3 to 4, so it is going to look like this.

  • And now we can update 2 to 3.

  • And I'm going to wave my hand at the code for this

  • only because there's multiple steps now.

  • You have to probably have some kind of loop

  • that iterates over the existing list, finds the appropriate location using

  • less than or greater than, trying to find the right spot.

  • And then you have to manipulate the pointers to do that.

  • You won't need to do something as complicated

  • as that for the upcoming problem set 5.

  • But it is just boiling down to some loops, some inequality checks, and then

  • some updates of the pointers.

  • But it's easier generally to add stuff at the end

  • and even easier to add things at the beginning,

  • especially if you don't care about maintaining any kind of sorted order.

  • Phew.

  • Any questions on that?

  • Yeah.

  • AUDIENCE: Back to the beginning, like the code you had,

  • what's the difference between node with star and like a pointer n of type node?

  • DAVID MALAN: A pointer n of type-- let me just scroll back to the code, here?

  • AUDIENCE: Yeah.

  • DAVID MALAN: OK, so this is malloc is going

  • to give us a chunk of memory that's big enough to store node.

  • Node star n gives us a pointer that is the address of a node.

  • And therefore we're going to assign the return value of malloc

  • to that variable, so that n effectively represents a chunk of memory

  • that's big enough to store a node.

  • AUDIENCE: So n is node, not a pointer?

  • DAVID MALAN: n is a pointer to a node.

  • n is a node star, or a pointer to a node.

  • And what does that mean?

  • n is the address of a node.

  • And that should make sense, because malloc returns an address.

  • But this is why we're now using arrow notation.

  • n is not a node.

  • You can't do n dot number and n dot next.

  • You have to do the star thing and then the dot.

  • Or more succinctly now, you do an arrow number and arrow next.

  • Good question.

  • All right, let's see if we can't make this a little more real with maybe one

  • demonstration here.

  • Let me go ahead and put on the screen the end point that we want to get to,

  • which is that here with everything in order,

  • I think for this we need maybe five volunteers if we could.

  • Let me go a little farther in back.

  • OK, one over there.

  • Maybe two over there.

  • Now the hands are--

  • OK, 3 over here if we can go in back up over there.

  • OK, 4 being volunteered by your friend.

  • And 5 being volunteered by your friends.

  • Do you want to come on up?

  • All right, come on up.

  • Come on up.

  • Brian's going to help me run this demonstration.

  • If all five of you could come on over, come on over here

  • where we have some space.

  • All right, let me get you some microphones and introductions.

  • OK, thank you.

  • Two of them were bravely volunteered by the people sitting next to them.

  • So props to both of you.

  • You want to say hello and a little something about yourself.

  • AUDIENCE: Hello.

  • My name [? Siobhana. ?]

  • DAVID MALAN: [? Siobhana. ?] And year or--

  • AUDIENCE: I'm a sophomore.

  • DAVID MALAN: Sophomore.

  • OK.

  • Nice to meet you.

  • AUDIENCE: Hi.

  • I'm a senior.

  • DAVID MALAN: It's nice to have you too.

  • Yeah.

  • AUDIENCE: Hi, I'm Athena a sophomore in FOHO.

  • DAVID MALAN: Athena.

  • AUDIENCE: Hi.

  • I'm Anurag.

  • I'm a first year at Matthews.

  • AUDIENCE: I'm Ethan.

  • I'm a first year at Weld.

  • DAVID MALAN: OK, and--

  • AUDIENCE: I'm Sarika.

  • I'm first year in Thayer.

  • DAVID MALAN: Wonderful.

  • All right, thank you all for volunteering.

  • Let's go ahead and do this.

  • You for the moment represent a heap of memory, if you will.

  • So if you could maybe all back up over here just

  • to where we have some available space.

  • We're going to need one of you to represent the list.

  • Siobhan was it?

  • AUDIENCE: [? Siobhana. ?]

  • DAVID MALAN: [? Siobhana, ?] come on up. [? Siobhana, ?] do

  • you want to go ahead and represent list.

  • And to represent our actual list, we have--

  • or Brian-- yeah, we have a name tag, hello, my name is list.

  • So you're going to represent the rectangle on the left that

  • represents the linked list itself.

  • And now initially we're going to go ahead initialize you to null.

  • So you can just go ahead and put that behind your back.

  • So you're not pointing at anything.

  • But you represent list.

  • And there's nothing in the list, no numbers in the list.

  • What was the next step?

  • If the goal at hand is to insert 2, 4, 5, 1, 3, we want to do what first,

  • what lower level operation to get 2 in there?

  • What was the first line of code?

  • AUDIENCE: malloc.

  • DAVID MALAN: malloc.

  • So we want to malloc a node for 2.

  • So let's go ahead and malloc.

  • OK, come on up.

  • So malloc.

  • And what's your name again?

  • AUDIENCE: Ethan.

  • DAVID MALAN: Ethan.

  • OK.

  • And what do we need to give Ethan?

  • Ethan has two values or two fields.

  • The first one is the number 2.

  • Thank you.

  • The next one is a pointer called next.

  • Now, you're not pointing at anyone else.

  • So you'll put it behind your back.

  • And now what do we want to do with? [? Siobhana, ?] what do we have to do?

  • AUDIENCE: Point to--

  • DAVID MALAN: Point to?

  • AUDIENCE: 2.

  • DAVID MALAN: Him, yes, number 2.

  • OK, so this now represents the picture where we have list here, 2 here,

  • but the null pointer as well.

  • All right next we wanted to add 4 to the list.

  • How do we go ahead and do this?

  • Well, with 4, we're going to go ahead and malloc.

  • malloc, all right.

  • And now, Brian has a lovely number 4 for you and a pointer.

  • What do we want to do with your pointer?

  • AUDIENCE: Not point it.

  • DAVID MALAN: Not point at anything.

  • Now, it's a little more work.

  • And I need a temporary variable.

  • So I'll play this role.

  • I'm going to go ahead and point at wherever Siobhana is pointing

  • at in sort of unnaturally this way.

  • That's OK.

  • We couldn't get hands that point the other way physically.

  • So we're going to point at the same thing here.

  • You're both pointing at 2.

  • And what am I looking for in order to decide where to put 4?

  • AUDIENCE: If it's greater than.

  • DAVID MALAN: If it's greater than some value.

  • So I'm going to check.

  • Well, 4 is greater than 2.

  • So I'm going to keep going.

  • And your name was Eric?

  • AUDIENCE: Ethan.

  • DAVID MALAN: Ethan, sorry.

  • So, Ethan, what are you pointing at?

  • Nothing.

  • So that's an opportunity.

  • There's nothing to his right.

  • So let me go ahead and have Ethan point at-- what's you're name again?

  • AUDIENCE: Athena.

  • DAVID MALAN: Athena.

  • Also, unnaturally, but that's fine.

  • And so now does Athena need to update her pointer?

  • No, she's good.

  • She represents the end of the list.

  • So her pointer can stay behind her back.

  • All right, let's go ahead and malloc 5.

  • You want to be our 5?

  • So now we need a 5.

  • So we need to hand you the number 5.

  • And what's your name again?

  • AUDIENCE: Sarika.

  • DAVID MALAN: Sarika.

  • All right, so Sarika's holding the number 5.

  • She also is going to get a pointer called next.

  • What should Sarika be pointing at?

  • AUDIENCE: Nothing.

  • DAVID MALAN: Nothing.

  • And now how to do I insert her into the right place?

  • Well, I have to do the same thing.

  • So I'm going to get involved again and be a temporary variable.

  • I'm going to point at the same thing [? Siobhana ?] is pointing out,

  • which is Ethan.

  • I'm going to follow this and see, ooh, wait a minute,

  • he's actually pointing at someone else.

  • So I'm going to follow that.

  • It's still number 4.

  • So I want to keep going.

  • Oh, wait a minute.

  • Athena is not pointing at anyone.

  • This is an opportunity now to have Athena point at 5 and voila.

  • But are you going to change your pointer yet?

  • No.

  • Now things get a little more interesting.

  • Could we go ahead and malloc 1?

  • And what's your name again?

  • AUDIENCE: Emma.

  • DAVID MALAN: Emma.

  • OK, Emma, we have the number 1 for you from Brian.

  • You have a pointer, which should be initialized as well to null.

  • And now we have a couple of steps involved here.

  • What do we want to do first?

  • What's your proposal?

  • AUDIENCE: Temporary pointer.

  • DAVID MALAN: Temporary pointer.

  • So I'm going to point at the same things [? Siobhana ?] is pointing at,

  • which is Ethan here.

  • But I see that 2 is greater than 1.

  • So what do I actually want to do?

  • Well, let me incorrectly for a moment-- [? Siobhana, ?] could

  • you point at number 1?

  • What have we just done wrong?

  • We've orphaned everyone else.

  • And even more visibly now, no one is pointing

  • at Ethan or beyond, which means we've just leaked that memory, never

  • to be recovered or free.

  • So we don't want to do that.

  • Undo, Control-Z.

  • What do we want to do instead?

  • What's your name again?

  • AUDIENCE: Emma.

  • DAVID MALAN: Emma.

  • What do you want to point at?

  • AUDIENCE: I want to point at [? Siobhana. ?]

  • DAVID MALAN: At that the same thing, [? Siobhana ?]

  • is pointing at, which is equivalent then to Ethan.

  • So go ahead and do that with your--

  • OK, sort of like Twister now.

  • That's OK.

  • And then [? Siobhana, ?] what I do want to point at?

  • Perfect.

  • So again a bunch of steps involved, but it really is just two or three steps

  • depending on which pointers we want to update.

  • And then lastly, let's go head to malloc 3.

  • And your name was again?

  • AUDIENCE: Anurag.

  • DAVID MALAN: Anurag.

  • So then we have a 3 for you from Brian.

  • We have a pointer for you.

  • It's initialized initially to null.

  • So you can put that behind your back.

  • I'm going to point at the same thing as [? Siobhana. ?] And here we go.

  • 1 is smaller.

  • 2 is smaller.

  • 4 is larger.

  • So let's get this right.

  • And who do we want to point at whom first?

  • AUDIENCE: 3 points at 4.

  • DAVID MALAN: 3 should point at 4.

  • So go ahead and do that.

  • And you can step a little forward just so it looks a little less awkward.

  • And then lastly, big finale, Ethan, who do you want to point at?

  • Number 3.

  • And thankfully, all these steps later, we have a linked list.

  • We have wonderful souvenirs for each of you.

  • We just need the numbers back.

  • Thank you to our volunteers here if we could.

  • OK, you can keep that.

  • You can put the numbers on the desk here.

  • So as these folks head off the stage, let me point out now one

  • of the shortcomings of the approach-- thank you all so much--

  • of what we've just done here.

  • Even though you all had the luxury of seeing 1 and 2 and 3 and 4 and 5

  • and you could even sort of immediately figure out where things go,

  • even these linked lists are implemented with chunks of memory.

  • And just like with arrays, or equivalently lockers,

  • the computer can only look at one piece of memory

  • at a time, which is to say that to a computer, that same linked list

  • kind of looks like this.

  • It's sort of blind to the specific numbers in that linked list

  • until it opens each of those doors of memory.

  • So to find where 3 goes or to find where 5 goes or to find where 1 goes,

  • all of those doors maximally might need to be opened one

  • at a time to find that value.

  • So with linked lists we have gained a feature.

  • We have gained the ability to add dynamically to the list, right.

  • I just kept malloc-ing, malloc-ing, malloc-ing additional students

  • and additional numbers.

  • So the list can grow as big as we wanted it to.

  • And that case, we had five.

  • We could have done it 50 times or 500 to add more and more numbers.

  • That's an upside, because we don't have to waste time inserting into an array

  • by resizing it and moving all of the original contents.

  • None of our volunteers had to move, technically speaking, just

  • to insert a number 5 or number 3.

  • They just had to point at someone else in memory or someone else on stage.

  • So if your data structure now is a linked list that looks like this,

  • we've paid a price for that dynamism.

  • We've paid a price for the ability to resize our list without moving

  • everything around that's already there.

  • What is a downside that you might perceive of a linked list?

  • What have we lost or given up here?

  • AUDIENCE: We lost random access.

  • DAVID MALAN: Sorry.

  • Say again.

  • AUDIENCE: We lost random access.

  • DAVID MALAN: We lost random access.

  • That's spot on.

  • We've lost random access.

  • Why?

  • Because the way you get from the beginning of the list to the end

  • is by following these pointers, following

  • these arrows, these breadcrumbs, you can't just jump to the middle elements,

  • even though obviously this one here on the screen to all of us humans,

  • that's the middle element.

  • You don't know that if you're the computer.

  • If the main variable that's storing this data structure

  • is the pointer, like [? Siobhana, ?] pointing

  • to the first elements of the list, you're

  • going to have to follow all of these arrows,

  • frankly counting them up and then retroactively realizing,

  • oh, there was 5.

  • I passed the middle one earlier.

  • You've glossed random access.

  • And what algorithm have we used wonderfully in the past

  • when we do have random access?

  • AUDIENCE: Binary search.

  • DAVID MALAN: Binary search.

  • So we've lost the ability now to do binary search as efficiently as we once

  • were able to.

  • And so if we consider now the running time of linked lists,

  • unfortunately, we've paid that price.

  • Searching now has gone back up to linear.

  • We no longer have logarithmic running time because of the fact

  • that we're stitching together this data structure.

  • And the only way to find the end of the list, the middle of the list

  • is to follow all of these arrows.

  • You can't just jump to one location.

  • Meanwhile inserting into the list, in the worst case, big O of n

  • is going to be linear as well, because you have to walk through the whole list

  • to actually find a spot for the given number,

  • especially if you're trying to keep it sorted.

  • So it would seem that even though we've gained

  • this feature of much more dynamic insertion

  • and we're building up something more interesting in memory,

  • and you can imagine this just taking much less time overall,

  • because you have to keep moving everything around

  • like we did with realloc, it's unfortunately

  • something we're paying a price for.

  • But that was a lot.

  • Let's go ahead and take our 5-minute break.

  • Fruit awaits outside.

  • And we'll be back.

  • All right, so during break, I whipped up one final example of our list program.

  • This one uses all of those building blocks.

  • And let's see if we can't follow along pictorially and code-wise what

  • it is we just built with all of these humans on stage.

  • So here is list list3.c.

  • It's available online.

  • So you can follow along at home afterward if you'd like.

  • And let's just walk through the lines that are written for us in advance.

  • One, I'm using standard I/O for printf.

  • And I'm using stdlib for malloc and free,

  • our new friends that give us dynamic memory.

  • Here is the definition of a node that again has a number inside of it

  • and a pointer, specifically a pointer to another node structure.

  • So that's what each of our humans represented, this time now in C.

  • What is my main program going to do?

  • Just for the sake of demonstration, the goal at hand

  • is just to write a program that initializes a linked list to nothing

  • initially, then adds a node with 1, then adds a node with 2,

  • then add a node with 3.

  • We'll keep it simple and not add 4 or 5 this time.

  • So how am I going to do this?

  • Well, on line 17, I propose that we create a variable called list

  • and have it be the address of a node.

  • So if I were to draw this now pictorially,

  • it's going to be just like our demonstration

  • a bit ago, where I have a rectangle here called list.

  • And initially, it's not pointing to anything.

  • So I'm just going to leave the box blank to represent NULL.

  • So that's that line 17 right here.

  • Now, let me go ahead and do the following.

  • Add a number to the list as follows.

  • Line 20 just gives me enough memory for a node.

  • And it stores that memory's address in a variable called n.

  • Lines 21 through 24 are just a safety check.

  • Did anything go wrong?

  • If so, just return 1 and stop the program.

  • We ran out of memory for some reason.

  • But these two lines now should look a little more familiar.

  • This now is going to go ahead and install 1 and NULL into that structure

  • as follows.

  • So let's recap.

  • This line here 20 is the same thing as allocating really

  • a node that looks like this in memory that has two halves.

  • One of those fields is called number, which I'll write there.

  • The other field is called next.

  • And then if we go back to the code, these two lines

  • are all about just installing values in that structure.

  • So if I go ahead to number and put the number 1,

  • I'm not going to bother drawing anything for next, because I'm

  • going to leave it implicitly as NULL.

  • So that's what's going on now.

  • What do I next want to do?

  • Well, the last line of code here under this comment that

  • says add number to list, I set list equal

  • to n where n again is pointing at this new node.

  • So that's the same thing as saying, well, list

  • is going to go ahead and point at that new node.

  • So after those lines of code, I've created a picture in memory

  • that effectively looks like this.

  • Now, let's go ahead and add the number 2 to the list.

  • It's almost the same.

  • So here's the chunk of code that's going to go and add a second node

  • to the list, this time containing 2.

  • Let's do it step by step.

  • Line 30, I'm going to reuse n as my temporary variable.

  • So I don't have to re-declare it.

  • It's the same n as before, but it's now going

  • to get a different address of memory thanks to malloc.

  • So that gives me another box like this that I'm

  • going to go ahead and draw like that with nothing in it initially.

  • I'm going to make sure per lines 31 to 34 that nothing went wrong.

  • But that's just as before.

  • And now in lines 35 and 36, I'm going to put 2 in there and NULL.

  • So let me go over there and let me go ahead and put 2 in there.

  • And I'm going to leave NULL blank implicitly.

  • That's the end of the list.

  • But now I, of course, conceptually have to link the node for 1

  • to the node for 2.

  • And here's where C syntax, even though it's new, kind of finally makes sense.

  • Notice here, I'm saying list arrow next equals NULL.

  • That maps perfectly to the picture.

  • List arrow x equals what? n.

  • Well, n is this thing over here.

  • So I just draw the arrow there.

  • And so the code actually finally lines up even though it's new for today.

  • So now I've drawn the picture as follows with 1 and 2.

  • Let's go ahead and add a third and final node.

  • This one containing the number 3, using these lines here.

  • So line 40 gives me a new node with malloc.

  • So that's going to give me a new node.

  • I'll draw it as a rectangle over here.

  • I'm drawing it left to right, but these things

  • could be all over the place in memory.

  • It doesn't matter where they end up.

  • I'm going to go ahead and check as before that's

  • it's not NULL, just to be safe.

  • Then I'm going to go ahead and install the number 3

  • and NULL in there just as before.

  • So that means let's go ahead and draw 3.

  • I'm going to leave that blank because it's going to be NULL.

  • And then the last line, you wouldn't typically

  • hard code this or write this explicitly in a program.

  • This is a bit more verbose than you need to.

  • Let me propose that you would probably use some kind of loop instead

  • and walk through the data structure step by step as I proposed earlier.

  • But if we really want to do this just for demonstration's sake,

  • notice, start at list, follow an arrow and go to next.

  • Follow another arrow and go to next.

  • We can literally do that with our picture.

  • So here we go.

  • Let me start at list and follow an arrow and go to next.

  • Follow an arrow, go to next.

  • And now this is NULL.

  • So what I want to update is exactly this, as with line 47, which said

  • follow two arrows, look at two next fields

  • interchangeably and then set it equal to n.

  • All right, so what remains here?

  • Well, this program's whole purpose in life was just to print a list out.

  • Here's a way where you can actually use a for loop

  • to iterate over a linked list.

  • It's kind of funky because we don't have i and ints and i++ and so forth.

  • But a for loop doesn't need to use integers or i's.

  • Remember that before the first semicolon, you have initialization.

  • In between the semicolons, you have a condition.

  • And then you have an update that happens over here.

  • So you'll get more experience with this with Problem Set 5 ultimately.

  • But for today's purposes, high level, notice

  • that this gives me a temporary pointer, like my big red hand earlier.

  • That's a node star pointer.

  • And that's why I was able to point with the big fuzzy hand.

  • And I set that equal to list.

  • So whatever the list was pointing at so was my temporary fuzzy hand.

  • I'm going to follow the following loop and so long as temp does not equal

  • NULL.

  • So earlier when I was wearing the big fuzzy hand,

  • I kept pointing, pointing, pointing.

  • And I stopped once it equaled NULL.

  • So this is saying keep doing the following until it equals NULL.

  • What do I want to do?

  • I want to just print out the integer that's

  • inside of whatever I'm pointing at inside of it's number field.

  • So go to whatever I'm pointing at, follow the arrow,

  • and go to the number field.

  • That's how we get at the data inside.

  • Once I've printed that out, for loops say that you just update a variable.

  • So what is that variable temp equals temp arrow next.

  • So if my fuzzy hand is pointing at someone

  • and I need to update it to point at temp arrow next,

  • that means go to whatever I'm pointing at, follow the arrow.

  • There's the next field and point at whatever

  • the next field was pointing at.

  • So you just keep updating what you're pointing at.

  • That prints out the list.

  • And then-- and we'll defer this ultimately to Problem

  • Set 5-- we will need to free this memory.

  • And actually you have to be a little clever about how you free memory,

  • but I'm going to use a while loop there, which turns out

  • to be a little cleaner, a little easier, ultimately

  • to free all of this mess I made in my computer's memory.

  • I kind of need to do the equivalent of freeing things,

  • but I need to free what's behind me, not what's in front of me.

  • Once you free memory, you should not touch it, traverse it, and so forth.

  • But again more on that final note in P Set 5.

  • All right, any questions on a high level on the code?

  • It's fine if it looks quite new.

  • We make it available so that you have a starting point when it comes

  • to using this kind of code yourself.

  • Any questions?

  • All right, so someone came up during break

  • and noted that this actually seems to be a regression in that arrays gave us

  • the ability to resize, even though it was a little expensive because you

  • got to copy everything from one place to another.

  • But we had random access and therefore binary search and therefore

  • logarithmic running time for things like searching assorted lists.

  • We seem to have given that up.

  • Linked lists give us dynamism where we and shrink things without wasting time

  • by moving things around.

  • But we've lost random access.

  • But you know what?

  • Now, that we have this ability using pointers and data structures to kind

  • of stitch things together in memory, connect things with arrows,

  • you know what, we can build fancier things.

  • Most of you are probably familiar with the idea

  • of a family tree, which is this hierarchical two dimensional structure.

  • And indeed, that's our inspiration here.

  • What if we don't just keep making one dimensional data structures,

  • arrays that go left and right, linked lists that kind of go left to right?

  • What if we actually use a vertical notion too and lay things

  • out more interestingly.

  • What can we gain from this?

  • Well, let me propose that anytime we've seen an array,

  • we can actually re-implement an array, but get

  • the best of both worlds, the best of arrays, the best of linked lists

  • as follows.

  • Here is an array, back from Week 1 or even Week 0

  • when we were searching behind doors.

  • And here, Week 2, when we were searching behind doors,

  • let's go ahead and note that if we were to do binary search on this

  • looking for some value, as before, many times you look in the middle first.

  • And then you decide, do you go left or right?

  • And if you go left or right, you'd look in the middle element over here

  • or the middle element over here.

  • And then what do you do?

  • You go left or right, looking at the middle element over here

  • or over here or over here or over here.

  • You know what?

  • Let me just kind of explode this picture because all of this

  • is happening in one dimension.

  • We can actually think of this is happening really in two dimensions.

  • Let me draw my same array, 1, 2, 3, 4, 5, 6,

  • 7, but let me represent it on different levels that's

  • indicative of what's happening.

  • I start in the middle.

  • And I go left or I go right.

  • I then go ahead and look at this element.

  • And then I go left or I go right.

  • So it's the same thing, but it's a two-dimensional rendition

  • of what we've been doing for a few weeks whenever we've done binary search.

  • Well, you know what this kind of looks like?

  • It kind of looks like a linked list, albeit without the arrows.

  • But you know what, I don't think I want to stitch this together from 1 to 2

  • to 3 to 4 to 5 to 6 to 7, because that's just going to be a linked list.

  • But what if I use my new-found familiarity with pointers,

  • use a few more of them?

  • So I spend more space and stitch this data structure together

  • in two dimensions conceptually.

  • Every node represented here is a rectangle.

  • It doesn't have to have just one pointer.

  • There's nothing stopping me from creating a new struct, a new definition

  • of node that has two pointers.

  • Maybe it's called left.

  • Maybe it's called right.

  • Previously, we had just one we called it next.

  • But there's nothing stopping us from creating a fancier

  • structure that actually has two.

  • And so we might make it look not like this as before for a linked list,

  • but let's get rid of the next pointer.

  • Let's make a little more room.

  • And let's actually give myself two pointers, left and right.

  • And I claim that this structure now in C could

  • be used to implement the tree that I just described, the family-like tree,

  • more properly called a binary search tree, in the following way.

  • This is a binary search tree.

  • One, because every node in the tree has at most two children,

  • hence the bi in binary, meaning maximally two.

  • It has zero children, as like these down here.

  • Or it has maximally two children.

  • Hence, the bi in binary search tree.

  • It's a search tree in the sense that I have taken care with this data

  • to sort things properly.

  • Notice the following definition.

  • For any node in the tree, every element to the left is smaller than it.

  • And every element to the right is greater than it.

  • That's a recursive definition, because watch, look at this node.

  • Everything to the left of it is smaller.

  • Everything to the right of it is larger.

  • Let's look at 6.

  • Everything to the left of it is smaller.

  • Everything to the right of it is larger.

  • So it's recursive in the sense that no matter what node you look at,

  • no matter what rectangle you look at, what I just said correctly

  • is true of both the left child or subtree and the right child or subtree.

  • So this is to say if you have a list of numbers, for instance,

  • or a list of anything and you actually store

  • them using nodes that look like this, but conceptually what you're really

  • doing is stitching them together two dimensionally

  • like this, guess what feature we just gain back?

  • What have we just improved?

  • I heard some murmuring over here.

  • AUDIENCE: Binary search.

  • DAVID MALAN: We've gotten back binary search.

  • So we still have dynamism, like a linked list.

  • We're still using pointers.

  • And suppose we want to add the number 0 or the number 8,

  • you could imagine 0 going over here, 8 going over here.

  • So we could still just plug them in without having to move everything

  • around like we would for an array.

  • But because you're stitching things together with additional arrows

  • wherever they are in memory, so long as you keep track of this data structure,

  • called a tree, with one pointer to the so-called root--

  • the root being upside down in this world of computer science--

  • this is the root of this binary search tree,

  • guess what you do if you're looking for the number 7?

  • Well, you see 4.

  • You know it's greater than 4.

  • So what do you do?

  • You move to the right, thereby ignoring the other half of this tree,

  • just like the other half of the phone book in Week 0.

  • Once you get to 6, you consider, I'm looking for 7.

  • What do I know?

  • It's got to be to the right.

  • And so you go.

  • The height of this tree happens to be logarithmic,

  • for those familiar, log base 2 of n, which

  • is to say I have 8 elements or 7 elements in this tree.

  • But it only takes me 1, 2, 3 steps to find the value.

  • It does not take big O of n, or a linear number of steps.

  • And if you want your mind really to be blown here,

  • it turns out this is actually the best application for recursion,

  • which might have felt a little forced previously when

  • we built Mario's pyramid with recursion where

  • you did factorial or product or sum or something

  • like that in section recursively.

  • It turns out that now that we have data structures that exist conceptually

  • in two dimensions that are recursively defined--

  • and by recursively defined, I mean for any given node, left is smaller,

  • right is bigger, and you can make that statement about any node in the tree--

  • watch what we can do in terms of implementing binary search.

  • If I have here a function called search, whose purpose in life

  • is to return true or false if the number 50 is in the tree.

  • How do you search a tree?

  • Well, it takes as input the tree.

  • More specifically, it takes the address of the tree.

  • More specifically, it takes the address of the root of the tree.

  • That is when you want to search a tree, you literally just hand it

  • the address of the very first tip top node called the root.

  • And from there, you can get everywhere else.

  • Just like with the list, we just need the beginning of the list.

  • So how do I go about searching a tree?

  • Well, let's consider the easy case first.

  • Suppose the address you're handed is null,

  • what should you do if you're looking for 50,

  • but you're handed the empty address, zeros?

  • AUDIENCE: Return false.

  • DAVID MALAN: Probably return false, right.

  • If I hand you no tree and I say it's 50 in here, it's an easy answer.

  • No, there's no 50, because there's no tree.

  • So that's our base case, if you recall that nomenclature

  • from our discussion of recursion.

  • You hard code.

  • You type in manually one explicit case that just gets you out of the program.

  • Next case, if 50 is less than the tree, follow the arrow

  • to the number field, then what do know?

  • 50 is less than the node you're looking at.

  • What direction do you want to go conceptually?

  • AUDIENCE: To the left.

  • DAVID MALAN: You want to go to the left.

  • So this line here searches the tree's left child, so to speak,

  • in the family tree sense, the left subtree.

  • So if we go back to the picture a moment ago,

  • if I'm looking for 50 in that story-- or let's make it more real,

  • if I'm looking for 1 in the current story,

  • I see that 1 is less than the current node.

  • So I go ahead and just search the left subtree.

  • And notice, this is a tree.

  • But so is this if you look at it in isolation.

  • And so is this.

  • And therein lies the recursive opportunity.

  • So again here, if 50 is less than the tree's number,

  • then go ahead and search the left.

  • Else if 50 is greater than the tree's current number, search the right.

  • Else logically what must be the case if the tree exists and it's not less

  • than and it's not greater than the number you're looking at?

  • It must equal the number you're looking for, 50, in which case

  • we can return true.

  • But you recall perhaps from scratch, we don't really need that explicit case.

  • We can just call it else instead.

  • Any questions then on this use of code?

  • We won't actually run this code.

  • But this is how you can implement recursively

  • an algorithm that is reminiscent of Week 0

  • searching for Mike Smith in the phone book, this time now searching a data

  • structure that itself is recursive.

  • All right, so what do we gain back in terms of running time,

  • in terms of searching a binary search tree.

  • To be clear, what's an upper bound on the running time?

  • We're back to log n, which was the goal.

  • And what about inserting into a binary search tree?

  • This one we're going to defer to a higher level CS class, because it turns

  • out you don't want to just go ahead and put 0 over there, and 8 over there,

  • because if you keep doing that, putting smaller and smaller numbers or bigger

  • and bigger numbers, you could imagine your tree getting very lanky,

  • like very tall over here or maybe very tall over here

  • and therefore not nearly as balanced as the tree we drew.

  • And so it turns out there are algorithms that let

  • you keep a binary search tree balanced.

  • So even as you add elements to it, you kind of shift things around.

  • You don't remove them in memory.

  • You just update the pointer, so that the data structure

  • itself does not get terribly high.

  • But that too is log n, which means we had arrays, which gave us binary search

  • capabilities in logarithmic time.

  • We then introduced the linked list, which gave us dynamism, the ability

  • to grow and, if we want, shrink.

  • But we sacrifice binary search.

  • But if we spend a little more space and use not one pointer for every node,

  • but two, we can actually tip the scales again, spend more space

  • and save time by searching the data structure, this time using

  • something logarithmic.

  • All right, so what would the ideal, though, be?

  • Every time we talk about running time, it

  • feels like we want to be low on this list and not high.

  • n squared was slow.

  • Big O of 1 is constant time.

  • That's fast.

  • Wouldn't it be nice throughout this story

  • if we actually found our way to a data structure that gave us constant time?

  • Like, my god, if we could just insert something

  • into a data structure with one step and find something in a data structure

  • with one step, that's sort of the holy grail, so to speak,

  • because you don't have to worry about big O of n or big O of log n.

  • You just jump immediately to the value you want.

  • Well, it turns out, theoretically there's

  • something that allows you to achieve that called a hash table.

  • But how you implement that is not necessarily obvious.

  • And it takes some expertise.

  • And indeed, in Problem Set 5 among the goals at hand

  • is to implement exactly this notion of a hash table that lets

  • you spell check a document super fast.

  • A word processing program would be so slow

  • if every time you wanted to check a word for whether it's spelled correctly

  • or incorrectly, if you had to search linearly or even longer rhythmically

  • a big dictionary file, it might actually be really slow to spell check a file.

  • But using a hash table, we can probably do much better.

  • A hash table is a combination of an array and linked lists inside of it.

  • So I'm going to go ahead and just for convenience draw my array,

  • this time vertically instead of horizontally.

  • But it's the same thing.

  • And it's just an artist's rendition anyway.

  • And suppose the goal at hand is to keep track efficiently of like a name tags.

  • So maybe we're holding a big event.

  • We've made some name tags in advance, which we indeed have.

  • And we want people to be able to pick up these name tags super efficiently.

  • It would be really annoying and pretty dumb

  • if we just made a big stack and name tags,

  • even if it's alphabetical, A to Z, then had everyone in the room line up

  • and look through all of the darn name tags looking for their name.

  • That's not a very inefficient system.

  • Fortunately, we've come prepared with some buckets, all of which are labeled,

  • because wouldn't it be nice if you're looking for your name tag,

  • you don't look through the whole darn list of name tags or stack?

  • You actually just go to your bucket.

  • And you jump instantly to your name, where hopefully you're

  • the only person with a name that starts with some letter.

  • And then you can just reach in and get it.

  • Well, how do we implement this conceptually?

  • Well, it's very common with a hash table if the inputs

  • are things like words or names to look at the characters in those words

  • to decide where to put those names or those name tags, if you will.

  • So here's an array of size 26, from 0 to 25.

  • But, you know what, It's convenient to think of this array

  • as maybe being indexed from A through Z. So still 26 buckets,

  • but this array is really just of size 26, 0 through 25 ultimately.

  • And suppose the goal at hand now is to go ahead and store these name

  • tags in advance.

  • So this is what the staff and I would do in advance.

  • And, Brian, if you wouldn't mind helping out with this.

  • The goal at hand is quite simply to get the name tags ready for students

  • to pick up.

  • And so where do I want to go ahead and put the first one?

  • So Albus is the first one whose name tag we made.

  • I'm going to go ahead and jump immediately to bucket 0

  • and put Albus's name right there in one step.

  • Meanwhile I've got Zacharias, and so even

  • though it's taking me a bunch of steps to go over here, if this is an array,

  • I have random access, as a human, and so I can immediately,

  • instantly put Zacharias over there.

  • It's a little laborious for my feet, but a computer could just

  • jump to 0 or 25 or anything in between.

  • All right, so Hermione--

  • maybe you're noticing the pattern-- so Hermione is going to be H,

  • or which is 7, which is going to be over here.

  • Ginny is 6, which is over here.

  • Ron is 17, which is over here.

  • So think of each of my multiple steps taking actually one step.

  • Fred is going to go over here.

  • As an aside, the staff and I discussed this morning

  • how we probably should've put the buckets closer together.

  • But that's OK.

  • Severus is going to go over here.

  • Petunia is going to go over here.

  • Draco is way over here, but doesn't matter, constant time, bracket 3.

  • James is bracket 9.

  • Cedric is bracket 2.

  • Perhaps play this part in 2x speed.

  • Luna is bucket 11.

  • Neville bucket 13.

  • Kingsley bucket 10.

  • Kingsley, there we go.

  • Minerva bucket 12.

  • Vernon-- ironically, we don't actually need this many names

  • to make the point we're trying to make.

  • But Vernon-- we got a little carried away with the names we recognized.

  • And now, the list is pretty full.

  • All right, so that's a whole bunch of names.

  • I filled up most of the buckets with a name tag.

  • But-- why am I out of breath?

  • But what's really convenient now is that if Cedric or Albus or Draco or Fred

  • or Ginny come into the room, they can index instantly,

  • randomly, to their pocket, get their name tag, and go.

  • Nothing linear.

  • They don't have to flip through the whole stack of name tags

  • with which I actually began the story.

  • But there's a problem ahead.

  • We very deliberately ordered the name tags thus far in such a way

  • that we don't create a problem for ourselves.

  • But among the more famous characters we've not heard from yet is Harry.

  • So Harry's name tag is still here.

  • Where does this go?

  • Well, Harry is going to go in bucket 7.

  • But wait a minute, there's already someone there.

  • So what do I do?

  • If I were only using an array, Harry's kind of out of luck.

  • Like Hermione is already in that location in the array.

  • And we would have to decide, either Hermione goes there or Harry,

  • but we can't just put them both.

  • But if we implement this new data structure called a hash table using

  • an array that's conceptually vertical, but that horizontally is a linked list,

  • you know what, that's fine.

  • We're just going to go ahead and link Hermione's and Harry's together.

  • So, yes, it's going to take both of them or one of them at least two steps

  • to find their name tag.

  • But it's not going to take big O of n steps to find their name tag,

  • at least if there's only two in this bucket.

  • All right, Hagrid, dammit, so he came in the door too.

  • So now that linked list is getting a little longer.

  • We now have a chain, if you will, a linked list of size 3.

  • Sirius is going to go over here in bucket 18.

  • But Severus is already there too.

  • Awkward.

  • Remus is 17.

  • Remus is going to go and link together with Ron there.

  • George is going to go into bucket 6, which is over here.

  • Lily is also going to collide, so to speak with Luna.

  • And this is a collision in computer science.

  • Anytime you have a value that you're trying to put in one place

  • but there's something there, you need to resolve the collision somehow.

  • So I'm proposing that we actually just link these together.

  • Or as we're doing here, to bucketize values in computer science conceptually

  • means to throw the value into a bucket, or physically as we've done here.

  • Lucius finally is going to go in bucket 11 too.

  • And lastly, Lavender goes in that same bucket.

  • Phew.

  • So thank you to Brian for helping choreograph that.

  • So this structure that you're looking at is what is called a hash table.

  • It is an array that you index into using what's called a hash function.

  • A hash function is like any function that we've seen thus far,

  • any program we've seen thus far--

  • something that takes input and produces output.

  • So if we consider our original picture from Week 0

  • of what computer science in itself is when it comes to solving problems,

  • hash function for today's purpose it's just this function, this process,

  • this algorithm in between that decides, given a name

  • tag, what bucket to put that name tag in.

  • And quite obviously in the real world, what algorithm

  • was I using to bucketize a name tag upon reading the name?

  • AUDIENCE: First letters.

  • DAVID MALAN: Looking at the first letter.

  • Why?

  • It's simple.

  • It's pretty efficient.

  • It means I can store a relatively small array of size 26

  • and just immediately put the name tags there.

  • So in this case, we might have fed in Albus to that hash function.

  • And it might return 0, representing A, if we're 0 indexing the array.

  • Or for someone like Zacharias, we might get out 25

  • just because the first letter of his name is z.

  • But this is kind of simplistic, right.

  • And we've seen a problem.

  • What is the problem with just looking, of course, at the users first letter

  • of their name?

  • What problem arose?

  • Yeah.

  • AUDIENCE: There might be more than one name with the first letter.

  • DAVID MALAN: There might be more than one name with the first letter.

  • And you know in the extreme--

  • and computer scientists and software engineers

  • often think about the extreme.

  • What is the corner case?

  • What could go wrong?

  • What if by chance there's just a lot of characters

  • in this universe whose names start with h or l, and maybe all of their names

  • just happened to start with h or l?

  • It doesn't matter how fancy your hash table

  • is, it's pretty stupid if all of the name tags are stacked up in a bucket.

  • So in that sense, a hash table, even though this

  • feels like it's pretty efficient, in the worst case, big O of n,

  • when it comes to inserting and searching,

  • because you could just get unlucky and get a huge stack of names

  • that by nature of the class just all start with the same letter.

  • So how can we mitigate this?

  • How could we mitigate this?

  • Well, you know what, rather than naively only

  • looking at the first name, let's leverage some probabilities here.

  • Why don't we look not at just the first letter, but maybe

  • the first two letters?

  • I bet if we look at the first two letters

  • we're not going to get as many collisions

  • as many people belonging to the same bucket.

  • So Hermione, Harry, and Hagrid was a problem we identified earlier,

  • not to mention a few other names.

  • But that was because we were looking only at h for the hash function, only

  • at the first letter in their name.

  • What if instead we look at the first two,

  • so we have a bucket for HA, HB, HC, HD, HE, HF?

  • And so Hermione now goes in this bucket specifically.

  • So we're going to need more buckets.

  • And they're not pictured on the screen.

  • And they're also not pictured here on stage.

  • We need more than 26 buckets.

  • Frankly, if we're looking at two letters, we need 26 times

  • 26, like 676 buckets now.

  • So more space, but we're hopefully going to decrease

  • the probability of collisions.

  • Why?

  • Well, the next name I might put in here is Harry.

  • He's going to end up in a different bucket this time.

  • That's great, because it would seem that now I can get access

  • to his name tag in constant time.

  • Unfortunately, Hagrid is still in the story.

  • And so we're going to have a collision with HA.

  • So even looking at the first two letters is not ideal.

  • So even though we have 676 buckets in this story, 26 times 26, which

  • is a lot of buckets, we're still going to get collisions.

  • So what would maybe the next evolution of this idea be?

  • Well, don't look at the first letter, don't look at the first two letters.

  • Why don't we look at the first three letters.

  • Surely, that's going to drive down the probability.

  • Unfortunately, that's going to drive up the number of cells in the array

  • and buckets on stage to 10,000 plus buckets this time around.

  • So that's a lot of buckets.

  • But suppose we use not HA, but maybe HAA, HAB, HAC, HAD, HAE, HAF, HAG, dot

  • dot dot, HAQ, HAR, HAS, dot dot dot, HEQ, HER, HES.

  • So we have a lot of buckets and even more in between not pictured.

  • Now we can go ahead and hash on Harry's name, Hagrid's name, Hermione's name.

  • And this time, by design, they're going to end up in different buckets, which

  • seems to be an improvement.

  • And indeed, it is, because now if I go searching

  • for Hermione or Hagrid or Harry's name tag, or they do themselves,

  • they're going to be able to find it in constant time.

  • But that's assuming there's not a lot of other kids with the name starting

  • with H.

  • And so a hash table still technically is big O of n

  • because you could just get unlucky and have

  • a big pile up of similar inputs, all of which produce the same output,

  • even if you're using a fancier hash function like this.

  • And there's a trade off too.

  • My god, we're using like almost 20,000 buckets now just

  • to store these names to speed things up.

  • At some point, you know, it's probably cheaper

  • to just let Harry and Hermione and Hagrid form a line

  • and find their name tag more slowly.

  • So there's this trade-off of time and space.

  • But if you have what's called an ideal hash function

  • and you figure out some magical algorithm written

  • in code that ensures uniqueness that no name tag will end up

  • colliding with another, then you can achieve this holy grail of big O

  • of one time, constant time for a hash function.

  • So it's this sort of tension between how much space do you want to spend?

  • How much effort do you want to spend figuring out

  • what that ideal hash function is?

  • So in the real world, and we'll see this in Python,

  • most computer systems give you a best effort, such

  • that a hash table is not big O of n usually.

  • It's actually, on average much much, much faster,

  • even though there's a theoretical risk that it can be slow.

  • And more on that too in a higher level CS course

  • where you explore data structures and algorithms more formally.

  • So technically speaking, it feels like search

  • could get down to big O of 1, constant time,

  • if every name tag ends up in a unique bucket.

  • But you could still get unlucky if there's a lot of H names

  • or L names or the like.

  • So technically speaking, a hash table is big O of n.

  • But, frankly, three names in a bucket, like Hermione, Hagrid, and Harry,

  • is much better than n names in the same bucket.

  • So even in the real world if you get rid of this asymptotic hand waviness,

  • that's faster.

  • That's much faster than putting everything

  • in a linked list or an array itself.

  • All right, so from there, I bet we can try one other approach here.

  • There's another data structure we want to present to,

  • not in code, but in pictures.

  • This one's called a trie.

  • Short for retrieval, even though it's pronounced differently.

  • A trie is a data structure that actually is pretty amazing.

  • And it follows this pattern of spending one resource to save on another.

  • A trie is going to use a lot more memory,

  • but it is going to give us actual constant time

  • lookup for things like names or words being inserted into the structure.

  • So what does it look like?

  • It's a little strong, because we need to leave room for ourselves

  • on the board with lots of memory.

  • A trie is a tree, each of whose nodes is essentially an array.

  • So notice the pattern here.

  • Computer scientists over time have been kind of clever

  • taking this idea, this idea, mashing them together and creating some monster

  • data structure, but that gives you some savings of time or space.

  • So this array at the very top represents the roots

  • of this trie, which again is a tree whose nodes are arrays.

  • And notice that the array is of size 26, for the sake of discussion,

  • A through Z, or 0 through 25.

  • A trie does this.

  • If you want to store a name in a trie, what you do, in this case,

  • is look at every letter in the word in question.

  • So for Harry' it would be H-a-r-r-y.

  • We're not just looking at the first, the second, and third.

  • We're looking at all of them.

  • And what we do is this.

  • Suppose the first letter in the person's name or their name tag or the word

  • more generally is an H. You go ahead and go to that index.

  • And if there's no child node, there's no tree yet below it,

  • another branch, if you will, you allocate another node.

  • And another node just means another array.

  • And so we've drawn two arrays on the board.

  • This now has the letter A highlighted.

  • All of the letters are technically there, because it's of course 0

  • through 25.

  • But we're only highlighting the letters we care

  • about for the sake of this example.

  • Here is H-a-g.

  • So it looks like the first name tag I'm trying to install into this data

  • structure is Hagrid.

  • Notice now that g is inside of that array.

  • I want to go now to r for Hagrid.

  • That gives me another array.

  • Now i, now d. d is the end of his name.

  • So I'm going to just color in green, or I can use like a Boolean flag in C code

  • that just says someone's name ends here.

  • So notice, I've implicitly stored Hagrid name now in this data structure

  • by storing one node, that is one array, for every letter in his name.

  • But there's this slight efficiency here because there's

  • other people in this story besides Hagrid

  • whose names are prefixes or share common prefixes.

  • So, for instance, suppose I want to install Harry into this data structure.

  • He is H-a-r-r-y.

  • And so that gives me a couple of more nodes.

  • And if I go ahead now and install Hermione in this, notice now

  • I have even more nodes in the tree.

  • But some of them are shared.

  • If you start at the very top and look at the H,

  • notice that both Hagrid and Harry and Hermione

  • at least share at least one node in common.

  • Now, what's cool about this ultimately?

  • So what is the running time of searching for someone in this data structure

  • if there's n people already in it?

  • Right now n equals 3 because there's three people in it,

  • even though there's a lot of nodes.

  • But what's the running time for searching this data structure to see

  • has Harry picked up his name tag already?

  • Has Hermione picked up hers?

  • Has Hagrid picked up his?

  • Well, how many steps does it take to find Harry or Hermione or Hagrid

  • in this data structure?

  • For Harry, it's H-a-r-r-y.

  • So it's five steps maximally.

  • For Hagrid it's H-a-g-r-i-d.

  • It's six steps maximally.

  • And H-e-r-m-i-o-n-e, 8 steps total.

  • And it's probably the case that if we read through the books,

  • there is going to be some upper bound on the length of someone's name.

  • I don't know what it is.

  • It's probably 20 characters.

  • Maybe 30 if it's crazy long.

  • But there is some fixed value.

  • Anytime you have a fixed value, that's what you by definition in CS

  • and in math call a constant.

  • If it's 20, it's 30, it doesn't matter.

  • But it's fixed.

  • People's names aren't growing every year in length.

  • There's some hard upper bound.

  • And so technically, if it only takes you five steps or six steps or eight

  • steps to find Harry or Hagrid or Harry or Hermione, that

  • is technically constant time or, as we've said, Big O of 1.

  • So we can actually then achieve, truly for searching this data structure,

  • for inserting this data structure, truly what we call big O of k

  • where k is some constant.

  • But a constant is the same thing, asymptotically, per our discussion

  • in Week 3, of big O of 1.

  • These are effectively constant time, because to find Harry,

  • you look only at H-a-r-r-y.

  • It doesn't matter if there's 1 million other characters in that trie already.

  • It doesn't matter if there's Hermione and Hagrid and everyone else from

  • the seven books in the data structure, because the only nodes you're looking

  • at are the ones representing H-a-r-r-y.

  • And that's a powerful thing.

  • Every other algorithm we've discussed thus far, certainly

  • for searching and sorting, has somehow been slowed down

  • by how many other names or numbers are in the data structure.

  • That is not the case for this one here.

  • However, there is a price being paid.

  • What appears to be the price that we're paying

  • to gain that really low running time?

  • AUDIENCE: Memory.

  • DAVID MALAN: Memory.

  • I mean, my god, it barely fits on the slide.

  • And this is just three names.

  • You're spending 26 amount of the memory to store one character.

  • Now there's some optimizations.

  • Over time, if you insert a lot of names, some of these nodes will be shared.

  • But this is a very wide, very dense data structure,

  • so to speak, because it's using so much memory to give you

  • that super amazing running time of theoretically constant time.

  • So again this theme of trade-offs is going

  • to persist in the remaining weeks of the semester where to gain one resource,

  • we're going to have to spend another.

  • So that there is a trie.

  • So it turns out that now that we have arrays and linked

  • lists and trees and tries and hash tables and yet other data

  • structures out there, we can actually implement

  • what are called abstract data structures, using

  • any of those as building blocks.

  • What we've kind of done today verbally and pictorially is

  • invent more of those pink puzzle pieces in Scratch, those custom puzzle pieces.

  • Now we have as building blocks arrays and linked lists and trees

  • and hash tables that we can use to solve other problems.

  • And one of the problems out there in the real world is something called a queue.

  • A queue and certainly in certain cultures immediately comes to mind,

  • what's a queue in the real world or an example thereof?

  • AUDIENCE: A line.

  • DAVID MALAN: So a line, right, lining up at a store

  • or a restaurant or a takeout place.

  • So a queue actually has a technical meaning and computer science too.

  • It's a data structure that is FIFO, first in, first out.

  • A queue, by definition should have people hopefully pleasantly lining up

  • one person in front of the other.

  • And it maintains this FIFO property, first in, first out,

  • such that if I'm at the front of the line

  • I am going to be served my food first and then the person behind me

  • and then the person behind them.

  • It'd be really obnoxious if you walked up to Tasty Burger, placed your order,

  • and whoever showed up most recently got their food first.

  • That would be an opposite data structure.

  • That's LIFO.

  • Last in, first out.

  • Not fair in the real world.

  • So you might hope then that the software that companies like Tasty Burger

  • use when they type in your order to the system

  • actually send those orders to the team working

  • in back cooking the food in a queue fashion,

  • because it'd be pretty obnoxious too if people behind you

  • were getting their food first.

  • So hopefully in software, you're implementing that real world

  • notion of a queue as well.

  • Printing, if you still print on campus sometimes, papers and whatnot

  • on printers, they're often shared printers on campus.

  • And so they have what are called printer queues.

  • You might go to Command-P or Control-P print,

  • but then hopefully, in fairness, if there's

  • 10 people who are trying to print to the same Harvard printer,

  • they are printed in the order in which they were requested.

  • It would be pretty obnoxious, again, if the order were flipped.

  • Well, it turns out with queues in computer science,

  • there's two fundamental operations, even though we humans don't really

  • think in these terms, enqueue and dequeue.

  • To enqueue means to get in line.

  • To dequeue means to get out of line, hopefully

  • once you've been served your printouts or your food or whatnot.

  • Using today's principles, arrays, linked lists,

  • you could probably imagine conceptually using them as building blocks

  • to implement this notion of a queue.

  • The software that Tasty Burger or any fast food place uses

  • probably has implemented in code some lines

  • that are using an array that's maybe being dynamically resized or better

  • yet a linked list that's growing and shrinking as people

  • are placing orders and getting orders.

  • So there's this one-to-one mapping between some of today's ideas

  • and even the real world as well.

  • There's kind of the opposite data structure that I referred to a moment

  • ago.

  • And these are generally known as stacks.

  • Stacks in the real world might be in the dining hall right.

  • Like here is of trays.

  • And they have this fundamentally different properties,

  • such that if the staff go ahead and clean the trays

  • and put them right here, it would be pretty obnoxious if to get your you

  • had to go through a FIFO fashion and get the first they put down

  • and take that out first.

  • No one does that, realistically.

  • If you've got a big stack of trays in the dining hall,

  • you probably enforce a LIFO order, last in, first out.

  • So if this was the most recently installed or clean tray,

  • you're probably, as the human, just going to take the top one,

  • even though that's not really fair to the below.

  • But it doesn't matter in this particular case.

  • So a stack gives you the opposite property.

  • And where else might you see these?

  • Well, your Gmail inbox.

  • If you use Gmail, your inbox, most likely by default

  • is configured as a stack.

  • Where do your most recent emails end up?

  • AUDIENCE: At the top.

  • DAVID MALAN: At the top.

  • Now, this is wonderful because it's a feature in that you always

  • see your newest mail.

  • Kind of a downside, though, to your friends who've emailed you an hour ago

  • and whose emails are now down here are on page 2 of your email.

  • So there's trade-offs here too.

  • Stacks might have desirable properties, like just get your tray quickly,

  • see your most recent email.

  • But if you're like me, as soon as email falls on the page 2,

  • you might never get back to it if the stack of trays

  • never actually gets exhausted.

  • Frankly, there might be in some dining hall on campus some way

  • down here that has never been used in years, because they

  • keep refilling the stack, and we keep taking from the top.

  • So that would be a bad property for a lot of context,

  • but not necessarily all.

  • So it turns out there's another data structure too-- oh,

  • and the operations there are not called enqueue and dequeue.

  • By convention they're called push and pop,

  • where this means pushing an element onto the stack.

  • Even if it's very gentle, that's pushing.

  • Popping means removing the top element.

  • So it's just nomenclature meaning adding and removing elements.

  • But there's one other data structure we'll give mentioned to today.

  • And that's known as a dictionary.

  • And we'll see this again in a couple of weeks when we look at Python.

  • A dictionary is the abstraction that you can get on top of a hash table.

  • This hash table literally involved physical buckets

  • and in code would involve arrays and linked lists.

  • That's like low level plumbing.

  • A dictionary more generally in computer science

  • is a data structure that has keys and values, words and values, words

  • and page numbers, anything that maps one thing to another.

  • Physical dictionaries in the human world, like an English Dictionary,

  • has lots of words.

  • And if a word is correctly spelled in your document,

  • it will be in that dictionary.

  • And if you have a typo, a misspelling, in your document,

  • it will not be in that dictionary.

  • So wouldn't it be nice if you could actually implement

  • a dictionary using maybe a hash table, but a smart hash table

  • that has plenty of buckets, so that you can answer a question, is this a word,

  • is this a word, super fast without having a whole stack of name tags

  • or, in this case, English words all in the same bucket.

  • And, in fact, that's the challenge for Problem Set 5.

  • We're going to give you a big text file with 140,000 plus English words.

  • And the goal for you is to implement a hash table with your choice of number

  • of buckets, your choice of hash functions,

  • and implement this notion of an array with linked lists that

  • stores those 140,000 plus words.

  • Dictionaries, though, do exist in the real world.

  • And taken last night at like 9:00 PM before Sweetgreen closed

  • in Harvard Square was this photo.

  • If you've ever ordered a salad at Sweetgreen,

  • they have a pretty clever optimized system so as to pick up your salad.

  • If you order on their app in advance, they

  • go ahead and put your salad under D for David, for instance, or B for Brian

  • and so forth.

  • So that when you go into the store, you don't

  • have to look through big O of n other salads.

  • You can jump immediately to the B section, the D

  • section, or any other section and get your salad.

  • Now, in the extreme case, maybe Harry and Hermione

  • and Hagrid all order at the same time.

  • So there's just a big stack at the H's.

  • So it's technically still big O of n.

  • But if you assume a nice uniform distribution of names,

  • this probably does work out pretty well, especially if the salads aren't

  • there by design very long.

  • But let's use our final minutes together to take a look at one final visual, one

  • made by some of our other friends online who put together an animation that

  • tells the story of the differences between stacks and queues

  • personified as follows.

  • [VIDEO PLAYBACK]

  • NARRATOR: Once upon a time, there was a guy named Jack.

  • When it came to making friends, Jack did not have the knack.

  • So Jack went to talk to the most popular guy he knew.

  • He went up to Lou and asked, what do I do?

  • Lou saw that his friend was really distressed.

  • Well, Lou began, just look how you're dressed.

  • Don't you have any with clothes with a different look?

  • Yes, said Jack, I sure do.

  • Come to my house and I'll show them to you.

  • So they went off to Jack's.

  • And Jack showed Lou the box where he kept all his shirts

  • and his pants and his socks.

  • Lou said, I see you have all your clothes in a pile.

  • Why don't you wear some others once in a while?

  • Jack said, well, when I remove and socks,

  • I wash them and put them away in the box.

  • Then comes the next morning and up I hop.

  • I go to the box and get my off the top.

  • Lou quickly realized the problem with Jack.

  • He kept clothes, CDs, and books in a stack.

  • When he reached for something to read or to wear,

  • he chose the top book or underwear.

  • Then when he was done, he would put it right back.

  • Back it would go, on top of the stack.

  • I know the solution, said a triumphant Lou.

  • You need to learn to start using a queue.

  • Lou took Jack's and hung them in a closet.

  • And when he had emptied the box, he just tossed it.

  • Then he said, now Jack, at the end of day, put your clothes on the left

  • when you put them away.

  • Then tomorrow morning, when you see the sun shine,

  • get your clothes from the right, from the end of the line.

  • Don't you see, said Lou, it will be so nice.

  • You'll wear everything once before you wear something twice.

  • And with everything in queues in his closet and shelf,

  • Jack started to feel quite sure of himself, all thanks

  • to Lou and his wonderful queue.

  • [END PLAYBACK]

  • DAVID MALAN: All right that's it for CS50.

  • We'll see you next time.

[MUSIC PLAYING]

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it