Subtitles section Play video Print subtitles DAVID MALAN: Hi, everyone. This is CS50's own David Malan and-- COLTON OGDEN: Colton Ogden. DAVID MALAN: So this is a new approach for us. We, of course, have lectures once a week here in Cambridge. And of course, we've done a number of live streams over the past few years. But we thought we'd try something new with our Twitch channel here. Colton is a big fan of Twitch. He's quite the gamer himself, and a programmer of games. And we thought we'd use this as a more interactive time to actually chat with some folks in the Twitch chat room, and actually walk through some code in real time-- live coding-- to get into more depth on some of the things that we might otherwise talk about in class, and then, ultimately, take people's questions and steer the conversation in any direction folks out there would like to go. COLTON OGDEN: Yeah. It's a very flexible format. Everybody who's in, feel free, definitely, to chime today and chat. If you aren't following the CS50 TV account, I believe there's a 10 minute waiting period. So definitely follow us so you can talk to us. But I see we do have one person, Srini Vasank says hello. So hello, Srini Vasank. Nice to see you here. DAVID MALAN: Hello. Nice to meet you as well. COLTON OGDEN: So yeah. I guess we can take a little transition here to your laptop where we have your code there. DAVID MALAN: Magic. Yep, so we're actually sitting here in Cambridge, Massachusetts on Harvard's campus with a magical green screen behind us. And so we're superimposing ourselves, the two humans, which is why you see this nice glow on both of us today. COLTON OGDEN: I wish I had a shot of just the green screen. All I have is this. DAVID MALAN: That's green, but you see it as gray, perhaps. And this allows us to superimpose ourselves over the code, like you've seen in some other channels, so that we can actually talk in greater proximity. COLTON OGDEN: We do it for lecture for the code it itself. DAVID MALAN: We do. We do this in Sanders Theater. But the green screen's way behind. And we use some camera magic to actually capture it across the room as well. COLTON OGDEN: Yeah. DAVID MALAN: All right, anymore hellos to tend to? COLTON OGDEN: Not just yet, no. Just Srini Vasank. We have 10 viewers currently active. DAVID MALAN: All right, wonderful. Well, so glad to have you here in class. It's like a CS50 section here on campus. What we thought we'd do today is-- our first trial of this is-- take a closer look at one of the topics we look at in CS50. Those of you have been following along for some time and are pretty much through half of the course or so might have implemented your own spell checker at some points. And for those unfamiliar, roughly halfway during CS50's semester here on campus and online, we challenged students to build their own spell checker, which is a piece of software that looks over all the words you've typed out in a file, and tells you, often with read underlines, which words you've misspelled, that aren't in a dictionary. And that's where the CS comes in. That dictionary is just a really big list of words, hopefully alphabetically sorted. And it's a non-trivial problem to actually look up every one of the words in your file against every word in the dictionary. If you recall from our discussion of asymptotic notation, that's like an n squared problem, if n is the number of words in each, just to simplify a bit. And that can actually take quite a bit of CPU cycles and memory. And so one of the challenges in CS50 is, do that efficiently. Minimize your use of memory. Minimize your use of CPU time, so to speak. And just write the fastest spell checker possible. Now, what language, of course, do we tend to write this in though? COLTON OGDEN: Well we start it in C, generally, for the course. Yeah. DAVID MALAN: And that's great. Because C is super low level. It's super fast. And really takes advantage of the hardware. But what are some of the prices you'd say you pay by writing stuff in C? COLTON OGDEN: Programmer time would be number one. Obviously looking for memory leaks, and seg faults, and all those sorts of things that trip up a lot of programmers. DAVID MALAN: Yeah, those of you programmed in C, maybe C++, have dealt with pointers, memory addresses, segmentation faults, buffer overflow exploits, and the like. All of that comes into play. And so why don't we start there? Why don't we actually take a look at what it is you have had to do, or if you're following along with CS50 for the first time, might have to do if you tackle this particular problem down the road. And then let's actually do it in a much more friendly environment of Python. But consider what prices we actually pay for doing that. So we have up here behind us dictionary.h. This is the so-called header file that we give to CS50 students. And in this header file, we define four functions. Do you want to summarize what functions they have to implement? COLTON OGDEN: Actually, offhand, I don't remember. I have to look at the source code. DAVID MALAN: Oh, that's cool. COLTON OGDEN: Sorry. DAVID MALAN: We have it right here if you'd like. OK. Well, we clearly prepped here. COLTON OGDEN: It looks like check. DAVID MALAN: Yeah, check is one of them, isn't it? COLTON OGDEN: Load size and unload. DAVID MALAN: Yeah, now fortunately, the file's completely commented. So we can follow along here. So these four functions characterize, if you will, the API for spellchecker. API is just application programming interface. And while this might sometimes mean some web based service or third party you're using, it can also refer to local software that you yourself are writing, or that someone else has started writing and you now need to finish. And indeed, the challenge for this problem is to flesh out that API and write the actual implementation thereof. All right, so in order of operation, not alphabetically, load is the first function that students in this class have to write. This is a function that takes as input in C-- it looks like "const char* dictionary". And how should folks think about what that means? COLTON OGDEN: Well, const meaning that it can't be mutated. So whatever data that they pass into this function should be not tampered with by the function. A char* is just a sequence of chars or a pointer to a char, which can be any number of chars after that. But it's a string, effectively. DAVID MALAN: Yep. I agree. COLTON OGDEN: Text data that's going to become their dictionary. DAVID MALAN: And then the name of this parameter, of course, is just dictionary. And so the idea is that this is the name or the path to the file containing all of those words that I mentioned earlier that represent the actual dictionary. And load's purpose in life, per the comment atop it, is to load dictionary into memory. And it's supposed to return a bool-- true if successful, else false. For those unfamiliar, we are using standard bool.h, which is a header file that you can include in C. That actually gives you access to two definitions of true and false as 1 and 0, respectively, effectively. So after load is called, once implemented, then it's the function called "check" that's supposed to get it called again, and again, and again for every word in the file you're actually spell checking. So it's that one that's going to be especially performant. Super fast, super minimal memory usage, hopefully, if you're really trying to optimize those two things. What does it appear that this takes as input to, if you want to tackle that one? COLTON OGDEN: The check function. So it looks like the same thing. So char*, meaning a pointer to a char. So a variable length string, or a sequence of chars. And then words. So it's basically going to be any size word. And we're going to, basically, check for the string header representing the signature, whether the word is actually in the dictionary. So you check our dictionary for this word and see whether it's actually inside that dictionary, that data structure. DAVID MALAN: Yeah. And so strictly speaking, you don't have to use const in either of these definitions. This is just a nice mechanism for really protecting yourself from yourself, lest you accidentally try changing word or try changing dictionary. The fact that you defensively-- or whoever wrote this file-- defensively put const there means, you just won't accidentally screw up, change the value, and introduce some bug that could be very easily avoidable by making it read only, as const effectively does. So lastly, there's two other functions in this file. Per the comments, those are size and unload. Unload is meant to do the opposite of load. So whatever work we end up doing in load is supposed to be undone, freed up in unload. And then size is supposed to just return the number of words in the dictionary. Now, maybe that's a linear operation. You just iterate over whatever your data structure is, counting up every one of the words and return that value. Or if you're smart about it, you could probably just keep like an int or a long around, and just, as you're loading words, keep track of how many words there are. And then just spit out in constant time that same value. So in C, this is a pain in the neck to actually implement this thing. You can implement it as a big array, a linked list to give you some dynamism and growth. You can implement it as a hash table, which is often implemented as an array with multiple linked lists. Or you can implement it using a try, an even fancier data structure, that just consumes lots of memory, but theoretically, is constant time in its operation, at least asymptotically. So we're not going to do that though in C. Indeed, that's one of the challenges in CS50 itself. And one of the great moments in a course like this is when you actually change context, and you switch from C-- a very low level language-- to something like Python, which is higher level, so to speak, that has lots more abstractions, lots more features, lots more capabilities built into the language itself. You can whittle down a 10, 20, 30, 40 hour project into truly just minutes if not seconds, once you're actually super learned in the language. Shall we dive into some actual live code? COLTON OGDEN: Sure. Before we do, Srini Vasank has a question for you. DAVID MALAN: Yeah. So pass by value can be used here instead of const. Short answer, no. You are already technically passing in the dictionary by its value. But the catch is that the value you're passing in is its address. And in so far as passing in an address allows you to dereference that address and go to that address in memory, means that it's potentially vulnerable to being changed. And so we're declaring const here for the very reason that we're technically passing by reference here, or by value. But that value is an address. And so this is why we have this defense mechanism in place. If we were instead passing in a specific char, or an int, or a float, or any other primitive type that doesn't involve memory addresses, then yes, pass by value avoids this. Because the worst you can do is change a copy of the argument. But in this case, by definition, we are passing in a string, which is indeed a char*, or address of a character, or a sequence of character. So we can't-- it's not as easily said as passing by value. COLTON OGDEN: Correct. DAVID MALAN: The values here are what are [INAUDIBLE].. COLTON OGDEN: I guess the equivalent would be maybe duplicating the data structure in the function. But then that would be like horribly inefficient for something massive, like a big dictionary file. DAVID MALAN: Yeah, you can totally copy the whole dictionary.