Placeholder Image

Subtitles section Play video

  • DAVID MALAN: Hi, everyone.

  • This is CS50's own David Malan and--

  • COLTON OGDEN: Colton Ogden.

  • DAVID MALAN: So this is a new approach for us.

  • We, of course, have lectures once a week here in Cambridge.

  • And of course, we've done a number of live streams over the past few years.

  • But we thought we'd try something new with our Twitch channel here.

  • Colton is a big fan of Twitch.

  • He's quite the gamer himself, and a programmer of games.

  • And we thought we'd use this as a more interactive time to actually chat with

  • some folks in the Twitch chat room, and actually walk through some code in real

  • time-- live coding--

  • to get into more depth on some of the things

  • that we might otherwise talk about in class, and then, ultimately,

  • take people's questions and steer the conversation in any direction folks out

  • there would like to go.

  • COLTON OGDEN: Yeah.

  • It's a very flexible format.

  • Everybody who's in, feel free, definitely, to chime today and chat.

  • If you aren't following the CS50 TV account,

  • I believe there's a 10 minute waiting period.

  • So definitely follow us so you can talk to us.

  • But I see we do have one person, Srini Vasank says hello.

  • So hello, Srini Vasank.

  • Nice to see you here.

  • DAVID MALAN: Hello.

  • Nice to meet you as well.

  • COLTON OGDEN: So yeah.

  • I guess we can take a little transition here

  • to your laptop where we have your code there.

  • DAVID MALAN: Magic.

  • Yep, so we're actually sitting here in Cambridge, Massachusetts

  • on Harvard's campus with a magical green screen behind us.

  • And so we're superimposing ourselves, the two humans,

  • which is why you see this nice glow on both of us today.

  • COLTON OGDEN: I wish I had a shot of just the green screen.

  • All I have is this.

  • DAVID MALAN: That's green, but you see it as gray, perhaps.

  • And this allows us to superimpose ourselves over the code,

  • like you've seen in some other channels, so that we can actually

  • talk in greater proximity.

  • COLTON OGDEN: We do it for lecture for the code it itself.

  • DAVID MALAN: We do.

  • We do this in Sanders Theater.

  • But the green screen's way behind.

  • And we use some camera magic to actually capture it across the room as well.

  • COLTON OGDEN: Yeah.

  • DAVID MALAN: All right, anymore hellos to tend to?

  • COLTON OGDEN: Not just yet, no.

  • Just Srini Vasank.

  • We have 10 viewers currently active.

  • DAVID MALAN: All right, wonderful.

  • Well, so glad to have you here in class.

  • It's like a CS50 section here on campus.

  • What we thought we'd do today is--

  • our first trial of this is-- take a closer look at one of the topics

  • we look at in CS50.

  • Those of you have been following along for some time

  • and are pretty much through half of the course

  • or so might have implemented your own spell checker at some points.

  • And for those unfamiliar, roughly halfway during CS50's semester

  • here on campus and online, we challenged students

  • to build their own spell checker, which is a piece of software that looks over

  • all the words you've typed out in a file,

  • and tells you, often with read underlines, which words you've

  • misspelled, that aren't in a dictionary.

  • And that's where the CS comes in.

  • That dictionary is just a really big list of words,

  • hopefully alphabetically sorted.

  • And it's a non-trivial problem to actually look up

  • every one of the words in your file against every word in the dictionary.

  • If you recall from our discussion of asymptotic notation,

  • that's like an n squared problem, if n is the number of words in each,

  • just to simplify a bit.

  • And that can actually take quite a bit of CPU cycles and memory.

  • And so one of the challenges in CS50 is, do that efficiently.

  • Minimize your use of memory.

  • Minimize your use of CPU time, so to speak.

  • And just write the fastest spell checker possible.

  • Now, what language, of course, do we tend to write this in though?

  • COLTON OGDEN: Well we start it in C, generally, for the course.

  • Yeah.

  • DAVID MALAN: And that's great.

  • Because C is super low level.

  • It's super fast.

  • And really takes advantage of the hardware.

  • But what are some of the prices you'd say you pay by writing stuff in C?

  • COLTON OGDEN: Programmer time would be number one.

  • Obviously looking for memory leaks, and seg faults,

  • and all those sorts of things that trip up a lot of programmers.

  • DAVID MALAN: Yeah, those of you programmed in C, maybe C++,

  • have dealt with pointers, memory addresses, segmentation faults,

  • buffer overflow exploits, and the like.

  • All of that comes into play.

  • And so why don't we start there?

  • Why don't we actually take a look at what it is you have had to do,

  • or if you're following along with CS50 for the first time,

  • might have to do if you tackle this particular problem down the road.

  • And then let's actually do it in a much more friendly environment of Python.

  • But consider what prices we actually pay for doing that.

  • So we have up here behind us dictionary.h.

  • This is the so-called header file that we give to CS50 students.

  • And in this header file, we define four functions.

  • Do you want to summarize what functions they have to implement?

  • COLTON OGDEN: Actually, offhand, I don't remember.

  • I have to look at the source code.

  • DAVID MALAN: Oh, that's cool.

  • COLTON OGDEN: Sorry.

  • DAVID MALAN: We have it right here if you'd like.

  • OK.

  • Well, we clearly prepped here.

  • COLTON OGDEN: It looks like check.

  • DAVID MALAN: Yeah, check is one of them, isn't it?

  • COLTON OGDEN: Load size and unload.

  • DAVID MALAN: Yeah, now fortunately, the file's completely commented.

  • So we can follow along here.

  • So these four functions characterize, if you will, the API for spellchecker.

  • API is just application programming interface.

  • And while this might sometimes mean some web based service or third party you're

  • using, it can also refer to local software that you yourself are writing,

  • or that someone else has started writing and you now need to finish.

  • And indeed, the challenge for this problem is to flesh out that API

  • and write the actual implementation thereof.

  • All right, so in order of operation, not alphabetically,

  • load is the first function that students in this class have to write.

  • This is a function that takes as input in C--

  • it looks like "const char* dictionary".

  • And how should folks think about what that means?

  • COLTON OGDEN: Well, const meaning that it can't be mutated.

  • So whatever data that they pass into this function

  • should be not tampered with by the function.

  • A char* is just a sequence of chars or a pointer to a char,

  • which can be any number of chars after that.

  • But it's a string, effectively.

  • DAVID MALAN: Yep.

  • I agree.

  • COLTON OGDEN: Text data that's going to become their dictionary.

  • DAVID MALAN: And then the name of this parameter, of course,

  • is just dictionary.

  • And so the idea is that this is the name or the path to the file containing

  • all of those words that I mentioned earlier

  • that represent the actual dictionary.

  • And load's purpose in life, per the comment atop it,

  • is to load dictionary into memory.

  • And it's supposed to return a bool--

  • true if successful, else false.

  • For those unfamiliar, we are using standard bool.h,

  • which is a header file that you can include in C. That actually gives you

  • access to two definitions of true and false as 1 and 0,

  • respectively, effectively.

  • So after load is called, once implemented,

  • then it's the function called "check" that's supposed to get it called again,

  • and again, and again for every word in the file

  • you're actually spell checking.

  • So it's that one that's going to be especially performant.

  • Super fast, super minimal memory usage, hopefully,

  • if you're really trying to optimize those two things.

  • What does it appear that this takes as input to,

  • if you want to tackle that one?

  • COLTON OGDEN: The check function.

  • So it looks like the same thing.

  • So char*, meaning a pointer to a char.

  • So a variable length string, or a sequence of chars.

  • And then words.

  • So it's basically going to be any size word.

  • And we're going to, basically, check for the string header representing

  • the signature, whether the word is actually in the dictionary.

  • So you check our dictionary for this word

  • and see whether it's actually inside that dictionary, that data structure.

  • DAVID MALAN: Yeah.

  • And so strictly speaking, you don't have to use const

  • in either of these definitions.

  • This is just a nice mechanism for really protecting yourself from yourself,

  • lest you accidentally try changing word or try changing dictionary.

  • The fact that you defensively-- or whoever wrote this file-- defensively

  • put const there means, you just won't accidentally

  • screw up, change the value, and introduce

  • some bug that could be very easily avoidable by making it read only,

  • as const effectively does.

  • So lastly, there's two other functions in this file.

  • Per the comments, those are size and unload.

  • Unload is meant to do the opposite of load.

  • So whatever work we end up doing in load is supposed to be undone,

  • freed up in unload.

  • And then size is supposed to just return the number of words in the dictionary.

  • Now, maybe that's a linear operation.

  • You just iterate over whatever your data structure

  • is, counting up every one of the words and return that value.

  • Or if you're smart about it, you could probably just

  • keep like an int or a long around, and just, as you're loading words,

  • keep track of how many words there are.

  • And then just spit out in constant time that same value.

  • So in C, this is a pain in the neck to actually implement this thing.

  • You can implement it as a big array, a linked list to give you some dynamism

  • and growth.

  • You can implement it as a hash table, which is often implemented

  • as an array with multiple linked lists.

  • Or you can implement it using a try, an even fancier data structure, that

  • just consumes lots of memory, but theoretically,

  • is constant time in its operation, at least asymptotically.

  • So we're not going to do that though in C.

  • Indeed, that's one of the challenges in CS50 itself.

  • And one of the great moments in a course like this

  • is when you actually change context, and you switch from C--

  • a very low level language-- to something like Python, which

  • is higher level, so to speak, that has lots

  • more abstractions, lots more features, lots more capabilities built

  • into the language itself.

  • You can whittle down a 10, 20, 30, 40 hour project into truly just minutes

  • if not seconds, once you're actually super learned in the language.

  • Shall we dive into some actual live code?

  • COLTON OGDEN: Sure.

  • Before we do, Srini Vasank has a question for you.

  • DAVID MALAN: Yeah.

  • So pass by value can be used here instead of const.

  • Short answer, no.

  • You are already technically passing in the dictionary by its value.

  • But the catch is that the value you're passing in is its address.

  • And in so far as passing in an address allows you to dereference that address

  • and go to that address in memory, means that it's potentially

  • vulnerable to being changed.

  • And so we're declaring const here for the very reason

  • that we're technically passing by reference here, or by value.

  • But that value is an address.

  • And so this is why we have this defense mechanism in place.

  • If we were instead passing in a specific char, or an int, or a float,

  • or any other primitive type that doesn't involve memory addresses,

  • then yes, pass by value avoids this.

  • Because the worst you can do is change a copy of the argument.

  • But in this case, by definition, we are passing in a string,

  • which is indeed a char*, or address of a character, or a sequence of character.

  • So we can't-- it's not as easily said as passing by value.

  • COLTON OGDEN: Correct.

  • DAVID MALAN: The values here are what are [INAUDIBLE]..

  • COLTON OGDEN: I guess the equivalent would be maybe duplicating

  • the data structure in the function.

  • But then that would be like horribly inefficient for something

  • massive, like a big dictionary file.

  • DAVID MALAN: Yeah, you can totally copy the whole dictionary.