Placeholder Image

Subtitles section Play video

  • [MUSIC PLAYING]

  • DAVID MALAN: All right, well, this is CS50.

  • And this is lecture 4.

  • And you'll recall that last time, we focused really on algorithms.

  • Not so much code, but on algorithms, the step-by-step instructions

  • for solving problems.

  • And we did this in the context of numbers.

  • But more specifically, we assumed that we had places to put these numbers.

  • We called it an array.

  • And that was back-to-back-to-back memory where we could put numbers or maybe

  • characters or something else.

  • But for the most part, we haven't really had more sophisticated ways

  • of laying information out in memory, and so we're kind of

  • stuck going left to right, right to left, or maybe a little more

  • intelligently using what we call binary search, or divide

  • and conquer, and kind of split the difference.

  • But we can start to get fancier.

  • Because if you recall, we have memory inside

  • of our computers, which we keep modeling as this rectangular region

  • with numbers and so forth.

  • But that's kind of a nice canvas, right?

  • Much like you could paint on a canvas, so could we kind of move things around

  • in memory.

  • But we don't yet have the vocabulary or really the data structures,

  • as they're going to be called, to do anything more

  • interesting than this linear approach.

  • And so today and next time when we begin to start

  • to solve problems more cleverly and then revisit some of the kinds of algorithms

  • that we've seen already, and those algorithms last time

  • were like these, linear search and binary search.

  • And that brought us full circle back to week zero when we used those,

  • albeit not by name.

  • And then recall that we introduced sorting.

  • Sorting's a more involved algorithm, more time-consuming.

  • But it was a precondition for which of those first two algorithms

  • to actually work?

  • Yeah, so binary search assumes, just like a phone book

  • assumes that the names are actually sorted from left to right.

  • Otherwise, you're completely wasting your time,

  • if you're splitting a phone book in half,

  • if there's no rhyme or reason to where the names and the numbers are.

  • So this is sort of a requisite ingredient

  • to get better efficiency with other algorithms.

  • And we're to see that moving forward.

  • And it's going to be up to decide, you know what?

  • It's going to take me more time to figure out

  • how to write the code for an algorithm than it

  • is to just write it the easy way, and just cut a corner

  • and run the code once.

  • But if you're a company, if you're a website,

  • if you're a piece of software that's doing the same algorithm again

  • and again and again, maybe you do want to spend that upfront cost.

  • Maybe it's human time, like your time, figuring out the algorithm.

  • Or maybe it's n squared or hopefully something better, n log n,

  • that you have to spend just to get the data in a nice format,

  • and then, thereafter, everything can be blazing fast.

  • So especially when we get to web programming,

  • we'll revisit these questions when we talk about databases as well.

  • But recall last time we also introduced a bit of formalism.

  • And we won't get more mathematical than this.

  • But These were sort of broad strokes with which

  • we can paint the efficiency of algorithms from sort of slowest on top

  • to fastest on the bottom.

  • And recall that we introduced some Greek symbols here just

  • to kind of standardize what it is we're talking about.

  • And as a quick check, big O refers to what kind of boundary?

  • The upper bound.

  • So maybe in the worst case, what's the upper bound on your algorithm's running

  • time?

  • How many steps, how many seconds, how many hours might it take?

  • By contrast, we had omega, which was the lower bound.

  • And then less commonly used, at least in CS50,

  • will be theta, which is just when those two are the same.

  • But realize those are the three ingredients,

  • especially if you continue on in other CS classes, that might be revisited.

  • And we introduce this, so that we're not just counting seconds.

  • We're not, like, looking at our watch and counting how fast our algorithm is,

  • because that's not a very reliable way of measuring whose algorithm is better

  • than someone else's.

  • Maybe my Mac is faster than your Mac or your PC.

  • And so we don't want to just look at raw time.

  • We want to think about things a little more theoretically,

  • albeit without worrying about denominators and lower order terms.

  • We talk generally in terms of these higher order terms

  • that we saw last time.

  • So I thought it would be fun to kind of reminisce.

  • Some years ago, when a certain someone was still

  • just a senator, being interviewed by Google's own Eric Schmidt,

  • who was the former CEO of Google.

  • And this was an interview that took an interesting algorithmic turn.

  • If we could dim the lights for a moment.

  • [VIDEO PLAYBACK]

  • [APPLAUSE]

  • - Now, Senator, you're here at Google.

  • And I like to think of the presidency as a job interview.

  • Now, it's hard to get a job--

  • - Right.

  • - --as president.

  • - Right.

  • - I mean, you're going through the rigors now.

  • It's also hard to get a job at Google.

  • - Right.

  • - We have questions.

  • And we ask our candidates questions.

  • And this one is from Larry Schwimmer.

  • - OK.

  • - What-- you guys think I'm kidding?

  • It's right here.

  • What is the most efficient way to sort a million 32-bit integers?

  • - Well, I--

  • - Maybe-- I'm sorry, maybe we should--

  • - --no, no, no, no, no, no, I think--

  • - That's not a fair question.

  • - --I think the bubble sort would be the wrong way to go.

  • - Come on.

  • Who told him this?

  • [END PLAYBACK]

  • DAVID MALAN: OK, so today we peel back the layers

  • that we've been assuming for some time now.

  • We've been talking about integers and characters.

  • But we also had this higher level concept

  • of a string, which was a generic way of saying

  • you had back-to-back characters that represent words

  • or sentences or paragraphs or whatnot.

  • But today we reveal that that's actually been a bit of a lie.

  • There actually is no such thing as string in the language called C.

  • And indeed, you might have kind of suspected

  • as much, given that we keep including cs50.h, which gives you things

  • like GetInt and GetString and so forth.

  • But it also gives you string, literally, a keyword

  • that actually doesn't come with C.

  • And before we peel back what exactly it is,

  • let's consider perhaps what problems it creates

  • and what powers it reveals to understand what's going on underneath the hood.

  • And let me propose that we try to implement, at least in pseudocode,

  • this algorithm here, swap.

  • So I proclaim that swap is a function whose purpose in life

  • is to take two inputs--

  • let's call it a and b--

  • and just swap them, so that a becomes b, and b becomes a.

  • And before we even get into the weeds of pseudocode or actual code,

  • we actually have two values here.

  • Might anyone like to join me onstage for just a moment for a drink of Gatorade?

  • A little Gatorade?

  • Around Maybe someone a little farther back today?

  • A little drink?

  • Yeah?

  • OK, come on down.

  • What's your name?

  • KATE: Kate.

  • DAVID MALAN: Kate, come on down.

  • Join me for some Gatorade onstage.

  • Welcome to Kate.

  • All right.

  • So the challenge at hand is this.

  • Let me just set us up here.

  • So we have some green.

  • That's very unnatural looking.

  • OK, we have some pink.

  • And you know what?

  • I think, actually, I'd like the pink, and maybe you could have green.

  • I need you to swap the values of these two cups if you could.

  • I need you to get the pink into the green cup and the green

  • into the pink cup.

  • KATE: I think I need another cup.

  • DAVID MALAN: OK, so she thinks she's going to need another cup.

  • And OK, so good.

  • I came prepared.

  • So we need another variable, if you will.

  • OK, so here we go, Kate.

  • Set us up.

  • All right, green goes into the empty cup.

  • All right, so pink goes into the former green cup.

  • And green goes-- most green goes back into the original cup.

  • Thank you so much to Kate.

  • Why don't we--

  • I don't know if you'd like that flavor.

  • Delicious.

  • OK, well, thank you very much.

  • Thanks for coming up.

  • Here we go.

  • Oh, and this.

  • Oh, and if you'd like.

  • KATE: Sure.

  • Thank you.

  • DAVID MALAN: So this was obviously very intuitive.

  • That one's not bad.

  • This is actually pretty intuitive, of course, for any of us in the room

  • that you obviously can very cleanly swap two values or two drinks.

  • You need some kind of temporary storage place.

  • And so we can introduce this in the form now of some actual code

  • by claiming that if you want to swap two values, you need the exact same idea.

  • So even if variables have seemed a little abstract

  • or you're at least used to them in the context of algebra x and y,

  • in a programming language, variables are just

  • like these empty cups into which you can put values.

  • In reality, it's green and pink.

  • But it could be a number, it could be a letter, or maybe something more.

  • Maybe even something like a string.

  • So we're going to come back to this idea.

  • Because if you agree that Kate's algorithm was correct,

  • once she had that temporary variable, it would

  • seem that this is a pretty reasonable translation to code.

  • Put store a in tmp.

  • So move a into the temporary variable, just like she did into the empty cup.

  • Then change a to be b.

  • And then change b to be what was a, which is currently in temp.

  • So it's kind of this three-step switcheroo,

  • just like Kate enacted for us here.

  • But recall last time that we have--

  • actually, let's do this.

  • Let me actually go ahead and open up a program here.

  • You know what?

  • Let me go ahead and open up, let's say, today's example

  • called noswap, which is a bit of a spoiler,

  • insofar as the name suggests what's actually going to or not

  • going to happen here.

  • And if I go ahead here and open this up in source 4.

  • Let me go ahead and open up noswap.

  • Let's take a look at what this program actually looks like.

  • Doesn't seem to do all that much.

  • But it does have a main routine up front.

  • And it's got an include of stdio.h.

  • And then what does main do?

  • It declares two variables, this time called x and y,

  • initializing them to 1 and 2 respectively.

  • And then a couple of printfs.

  • So printf x is such and such, y is such and such,

  • printing out its actual values.

  • So that's sort of week one use of printf.

  • And then we do the-- it looks like the exact same thing two lines later.

  • But in between those two lines is an actual function call to something

  • called swap, which, as it turns out, if we scroll down,

  • is literally the same thing as we had on the screen a moment ago.

  • I'm calling it a and b here, but I could have

  • called it anything I want, x and y, fu and bar, anything like that.

  • I have my temporary variable or temporary empty cup, like Kate had,

  • and I do the switcheroo.

  • So when I run this program after compiling it with--

  • let me go into source 4, and then do make no swap,

  • Enter It seems to compile OK.

  • So when I run noswap, what sentences should I see on the screen?

  • Based on main.

  • AUDIENCE: [INAUDIBLE] x is 2, y is 1.

  • DAVID MALAN: Yeah, x is 1 comma y is 2.

  • And then hopefully x is 2, y is 1, if swap, indeed, works exactly

  • as Kate enacted in the real world here.

  • So as the name might suggest, doesn't seem like something's

  • going to go quite right here.

  • That's weird.

  • It didn't actually seem to swap the value.

  • OK, so maybe it's a bug.

  • We've screwed up before.

  • Maybe I just made some error.

  • Maybe I'm kind of saying I'm doing one thing, but am actually doing another.

  • So let's double-check.

  • So int x gets 1.

  • Y gets 2.

  • That seems correct.

  • My sentence is correct.

  • It's x, y and then it's again x, y.

  • So I didn't accidentally reverse them.

  • And printing the same thing.

  • I'm calling swap.

  • And so it seems Kate was wrong.

  • She did not swap the green and the pink Gatorade correctly somehow,

  • because, at least in code, it's not working.

  • So why is this?

  • And it turns out that the answer to this, Kate's algorithm's

  • actually correct, but our interpretation of it in code isn't quite correct.

  • Even though it compiles, even though it runs, and it actually

  • is doing something underneath the hood, it's not actually, obviously,

  • doing precisely what we want.

  • So why is that?

  • Well, let's come back to this picture here, which is just a stick of RAM

  • that you might have on your desktop or your laptop.

  • And each of those black chips has some number of bytes.

  • Maybe a gigabyte.

  • Maybe less.

  • Maybe more, these days.

  • And I keep proposing that we think of these, if you can just kind of zoom in,

  • as just a grid of values.

  • So if we zoom in on that, and we think of the top left corner is the byte 0,

  • the one next to it byte 1, byte 2.

  • We can literally number all the bytes in our computer

  • from 0 to, like, a billion or 2 billion, however much RAM you actually have.

  • But it turns out that computers use that memory not exactly as left

  • to right, top to bottom.

  • There's a bit more structure to it.

  • And so let's actually be a little more precise today and zoom in

  • and propose this.

  • It's a lot of words all up front and we'll

  • tease them apart in just a moment.

  • But if you think of this as that black chip on your computer's memory stick,

  • it turns out that the computer's going to use different areas of memory--

  • the stuff down here, the stuff down here-- just a little bit differently

  • conceptually.

  • Humans, years ago, decided to use this memory

  • for certain things, this memory for certain things,

  • and we've all kind of standardized on that since, at least in C here.

  • So there's going to be two salient features here,

  • so called the stack and the heap.

  • And the heap we'll get to before long.

  • But a stack is just like it is in English.

  • If you go over to Annenberg or some dining hall on campus,

  • you just have a stack of plastic trays?

  • That's the same idea, we'll see, where you

  • can put more and more stuff on the stack,

  • and the heap's going to work a little bit differently.

  • But you can still think now, even though we've started to kind of label

  • different parts of memory with these words-- heap, stack, and others--

  • you can still think of the idea as being exactly the same.

  • Within the so-called heap portion of memory,

  • we're still going to number our bytes 0, 1, 2, 3, 4, much like I

  • proposed with some of squares here.

  • Same thing for the stack.

  • The numbers might be different, because they're obviously

  • farther away from those other boxes, but we're still just going to number them.

  • So the mindset is the same.

  • It's just we're putting different types of things in different places

  • starting today.

  • And let's see what this means for us in reality.

  • I'm going to go ahead here and create a new program called compare0.c.

  • If you'd like to play along at home, the same code

  • will be online on the course's website.

  • As usual I'm going to do, like, an include of cs50.h.

  • I'm going to do an include of stdio.h.

  • Then I'm going to do int main void.

  • I'm not going to worry about command line arguments today.

  • And then I'm going to go ahead and get myself two strings.

  • So I'm going to do string s, we'll call it, get_string.

  • And I'm just going to call this s.

  • So I'm going to prompt the user with a pretty simple string.

  • Like, just give me s.

  • I want one more.

  • String t gets get_string, t colon and then a double quote.

  • So just sort of, again, week one stuff.

  • Just give me one string, then give me another string, and put them in s

  • and t respectively.

  • And now, let me just do something that you might have been inclined to do

  • or had to do previously, which is just to compare these things.

  • So you know what?

  • I want to check.

  • If the user typed in the same word twice, let me just say,

  • if s equals equals t, I'm going to go ahead and print out, quote unquote,

  • "same," and a new line.

  • I also am going to go ahead and just literally print out different.

  • So I've whipped up a simple program that I

  • think is just going to ask the user for two strings,

  • and then just compare them.

  • Now, in the past, I have made mistakes when it comes to equal signs.

  • Have I used the correct number of equal signs for something like this?

  • Or should it just be the one?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: So it should be the two.

  • Because the one is used already as assignment.

  • Move this from right to left.

  • So equals equals seems to have the right semantics.

  • Like, I'm trying to compare s and t for equality.

  • So let me see, if I got no syntax errors here, let me go ahead

  • and make compare0.

  • OK, compiled.

  • And then ./compare0.

  • Enter.

  • Let me go ahead, and I'll type Stelios's name again.

  • I'm going to go ahead and type his name again.

  • Looks good.

  • And [INAUDIBLE] different.

  • OK, maybe, I mean, it's kind of a long name, maybe I just screwed up somehow.

  • So let's try this again.

  • So I'll type my own name.

  • Twice.

  • No.

  • OK, maybe we should try another, like Maria, Maria.

  • Three times incorrect.

  • There's got to be something wrong with my code.

  • So what is actually going on?

  • Well, maybe we should just go about this a little differently.

  • Let me go ahead and try another program.

  • Let me go ahead and create a new program called copy0.c.

  • And this time, maybe comparing--

  • I'm just going to copy the strings this time in order

  • to actually do something simple and see the results.

  • So let me go ahead and do an include of cs50.h and include of stdio.h,

  • int main void.

  • I'm not going to worry about command line arguments.

  • And I'm going to, again, do string s gets get_string.

  • And I'm just going to say, give me s.

  • And you know what?

  • This time rather than complicate things by getting a second string,

  • let's just say t is going to equal s.

  • So I think this is the correct use of equals,

  • because it's one equals, which means copy s from the right to t on the left.

  • And then I want to capitalize this string.

  • So let me just insert a little bit of logic here.

  • And I only want to capitalize the string if the user typed something in that's

  • long enough.

  • So I want to do a little bit of safety checks,

  • because we're starting to get in the habit of better error checking now.

  • So if the length of t is greater than zero, you know what I want to do?

  • I want to go ahead and change the first letter of t, the copy,

  • to be the result of calling toupper of that first character of t.

  • And then you know what?

  • Let's just print it out.

  • So s is going to be %s.

  • And we plug that value in.

  • And then let me go ahead and print out t %s backslash n comma t.

  • So it's a bunch of syntax all at once.

  • But to recap, I'm getting a string, calling it s, just like before.

  • I'm not getting a second string.

  • Now I'm just saying, you know what?

  • The string called t is going to be equal to s.

  • And we've used assignments, certainly, before.

  • Just to be clear, why did I complicate my code with this use of strlen here?

  • Why did I do that?

  • What could go wrong otherwise?

  • Someone else?

  • Propose to me what the user could do that might make problems for me?

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Exactly.

  • Suppose the user doesn't type his or her name, and just hits, like, Enter.

  • That's going to return a string of zero length.

  • Now, technically, a string of zero length

  • uses up how many bytes in memory?

  • AUDIENCE: One.

  • DAVID MALAN: Why one?

  • AUDIENCE: Backslash.

  • DAVID MALAN: Right.

  • There's still a backlash 0.

  • So recall last week when we talked about what strings are underneath the hood,

  • they're always terminated by backslash 0.

  • Whether there are zero characters, one characters, five characters, or 1,000,

  • they end with a backslash 0.

  • So even if the user just types Enter, he or she

  • is really creating a string that's of 1 byte in memory,

  • but its length, so far as we humans care,

  • is zero, because there's no characters in it.

  • But if you look at that zeroth character, just to be clear,

  • what is going to be at t bracket 0 in this case?

  • Backslash 0.

  • And if you now change this to be toupper, it's just weird.

  • You shouldn't be touching that backslash 0

  • and certainly not trying to capitalize it.

  • So I'm being a little defensive here.

  • Though, hopefully, frankly, toupper would not break in that case,

  • because it's not an alphabetical letter.

  • But I'm at least thinking about these circumstances.

  • Now I'm just going to go ahead and print out both s and t.

  • So let's see the results.

  • Let me go ahead and make copy0.

  • And oh, OK, we've seen this before.

  • Let me zoom in on the bottom.

  • So the first error message here has something

  • to do with implicitly declaring library functions.

  • That's a pattern you should start recognizing now.

  • What does that mean I probably omitted?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Yeah, so some library, some header file up top.

  • And maybe by instinct or maybe by running help50,

  • you would recall at this point, oh, right, I

  • need to do, like, include string.h.

  • And just to anticipate, is there one other I

  • should add before I embarrass myself with another error?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Yeah, so include ctype.

  • It's not so much an embarrassment.

  • It's just I should know it once I've seen it once before.

  • So here we have ctype.

  • Because what's in ctype?

  • Toupper, right?

  • And we would only know that from [INAUDIBLE] having done it before.

  • Otherwise, you pick it up along the way.

  • All right, so now let me go ahead and rerun make copy0.

  • Seems to work OK.

  • Let me go ahead and do ./copy0.

  • And let me type in Stelios's name, but all lowercase for s.

  • Hit Enter.

  • OK, kind of worked.

  • But what's the symptom here now?

  • Yeah?

  • AUDIENCE: [INAUDIBLE] capitalize the s string as well [INAUDIBLE]..

  • DAVID MALAN: Yeah, capitalize both s and t.

  • But maybe this is, like, a screen thing.

  • Like, a lowercase s and capital S kind of look the same anyway.

  • So maybe we're just kind of seeing things.

  • So let me just type my name in lowercase,

  • where the D is hopefully going to look quite different when--

  • no, same behavior.

  • So why is this?

  • What's going on?

  • Again, the code here is just an application

  • of the ideas we've been using for the past couple of weeks.

  • If you want to get a string, call get_string.

  • If you want to copy a variable, use the assignment operator from right to left.

  • And so here is where we're beginning to find weaknesses or sort of signs

  • of the lie we've been telling about what a string actually is.

  • Because it seems that you can't just compare two strings for equality.

  • And it seems that you can't just copy a string into another just

  • by using the same techniques we've used for chars and for ints and for all

  • of the primitives, so to speak, all of the lowercase types,

  • like double and float and char and int that come with C.

  • But string itself is not something that comes with C.

  • So what's actually going on underneath the hood?

  • Well, at the risk of or in an attempt to be dramatic,

  • today is when we sort of get to take off these training wheels that you might

  • have had on your bicycle for some amount of time

  • and now reveal that a string is actually a little something more arcane.

  • And you might have seen a glimpse of this, maybe

  • in textbooks or Google or online or whatnot.

  • But a string is actually just a synonym that CS50 staff

  • created for a little more complicated expression called char star.

  • Now, char is familiar.

  • Character.

  • And for now you can think of the char as maybe

  • implying that there's going to be multiple characters involved.

  • Because at the end of the day, that's what a string is.

  • A string, at the end of the day, is still a sequence of characters.

  • But more precise than that is the question

  • that we're going to explore now.

  • So let me go ahead and do this.

  • Let me go ahead now and take away the layer that is string,

  • and look at this instead just in the context

  • now of using char star instead of that same keyword.

  • I'm going to go ahead and create one more file here.

  • I'm going to call compare1.c.

  • So hopefully, an improvement on my previous version.

  • I'm going to go ahead and include cs50.h,

  • so I can still use GetString and any other functions.

  • I'm going to go ahead and include stdio, so I can actually print things.

  • And I'm going to go ahead and preemptively include string.h,

  • so I have some other fancy features available to me.

  • So let's do this now, int main void.

  • So no command line arguments still.

  • Instead of string s, I'm just going to take off that training wheel.

  • Char star s equals get_string, quote unquote, "s."

  • And then you know what?

  • Let's do char star t equals get_string t: like that.

  • So the only difference, thus far, is that I have literally sort of dropped

  • from my vocabulary the word string and I'm starting to write it as char star

  • instead.

  • It's not multiplication.

  • So there's a limited number of keys on the keyboard.

  • So humans, years ago, decided to use different symbols for multiple things

  • sometimes.

  • And so that's where we're at.

  • And now recall that last time, if s equals equals t, I wanted to print out,

  • quote unquote, "same" else I wanted to print out "different."

  • So I think this is the exact same program at this moment in the story,

  • except I've just change star string to char star.

  • So maybe that's it.

  • Let's see if maybe all I need to do is sort

  • of take away those training wheels.

  • Run make compare1.

  • Seems to compile.

  • And then let me zoom in at the bottom here and do ./compare1.

  • And let's go ahead here and compare, again,

  • Stelios's name against Stelios's name.

  • Still different.

  • Let's try my name.

  • David, David.

  • Still different.

  • Maybe it's a capitalization thing.

  • Let's do david, david.

  • All right, so it's still broken.

  • So obviously, just taking off the training wheels

  • doesn't make the problem better, apparently.

  • We need to actually understand how the bicycle works, if I can really

  • milk the metaphor today.

  • So how does a string really work underneath the hood?

  • Well, you know what?

  • I'm not quite sure how to explain it yet.

  • But I know that I can actually solve this problem

  • by not using equals equals.

  • I can instead do this.

  • If strcmp of s comma t equals equals 0, of all things,

  • now print that they're the same.

  • You would never know this unless you were told or found it

  • in a reference book or online resource or whatnot that strcmp exists.

  • As its name kind of suggests, it's just succinctly written,

  • strcmp, cmp, so you're comparing two strings s and t.

  • And if we read the documentation, like the man page or CS50 reference online

  • or Google or whatnot, we would see that the definition of strcmp is this.

  • If s and t are visually the same string, then return zero,

  • just because a human decided that.

  • If s comes before t alphabetically, return a negative number.

  • If s comes after t alphabetically, return a positive number.

  • So that's kind of nice, because there's three scenarios where

  • you're comparing two things.

  • Either they're the same or one is bigger or alphabetically before

  • or alphabetically after or smaller.

  • So there's kind of three cases to consider.

  • And we've seen that before when we've written out

  • pseudocode when testing for things like equality or looking for Mike Smith.

  • So strcmp leverages that same idea.

  • It's just a little messy that we have to use numbers, like 0

  • for equal and negative for less than and positive for greater than.

  • But let me try recompiling this now.

  • Make compare1.

  • And then let me do ./compare1, Stelios, Enter, Stelios, Enter.

  • And now they're the same.

  • Let's try this again.

  • Let's do Stelios again.

  • Maria.

  • They're different.

  • So again, proof by example should be sufficiently compelling.

  • But it's better, it seems.

  • And indeed, it is correct now.

  • So what is it that's going on?

  • And you know, maybe we just got lucky here too.

  • Let me go ahead and iterate on this and just improve

  • things in a couple of ways.

  • Let me go ahead and open up or point out this.

  • Let me go ahead and open up copy1, which is an improvement on our original copy

  • version as follows.

  • Let me do this here.

  • So what's actually going on?

  • So it turns out-- actually, let's do this.

  • Before we jump ahead to compare1, let's consider what's actually

  • happening in this computer program.

  • So to recap, we're getting a string, calling it s.

  • We're getting another string.

  • We're calling it t.

  • And then previously, we were just comparing equal equal.

  • But now strcmp seems to solve this problem.

  • So someone solved this problem for us.

  • Let's see if we can't infer what's going on.

  • Well, let me go ahead and just pull up a little chalkboard here of sorts

  • and propose to consider what exactly GetString is doing.

  • All this time, we say that GetString gets a string from the user

  • and just returns it to you.

  • And indeed, when we did an example the other day,

  • and our volunteer was wearing the GetString name tag,

  • he just handed me back a slip of paper that said the audience member's name.

  • So GetString does work like that.

  • But we only have access to this canvas of memory.

  • So what does it really mean to return a string

  • and to put it somewhere in memory?

  • Well, it turns out when you do a line like this, string s equals get_string,

  • there's two parts to this, the left and the right.

  • So what's actually going on?

  • Well, the left-hand side of this expression, string s,

  • is telling the computer, hey, computer, give me a chunk of memory,

  • and I'm going to draw it as a box, and call it s.

  • The right-hand side of this expression, obviously, does get someone's name.

  • And I'm going to draw that just for the moment as, like, Stelios's example.

  • So Stelios's name.

  • And just to be super precise, there's a backslash 0 in there, recall.

  • Let's continue that assumption.

  • But what exactly is going in here?

  • Like, Stelios's name, literally, visually,

  • cannot fit in that tiny little box.

  • So what is it, when Stelios's name is handed to me,

  • I'm actually storing in this little box?

  • Well, what is Stelios name here actually implemented as?

  • I've just written it out.

  • But what is it more technically, using some CS jargon?

  • It's an array of characters.

  • So let's see that.

  • So let's just kind of draw out what we know is underneath the hood,

  • even though we can just kind of take for granted that this works.

  • So this is not to scale.

  • But each of these characters, letters in his name, take up a byte.

  • They're an ASCII character, so to speak.

  • So I don't know where these things are in memory,

  • but I'm just kind of going to guess.

  • I've been using my computer for a while.

  • So maybe this is at, like, byte 100.

  • This is byte 101.

  • This is byte 102, and so forth.

  • So those bytes are numbered whatever the numbers actually are.

  • So given that, the fact that a string is just a sequence of characters,

  • and those characters each take up, like, a byte

  • of actual memory in your computer, and those bytes

  • can be thought of as having numbers from zero to, like, a billion or 2 billion,

  • what would you propose, just intuitively, we put in the box

  • when we get back a string?

  • AUDIENCE: 106.

  • DAVID MALAN: 106?

  • OK, why 106?

  • AUDIENCE: Because you can tell where the [INAUDIBLE]..

  • DAVID MALAN: If we say 106.

  • So that would be like--

  • oh, sorry, do you mean--

  • that's my way of writing 100.

  • Do you mean 106 as in one, two, three, four, five, six, this one?

  • AUDIENCE: Oh, no, no, no.

  • DAVID MALAN: Oh, just my messily written 100.

  • AUDIENCE: So we can, like, store the location of the various [INAUDIBLE]..

  • DAVID MALAN: Correct.

  • So if we fix my horrible handwriting, we could put in this box

  • just the address of that string.

  • That's not his whole name.

  • It's just, like, the location in my computer's memory

  • of the first character in his name, which feels a little reckless,

  • because Stelios's name is not s.

  • But why is this sufficient information to store Stelios's string

  • in this variable effectively?

  • AUDIENCE: Backslash.

  • Backslash 0 is there.

  • DAVID MALAN: Exactly.

  • The backslash 0 is there.

  • And even though a computer, if you go back to our locker metaphor,

  • kind of has to open each door in order to see what character is behind it,

  • we can do that with just a simple for loop or a while loop.

  • And a computer, given the address of Stelios's name's first character,

  • it's kind of like a map, like, X marks the spot.

  • Like, the computer can go to that location in memory

  • and say, oh, here is the first letter in his name.

  • And then the computer, if printing out his name

  • using printf or something like that, the computer

  • can just print out the s, then open the locker door next to it.

  • And if it's not a backslash 0, print out the t.

  • Open the next door.

  • If it's not a backslash 0, print out the e.

  • And repeat and repeat and repeat.

  • And as soon as it does get to that special sentinel value, as we say,

  • backslash 0, it closes the locker door and stops printing.

  • So because all this time we have been terminating our strings with backslash

  • 0 as the special demarcation, all we need

  • to know logically to store a string is where does that string begin,

  • the upside of which is now we can store just a single tiny chunk of memory.

  • It tends to be 4 bytes or 8 bytes, depending on what kind of computer

  • you have, 32 bits or 64 bits.

  • But what you don't need is as many bytes as there are characters in his name,

  • because those are already using other bytes in memory.

  • So when I now declare t in my first version of the program,

  • string t gets get_string, quote unquote, "t:"

  • just as my user's prompt, close quote, semicolon,

  • what's happening in this scenario?

  • Well, the logic is exactly the same.

  • This is saying, hey, computer give me a chunk of memory.

  • Call this chunk t.

  • And then even if the human types in literally the same thing, like,

  • S-T-E-L-I-O-S, that is underneath the hood just an array that we can keep

  • drawing like this.

  • And of course, the computer, for us, is going

  • to put that backslash 0, because the computer's been

  • doing that for the past few weeks.

  • It's not going to be the same location, though.

  • Maybe this is now at, like, location 300, and this is 301,

  • and this is 302, and so forth, because the computer's doing other things.

  • It's using memory over here or over here.

  • So it's not going to be in the same location.

  • But what, therefore, gets stored in t when we assign

  • the return value of get_string to t?

  • 300 in this case.

  • And that goes here.

  • And so if we go back to our original computer program, which, again,

  • was this program compare0.c, why now does it make perfect sense, eventually,

  • that s and t are always different no matter what you type in?

  • Because what's actually getting compared on this highlighted line

  • here when you do if s equals equals t?

  • The memory locations.

  • You're literally doing the same thing for the past few weeks.

  • You are literally just comparing two variables, s and t.

  • The difference is that now that you've broken the abstraction layer, taken

  • the training wheels off, however you want to think of what we're doing here,

  • you are literally doing the same thing, but you're just

  • comparing two addresses, two locations.

  • You are not very fancily comparing every character against every character

  • to determine what we humans think of as equal or identical strings.

  • So this notion of an address, this notion of a location,

  • is given the buzzword pointer.

  • And a pointer is just an address of something in memory.

  • And you can think of it, literally, as pointing to something.

  • Like, what is 100 pointing at?

  • You can visualize it as well.

  • It's kind of pointing to the first byte of some array of memory.

  • And t, meanwhile you can think of as pointing to, with an arrow,

  • the first byte of some other chunk of memory.

  • And as such, you know what?

  • At the end of the day, we really aren't going

  • to have to care where things are in memory.

  • These are uninteresting implementation details

  • and shouldn't really matter to us, because we

  • can talk about things symbolically.

  • We can say s instead of hard-coding the stupid number 100.

  • We can say t instead of 300, and let the computer

  • and the compiler do all that kind of math for us,

  • just knowing we have this canvas of memory at our disposal.

  • And so why does string not exist?

  • Well, this notion of a string--

  • apologies, again, for my handwriting-- is, again,

  • just a synonym for char star.

  • And this string, too, is just a synonym for char star.

  • And what does that actually mean?

  • Star is just the symbol that humans, years ago, decided

  • would represent an address, a pointer, a location.

  • Char is relevant.

  • Because what type of pointer is this?

  • It is a pointer to what data type?

  • AUDIENCE: A character.

  • DAVID MALAN: A character.

  • Or it's the address of a character.

  • So you can of it as pointer to, address of, however makes sense in your mind.

  • And so even though this is a string, well, t

  • and s are not pointing at a string, per se.

  • If you really zoom in, they're technically

  • only pointing at a character.

  • And it's our choice of implementation that, yes,

  • we can eventually find the end of a sequence by looking for backslash 0.

  • But this is what's really powerful and also confusing

  • about C sometimes is that at the end of the day, it's very simple operations.

  • And when you copy something, you're literally just

  • moving one number or one character from one place to another.

  • And if you want higher level constructs, like strings,

  • you have to implement them underneath the hood.

  • And this is not sort of a fun way to start programming

  • in the very first week of a class.

  • You'd be like, hey, everyone, today we're going

  • to talk about pointers and addresses.

  • It's mind-numbing, it's confusing, and it really

  • doesn't allow us to focus on the algorithms

  • and the actual problem-solving.

  • But now we're at the point in CS50, and soon with other problems

  • sets, where you want to-- or we'll have problems

  • that really can't or shouldn't be solved by just numbers

  • and characters and arrays alone.

  • We actually want to leverage the power of our intel inside

  • and do something more sophisticated.

  • And so we do, at this point, need to understand what's really going on.

  • And that's going to unlock a lot of capabilities.

  • So with that said, if compare0 was wrong, but compare1 was right,

  • what can we infer about how strcmp works?

  • Someone wrote this years ago, decades ago even,

  • but what did he or she do in order to implement strcmp correctly,

  • do you think?

  • AUDIENCE: They gave the same memory location.

  • DAVID MALAN: They gave the same memory location.

  • Can't be that, because they don't have control over the strings

  • I'm comparing, right?

  • I am passing in s, and I am passing in t,

  • and I got those from wherever I want.

  • So 20, 30, 40 years ago, someone only could take in as inputs two strings,

  • or technically two pointers, or really technically two addresses.

  • So what did this person do to implement GetString?

  • How about here.

  • AUDIENCE: [INAUDIBLE] compare the values that are actually [INAUDIBLE]..

  • DAVID MALAN: Yeah, just intuitively, if this program's correct,

  • strcmp must be doing what we humans thought equals equals would do

  • or might do.

  • And so what strcmp is presumably doing is it's taking two addresses--

  • we'll call them s and t--

  • and it's going to those addresses, and it's checking,

  • are these two letters literally the same?

  • And if so, it moves onto the next.

  • Are these three letters literally the same?

  • If so, it moves on.

  • And the moment it notices a mismatch, what does strcmp probably do?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Well, strcmp doesn't print out different.

  • That was me in 2017.

  • AUDIENCE: It returns a negative or a positive [INAUDIBLE]..

  • DAVID MALAN: Exactly.

  • It would return negative or positive based

  • on whether the string is determined to be alphabetically before or after.

  • And so only if we get to the very end and we see, ooh, we made it to two

  • backslash 0's at the same moment in time can strcmp just return 0

  • and say these two strings are, in fact, equal.

  • So we, too, could implement that, right?

  • It just sounds like a for loop, a while loop, just comparing those characters.

  • Yeah.

  • AUDIENCE: So when you were comparing [INAUDIBLE]??

  • DAVID MALAN: Oh, so say that once more.

  • AUDIENCE: When you were comparing Stelios [INAUDIBLE]??

  • DAVID MALAN: Stemios.

  • OK.

  • OK, so good question.

  • So let's actually see this.

  • Let me make one quick tweak here.

  • And let me go ahead and do this.

  • I'm going to go ahead and do int answer gets strcmp s comma t, just

  • so I can tuck it away in a variable.

  • And that means I can now just do if answer equals equals 0.

  • So now my-- sorry, I'm the only one who is playing along.

  • So what I just did was this.

  • Let me rewind in time.

  • So just a moment ago, my code looked like this.

  • I was just calling strcmp, passing in s and t, and checking the return value.

  • I'm just going to give myself a variable here,

  • and store the return value of strcmp now.

  • And then I'm just going to change this to be answer,

  • so that my code is identically, functionally, the same,

  • it's just now I have access to answer.

  • Because I want to answer your question empirically.

  • I want to just try this out.

  • I don't want to have to wrestle with the documentation

  • even Let's just try and see.

  • So let me go ahead here and do this, printout %i backslash n.

  • And then even if they're different, let's go ahead-- actually,

  • you know what?

  • I don't even need to put that there.

  • Let's just put it out here.

  • So printf answer is %i semicolon.

  • AUDIENCE: Can you put, like, comma [INAUDIBLE]??

  • DAVID MALAN: Yes, exactly there.

  • Precisely my point.

  • So now let me go and do make compare1 ./compare1.

  • And let me zoom in here.

  • And let's type in Stelios and his evil brother Stemois, Enter.

  • And so the answer is negative 1.

  • Why?

  • Well, alphabetically, Stelios, with the L, comes before Stemois, with an M,

  • and so we get back a negative number, which happens to be negative 1.

  • But I think the documentation just says it's a negative number.

  • So you don't check for negative 1, per se.

  • And now if we reverse them, let's just do a little quick check here.

  • If now we do the evil brother first and then Stelios,

  • Enter, now it's a positive value, because I've switched the order.

  • And of course, hopefully, all this time, when you type in Stelios both times,

  • the answer is, in fact, 0.

  • So a very simple idea.

  • We're just getting back an integer, but doing something ultimately pretty

  • powerful with it.

  • So what about copy?

  • So copy needs a better solution than we have at hand.

  • Because when I do copy, for instance-- let

  • me go ahead and just clear this and propose what's actually going on.

  • So with copy, I have string s gets get_string.

  • But let me drop the training wheels again.

  • So let me just write it as what it really

  • is, char star s equals get_string.

  • And then the prompt for the user was, quote unquote, "s."

  • And now that gives me a picture in memory, like this, called s.

  • And that's going to give me a string, which is going to look like this.

  • And let's go ahead and do Stelios's name again.

  • So Stelios with a really big backslash 0.

  • And my array as before.

  • And I don't care about the addresses anymore.

  • This is, like, an uninteresting story to keep making up numbers.

  • Let's just point at it.

  • Conceptually.

  • So a picture is worth as many words.

  • But now in my second program, remember, copy0--

  • let me just pull it up to remind.

  • In copy0, recall, we had this code, where I copied a string's address

  • by claiming just t gets s, t equals s.

  • So what's the implication now for this?

  • Well, let's just do char star t gets s.

  • And again, the only thing I've changed now

  • is I've changed the word string to char star,

  • just to be more pedantic about what's going on.

  • This gives me what kind of picture?

  • Well, it gives me a little chunk of memory called t.

  • But to be clear, what goes in s?

  • Just an address, right?

  • Char star makes super clear now that t is an address.

  • So all we can fit is an address here.

  • We can't fit, like, S-T-E-L-I-O-S and so forth.

  • We can only fit an address.

  • So what address?

  • Well, let's go back to the previous story.

  • If we did have numbers, like 100 and 101 and 102, as such,

  • technically in this box, it's just the number 100.

  • But again, who cares?

  • We can just draw pictures at this point.

  • What is in t?

  • The same thing.

  • So it's also 100.

  • But again, who cares about the numbers?

  • It's kind of conceptually the same thing.

  • So at this point in the story, when I have written string t gets s,

  • it is working correctly.

  • It is copying what's in s and putting it in t.

  • But what's in s is an address.

  • And so that's there for the same thing that's going to be in t.

  • Yeah.

  • AUDIENCE: How is the upper function able to work?

  • DAVID MALAN: Ah!

  • So how has the upper function able to work?

  • But work in what sense?

  • Because recall that when we used copy0 before--

  • let me zoom in here, ./copy0, and I typed in my own name all in lowercase,

  • it worked in the sense that it seems to have changed both s and t.

  • So it kind of worked, but it overworked, if you will,

  • because it should have only changed the copy.

  • So why is that?

  • Well again, let's go back to kind of the fundamentals here.

  • If copy0-- and let me go in to highlight these lines

  • to which I've added some comments now.

  • These are the lines of code to which you're referring.

  • T bracket 0 gets toupper of t bracket 0.

  • So I kind of read that now, even though it's

  • a little cryptic, as pass the first character of t,

  • t bracket 0, into the toupper function, and if it's a little d,

  • become big D, if it's a little a, become big A, and so forth,

  • and put that back in the same location.

  • But the catch is that, OK, so let's do that on the screen.

  • So t bracket 0 means go to the first element in the array,

  • and it happened to be a little d in the example I did,

  • and capitalize that letter.

  • Here it would be s, but it's already capital.

  • So now I'm kind of combining the stories.

  • And so when you change the first letter of t,

  • you're literally changing the first letter of s.

  • Because t and s are both kind of like maps, little treasure maps,

  • that lead to literally the same location.

  • And so here, too, is something that I've not yet stated clearly.

  • Even though t we've been calling a string,

  • and now we're all of a sudden calling it a char star,

  • you can still manipulate its characters using square bracket notation.

  • So when you see something like t bracket 0,

  • we've all known for the past week or two,

  • and especially with Caesar and Visionare,

  • that that just means change the zeroth character of the string called t.

  • But a little more technically now, if t is a pointer,

  • it really means go to the address and then get bracket 0.

  • So there's a little more work that's been happening.

  • It's just more of a mouthful than we needed a week or two ago.

  • So what is the solution then to this fundamental problem?

  • Copy0 is still broken.

  • You might understand how it works.

  • And if not, you can certainly read through this again

  • or take it a little slower or use the debugger to see what's going on.

  • But for now, it's broken.

  • So what's the fundamental conceptual solution here?

  • AUDIENCE: Set up two addresses.

  • DAVID MALAN: Yeah, set up two-- and not just two addresses.

  • A little more.

  • AUDIENCE: Two houses?

  • DAVID MALAN: Two what?

  • AUDIENCE: Two houses.

  • DAVID MALAN: OK, sure, two houses, right?

  • One for Stelios and one for evil twin or whatnot.

  • The key being, we need two of these chunks of memory

  • to be different strings.

  • We don't just want two addresses.

  • So we need to kind of fill in this gap here.

  • So how do I get an extra chunk of memory,

  • so that I can put a copy of s and a copy of t and a copy of E-L-I-O-S backslash

  • 0?

  • What mechanism allows me to get as many characters

  • as I need to fit the original name, so that I truly have two copies, so

  • that this picture is not this anymore?

  • How do I create a scenario in a correct version of my copy program,

  • so that t actually points to a true copy,

  • so that to your point, when I change the zeroth character,

  • I'm changing the zeroth character of the copy, not of the original?

  • Well, we don't have this answer just yet.

  • But we do if we introduce this one function here.

  • So I'm going to go ahead and open up here, proactively,

  • copy1.c, which I wrote in advance, which looks the same up here,

  • except for at least one.

  • You'll see eventually.

  • So I'm going to open up this program I wrote in advance called copy1.c.

  • And it's almost the same, but I've added a few more lines of code.

  • So here's a familiar line now.

  • Last week it was string s gets get_string.

  • Today it's char star s gets get_string, because we now kind of know

  • what's going on underneath the hood.

  • And just propose-- now I'm just being a little anal

  • here by writing more lines of code.

  • Line 12, 13, 14, 15.

  • Why do you think I might be having this if condition all of a sudden today?

  • This is technically better.

  • And I've kind of been cutting some corners the past couple of weeks.

  • AUDIENCE: Just to check if s is a string.

  • DAVID MALAN: Yeah, it's kind of-- it essentially is that.

  • Just to make sure that s is actually a string.

  • Because it turns out, my computer, as you know,

  • has a finite amount of memory, right?

  • I only have so many of those little black chips.

  • I only have so many bytes or gigabytes in my computer.

  • And even though we're really not taxing my computer's capabilities

  • with typing Stelios's name or mine, theoretically, my computer

  • can run out of memory.

  • And frankly, if your Mac or PC has ever frozen or crashed or hung or whatever,

  • maybe it ran out of memory.

  • It just didn't handle it well.

  • And so bad things happen.

  • So bad things can happen.

  • So how do we know if something bad has happened?

  • Well, it turns out that get_string, if you read the documentation for it,

  • and you might have or might be for the current problem set

  • where we ask you to consider what it's really doing,

  • it turns out that get_string can sometimes have problems.

  • Out of memory.

  • Dammit.

  • What do you do?

  • If the user typed in his or her whole life history,

  • and it couldn't fit in memory, what do you return?

  • You don't just return part of their life history, just a few of the words

  • or paragraphs.

  • You instead return a special value that we've not used until now.

  • And that special value is a keyword that can actually

  • be something called NULL, which is confusingly named,

  • because N-U-L is the name that humans gave to backslash 0.

  • N-U-L-L is the word people gave to this notion today.

  • So get_string can, in some cases, fail, because you run out of memory.

  • The user typed in too many characters at his or her keyboard.

  • So get_string can return NULL.

  • If you don't check for null, maybe your program

  • will crash or hang or do something unpredictable.

  • And so to hedge against that, we ask this.

  • And this is a very succinct way of doing this.

  • Let me actually be a little more verbose.

  • If s equals equals NULL, return 1.

  • So there's a couple of things going on here.

  • Line 12, if you believe me that get_string can sometimes

  • return a special value--

  • literally, N-U-L-L, all caps, no quotes--

  • it stands to reason you can check for that

  • and do something based on that decision.

  • And that decision can be to return 1 or 2 or negative 1.

  • So long story short, main, recall, recall that main,

  • we've typed in a couple of ways, sometimes with command line arguments,

  • sometimes without.

  • But the word int has always been there now for weeks.

  • And I kind of like just ignored it for the past several weeks.

  • It turns out that main is a little special.

  • And humans, years ago, decided that main will just always return an int.

  • And that integer is used by the computer,

  • like Linux or macOS or Windows, to just know if a program succeeded or failed.

  • If a program worked OK, succeeded, main is supposed to just return 0.

  • If something goes wrong, main is instead supposed to return 1 or 2 or negative 1

  • or negative 2, any number of values except 0.

  • 0 is the only number in the world that means OK,

  • which is a little paradoxical, because it usually means false or off.

  • But there's one 0, and there's, like, an infinite number of other numbers.

  • And a lot of things can go wrong in programs is the thinking there.

  • So in fact, if you've ever, on your Mac or PC,

  • seen an error message that's, like, error negative 239.

  • Like, what the hell is that?

  • It just means that some program returned a number

  • that some human decided would represent whatever problem just happened.

  • I, thankfully, don't have a very big program.

  • Not that much can go wrong.

  • So I'm just returning the first number I thought of, which was positive 1.

  • But it's kind of arbitrary.

  • But at least it's not 0.

  • So now I'm doing a little check there.

  • But you know what?

  • It turns out, null is just a synonym for 0, it turns out.

  • More on that before long.

  • So what does the exclamation point mean in C?

  • AUDIENCE: Not.

  • DAVID MALAN: It means not.

  • Like, reverse the answer.

  • So this is saying, if s if not, s.

  • So if s is null, according to my definition today, it's equal to 0.

  • So if not 0 means if true, then return 1.

  • And this is completely backwards and confusing I think.

  • But this is a way of saying, long story short,

  • if something went wrong, return 1 now.

  • That's all.

  • That's all.

  • So let's not dwell too much on that, because that is not the juicy part.

  • The juicy part is the scary thing.

  • So let's just consider for a moment what's going on.

  • This is probably the scariest line or longest line of code we've yet seen,

  • but it's doing something relatively simple.

  • What's going on here?

  • Well, on the left-hand side, let's pluck off the easy one.

  • Someone, in English, just tell me, what's

  • the left-hand side doing to the left of the equal sign?

  • Sorry.

  • Say again?

  • AUDIENCE: Declaring a string.

  • DAVID MALAN: Declaring a string.

  • Give me a string called s.

  • But let's be a little more precise.

  • Give me a variable called s that's going to store today a--

  • AUDIENCE: Pointer.

  • AUDIENCE: Array.

  • DAVID MALAN: --not an array, but a pointer or the address of a--

  • of a character.

  • So again, if we really want to be uptight today,

  • yes, it's declaring a string called s-- or sorry, t.

  • Sorry.

  • Yes, it's declaring a string called t.

  • But more precisely, it's declaring a variable called

  • t that's going to store the address of a character.

  • So it's more words.

  • This is why code is nice and succinct.

  • You can say in a few characters what just took me this whole sentence.

  • So that's all that's happening on the left-hand side.

  • So again, if we draw this as a picture, what's happening now

  • is that char star t just gives me this tiny little box.

  • That is not nearly enough room to store Stelios or David

  • or any number of other names.

  • So the magic must be in the right-hand side of this expression.

  • And this is a bit of a mouthful at the moment.

  • So let me distill it to its essence.

  • It turns out there is a function in the world called malloc

  • for memory allocation.

  • Succinctly named.

  • It takes one argument, the number of bytes you want.

  • And Windows or macOS or Linux, whatever your computer's running,

  • its purpose in life is to hand you back a chunk of memory

  • that is equal to that length, 5, in this arbitrary case.

  • So what does that mean pictorially?

  • If I call malloc 5, that means that the computer finds somewhere among all

  • of those green circuit boards and those black chips we've been talking about,

  • it finds me a chunk of 5 identical bytes that are back-to-back-to-back--

  • identically sized bytes that are back-to-back-to-back-to-back.

  • And then returns to me, malloc does, what, do you think?

  • AUDIENCE: The address.

  • DAVID MALAN: The address of?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Exactly.

  • Of the new place you've just allocated memory.

  • So if this is arbitrarily at location 400 now, and 401,

  • and so forth, what malloc returns is just a number.

  • And more precisely, an address.

  • It returns the address of the beginning of that chunk of memory.

  • But it's worth noting, malloc is completely generic.

  • It just gives you a chunk of memory.

  • It has nothing to do with strings or integers or anything like that.

  • So the burden is entirely on you to know how many bytes you asked for.

  • So I literally hard-coded 5.

  • So hopefully, I, the human, remember that when I write more lines of code.

  • But notice, absent from my picture is what?

  • Deliberately.

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: There's no backslash 0.

  • This isn't necessarily a string.

  • Maybe this is a really long number or something else.

  • A data structure, as we'll soon call them.

  • I just have to remember how long it actually is.

  • And in particular, it's worth noting now,

  • and we've used this terminology before, when

  • you don't initialize memory yourself, you should think of those values,

  • in almost all context, as just having garbage values.

  • Maybe it's the number 1.

  • Maybe it's a 0.

  • Maybe it's the number 42.

  • Anything.

  • It's just garbage from that memory maybe having been used previously

  • in your program for some other purpose before you didn't need it anymore.

  • So now if I go back to my code, 5 doesn't quite work,

  • because it doesn't even fit my name.

  • D-A-V-I-D is 5.

  • But how many bytes do I need to store my own name?

  • AUDIENCE: 6.

  • DAVID MALAN: 6, because of the backslash 0.

  • So if I want enough memory to store my name, I actually need this to be 6.

  • But this is obviously kind of stupid, because now I can never support--

  • I can't even support Stelios.

  • I can support me and Maria and some other short names, but not

  • longer names.

  • So that's why, at first glance, the code was scarier.

  • But let's just break it down.

  • This expression here in parentheses is asking probably a familiar question.

  • What is the length of s?

  • The string the human typed in.

  • And then why the plus 1, to be clear?

  • AUDIENCE: Backlash 0.

  • DAVID MALAN: The backslash 0, right?

  • Strlen gives you the human length of a string,

  • like, what we view on the screen.

  • That plus 1 gives us the backslash 0.

  • And then just to be super precise, I want that many chunks of memory

  • times the size of the chunk of memory that I want.

  • And technically in C, char is always 1 byte.

  • So technically, this is not necessary in this context.

  • But just to be super clear, I decided to be super pedantic

  • and just say, give me this many chunks of memory, each of which

  • should be this size.

  • And it turns out in C, there's an operator called sizeof, where you just

  • specify what type do you want to get the size of, and it will just tell you.

  • 1, 2, 4, 8, whatever it is.

  • So now, even though this looks pretty darn cryptic, all it's implementing

  • is this picture.

  • And it's storing in this variable t the number 400.

  • Or if, again, your sort of eyes glaze over at that level of detail,

  • just think of it as an arrow pointing at the chunk of memory.

  • So when I then do this just intuitively, why

  • am I asking this question, if not t?

  • And again, if not t is the same thing as saying if t equals

  • equals NULL, why am I doing this?

  • You've just met malloc.

  • But maybe what could go wrong?

  • AUDIENCE: You get a garbage value.

  • DAVID MALAN: Garbage values are OK.

  • I can deal with those, because I can just change them.

  • Say again?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: You don't have enough.

  • Maybe I said, give me malloc of, like, not 5, but give me

  • malloc of, like, 5 0, 0, 0, 0, 0, 0, 0, 0, 0, like, 5 billion billion.

  • Maybe I said, give me 5 billion bytes.

  • And the computer just doesn't have it, so it's not going

  • to return me part of the memory I want.

  • It's just going to say, mm-mm, like, NULL is the returned value.

  • It failed completely.

  • So that's why we check for it.

  • But that's all.

  • It's just another form of error checking.

  • And then now, notice the kind of work I have to do.

  • If I want to copy the string, it's clearly, today, not sufficient

  • to just do the equal sign, moving from right to left.

  • You have to do the work yourself.

  • So here's a for loop that we might have used a week

  • or two ago for just iterating over strings.

  • And that's all I'm doing.

  • I'm iterating over s, and copying each of s's characters

  • into t by just this line of code here.

  • And now, why-- this looks like a bug in almost every other context.

  • Why am I starting at 0, iterating up to the length of s,

  • but technically up through the length of s, even though I'm 0 indexed, right?

  • In almost every for loop we've written in the past,

  • we've done this with strings.

  • Why did I deliberately add the equal sign today?

  • AUDIENCE: You wanted to include the NUL character.

  • DAVID MALAN: Yeah, you wanted to include the NUL character.

  • N-U-L, just one L, the special backslash 0.

  • Otherwise you might end up putting, like, D-A-V-I-D, garbage value,

  • which might then be printed on the screen.

  • It might be considered part of my name.

  • And so there could be any number of garbage values, all of which

  • will be conflated for letters of my name.

  • And so now when we do this line of code, which

  • we did have before, we are literally only capitalizing

  • the first character of t.

  • It is completely independent of s.

  • Because the picture we've done is have not just this,

  • but an identical chunk of memory that has those same characters down there.

  • And we're only mutating or changing those characters.

  • So when we finally print the result down here, s and t,

  • we're printing the original s and the copy of s with its first letter

  • capitalized.

  • And that is why then when I did ./copy1 in my source directory.

  • So if I go into src4, and then I do ./copy1.

  • Dammit.

  • Make copy1.

  • And I do ./copy1.

  • And I type in david, all lowercase.

  • I capitalize only t.

  • And if I run it again, say, with stelios, all lowercase,

  • I capitalize only t, because now I'm actually manipulating the memory

  • as I originally intended.

  • Any questions there?

  • OK, that's a lot.

  • Let's take a five-minute break, and we'll come back with one more layer.

  • All right.

  • So we're back.

  • And to recap where we're at, we sort of revealed what a string actually is.

  • It's just a char star.

  • Star means it's an address or pointer, location,

  • however you want to think of it.

  • And the type in front of the star means, what is it the address of?

  • And we've been talking char star, so that

  • means the address of a character, which we can think of, higher level,

  • as just a string.

  • So if you're comfortable with that-- and even if you're not,

  • it'll sink in over time.

  • But if you're comfortable with at least the claim that a string is just

  • the address of a single character, then let's

  • consider what we've been doing all this time by way of this example.

  • String 0, which is on the course's website

  • if you'd like to look later, too.

  • And notice, I'm just following a familiar paradigm now.

  • I'm just getting a string and I'm storing it

  • in s, which is technically a lie.

  • I'm storing the address of the first character of that string

  • in s, to be super precise.

  • And then I'm just making sure, if there was not enough memory for this,

  • return 1 and just abort the program.

  • If there was plenty of memory, do this for loop.

  • And we've done this for the past couple of weeks in various contexts.

  • Initialize i to 0.

  • Initialize n to the length of that string.

  • And then go from i less than n just to print out each character at a time.

  • And just to be clear, because of the backslash n, if I type in like,

  • Stelios, it's going to print S-T-E-L and so forth, each character on one line.

  • So this is sort of week one, week two stuff at this point.

  • But today, just realize there's an equivalent to a fancier syntax,

  • but maybe just less intuitive.

  • If I open up string 1 now, notice I, again, do the same thing.

  • Just get a string.

  • Call it s.

  • Make sure there's enough memory.

  • And then move on.

  • And what do I do here?

  • This is kind of funky.

  • So it's still the same for loop from i to n.

  • But I threw away my square bracket notation,

  • which we've been comfy with, perhaps, for the past few weeks with strings.

  • And square brackets allow me to index into an array.

  • And an array is just a chunk of memory.

  • But you know what?

  • Those have really just been what a programmer would

  • call syntactic sugar, which is a really weird way of saying

  • they're like a feature of the language that just kind of simplify

  • aesthetically a fundamental idea.

  • And today's fundamental idea is that characters have addresses.

  • And therefore, string is just the address of a character

  • or the first character of a string.

  • And so what could this possibly mean?

  • This is an example of a fancier technique called pointer arithmetic,

  • which, no surprise, involves pointers and arithmetic, adding them together.

  • But what am I doing?

  • Well, just to be clear, in this example, what is s on line 19?

  • AUDIENCE: It's the address [INAUDIBLE].

  • DAVID MALAN: Good.

  • It's the address of the string that the user typed in.

  • So if it's Stelios, it's the address of capital S,

  • or if it's David, a capital D, or whatever the character is,

  • s is the address of the string or more precisely

  • the address of the first character in the string.

  • I is what in general?

  • It's just an integer.

  • It starts at 0, because of my for loop, and it goes on up

  • to the length of the string.

  • So it's 0, then 1, then 2, then 3.

  • And what do we know about strings?

  • Strings are sequences of characters back-to-back-to-back.

  • And every character is 1 byte.

  • So beautifully, if you think about what's going on underneath the hood,

  • if this is location 400, I don't even need to write the other numbers.

  • You intuitively know that this is location 401, 402, 403, 404.

  • It's just arithmetic to get from one to the other.

  • And so it stands to reason that if you know that, you can kind of throw away

  • the training wheels or the syntactic sugar of square brackets even,

  • and just say, you know what?

  • I want to print out the character that's at location s plus some offset,

  • so to speak, plus 0 more bytes or 1 byte away or two bytes away.

  • So I just do s plus i.

  • S is the address.

  • I is a number from 0 on up.

  • So it just kind of moves the address from left to right.

  • But there's one funky syntax.

  • And this is by far one of the most annoying design decisions in C,

  • especially when you're first learning this stuff.

  • There's another damn star here.

  • And the star means something different based on the context.

  • So notice, previously when we used this star--

  • well, previously, previously, star meant multiplication,

  • and life was much simpler.

  • Today star means it's an address of something.

  • But there's two different contexts in which

  • a star is going appear moving forward.

  • Here is where we're declaring a variable.

  • So any time you see a star that's clearly not

  • being used for multiplication and instead is

  • something fancier, like this, if there's a data

  • type to the left of the star, that means you are declaring a variable.

  • Specifically, a pointer variable that points to that type of data type,

  • like, on the address of a character.

  • If, though, in another context, you see the star used in a weird way-- like,

  • it's not multiplication, because there's nothing to the left of that star.

  • So it's not grade school math here.

  • It's just a star.

  • And it seems to be kind of related to the parenthetical expression that

  • follows, s plus i.

  • So s is an address.

  • I is a number.

  • S plus i is, therefore, just a different address.

  • So the star operator in this context just tells the computer,

  • go to that address.

  • Follow the treasure map and go to where X marks the spot, where X is just s

  • plus i, in this case, the address.

  • So when you see the star, when it's not multiplication

  • and there's no data type to the left, it means, don't give me a pointer.

  • Go to that pointer.

  • Go to that address.

  • And so this for loop now is equivalent to the program

  • we saw just a moment ago, even though it's kind of crazy-looking.

  • Which one is better?

  • I mean, honestly, when I wrote code, when I write code,

  • I still generally write it this way.

  • The syntactic sugar is nice.

  • It's just a little easier to read.

  • I don't have to do arithmetic in my head.

  • And frankly, it sort of just reads more intuitively than this version.

  • But both are correct.

  • And among today's goals are to reveal why this other fancier version is also

  • correct.

  • Let me introduce one other example that kind of

  • takes things in the opposite direction.

  • So all this time, we've had these training wheels

  • of GetInt and GetString.

  • And frankly, that's just because in C, it's really annoying

  • to get input from the user.

  • C does not make it easy, especially if you want to add error checking, right?

  • If you guys have used GetInt or GetString,

  • if you use GetInt and you don't type a number, you type a word,

  • it's going to reprompt you and reprompt you and reprompt you.

  • So there's a lot of, like, those features

  • that are just nice in the first few weeks of an intro CS class, when

  • it's hard enough to get the parentheses and the semicolons right, it'd

  • be nice if at least when you ask for an int, you get an int back.

  • And that's why we have those training wheels.

  • But if we take those off, and take away even today--

  • we're not going to yank it away, but if you sort of

  • decide to let go of the CS50 library, we have to come up

  • with other ways of getting user input.

  • So let's use today's ideas to do exactly that.

  • It's a super small program.

  • It's a little cryptic-looking.

  • But let's see how you can get an int from the user without using GetInt.

  • Here is one way.

  • Hey, computer, give me a variable called x in which to store an int.

  • So that's, like, week one stuff.

  • Just give me a variable.

  • No magic.

  • This is also week one.

  • Just print out x.

  • And then down here, print out a prompt for x.

  • And then down here, just print out whatever is in x.

  • So the only new line is this one here, scanf.

  • Scan means to kind of read from the keyboard.

  • Like scan whatever the user typed in.

  • F means formatted.

  • And that means you can read a certain type of input from the user.

  • So %i, recall, has generally meant integer.

  • So scanf borrows some of the syntax of printf,

  • but it's kind in the opposite direction.

  • Printf goes out to the screen.

  • Scanf comes in from the keyboard.

  • But dang it, if there isn't this one new piece of syntax.

  • But you can maybe reason through what's going on here.

  • So x is an integer, but we want to store something in it.

  • And up until now, any time we wanted to store something in an integer

  • or whatever variable, we just used the assignment operator from right to left.

  • And that's exactly how GetInt works.

  • But GetInt, underneath the hood, uses other techniques

  • that we haven't yet seen.

  • And what could be implemented as, though it's actually even fancier than this,

  • it could be implemented using scanf.

  • The way scanf works is this.

  • Scanf takes in a format string like this,

  • which just tells the function what type of data to expect.

  • And then you pass in not a variable, but the address of a variable.

  • So the ampersand means get the address of some variable.

  • Why is this useful?

  • Well, when you just do int x, like here, like we've done for weeks,

  • you just get a chunk of memory.

  • A ampersand x figures out in memory where that is.

  • And maybe it's at address 500, and so ampersand x literally returns 500.

  • So scanf then, as input, takes this format string, which just tells it

  • what kind of data to read, and then it takes the address of a variable,

  • because scanf's purpose in life is going to be to go to that address

  • and put at it whatever number the user typed in.

  • Now, why is this necessary?

  • Why in the world can I not just do scanf x?

  • Well, it's related to a problem we ran into earlier.

  • Recall that when Kate came up and did the swapping of those two variables,

  • we claimed that logically she was correct.

  • And we claimed or I claimed that my code was

  • similarly correct or at least a decent translation of what she did here

  • in reality.

  • But this did not work.

  • It still left one in x and two in y, and never actually changed them.

  • Well, why is that actually the case?

  • Well, let me go into noswap in the IDE.

  • Let's go ahead and do this.

  • Let me go ahead and use my old friend eprintf and say a is %i.

  • And then enter a there.

  • And then here I'm going to go ahead and say b--

  • actually, let me do it in one line just to kind of simplify the code.

  • So a is %i and b is %i.

  • And I'm going to pass in a and b here.

  • I just want to see inside of swap if Kate's logic is actually correct.

  • And maybe a and b are being swapped, but maybe somehow x and y are not.

  • So let's actually see what's going on here.

  • So if I go ahead now and run make noswap to compile it.

  • Oh, and I got to include CS50's library.

  • Include cs50.h.

  • Let me go ahead and compile this.

  • Make noswap ./noswap.

  • Let's see what actually happens.

  • So x is 1.

  • Y is 2.

  • And at the end of the story, x is 1, y is 2.

  • That was the problem.

  • Now I'm digging in a little deeper.

  • And I'm looking inside of noswap.

  • So at noswap line 20, I printed out a is 1, b is 2.

  • Oh, my god.

  • Kate was right.

  • Her code works, but it doesn't work lastingly.

  • It works within the function, but it seems

  • to have no effect on passing in x and y.

  • So it's kind of like she swapped it onstage.

  • But as soon as she stepped offstage, they

  • were kind of back the way they once were, which is just weird.

  • So what's actually happening?

  • Well, it turns out that when we talk about a computer's memory,

  • we have this layout here.

  • And there's different areas of memory.

  • And the two we're focusing on today are just these, stack and heap.

  • And all this time when we've used malloc to get more memory,

  • that memory's been coming from the so-called heap, which

  • you can think of as, like, the top of your memory right now.

  • Though technically, you can think of it as the bottom.

  • But it's from a specific area of your computer's memory.

  • The stack, though, is used differently.

  • The stack is a region of memory that any time you call a function,

  • the memory for your local variables comes from the stack.

  • And you get a sliver of memory from the so-called stack,

  • just like you would get a tray off of a stack in a cafeteria or a dining hall.

  • So what does this mean in code?

  • Well, if we look at the code for noswap, and we see here

  • the code as follows, that main declares x and y

  • and takes no command line arguments, but swap takes in two arguments, a and b,

  • and has one other variable called temp.

  • Where is that memory actually going?

  • Well, you can think of your computer's memory again

  • as having these two regions, heap and stack.

  • And previously, malloc was taking memory from up here.

  • Now let's focus on the bottom of this region of memory,

  • which we can think of as this here stack.

  • So this is, like, where all the trays go in the cafeteria.

  • And it turns out when you run a program like noswap,

  • this region of memory at the very bottom is where any of main's local variables

  • go.

  • And to be clear, what local variables does main have?

  • What are they called?

  • Just x and y.

  • So that means somewhere in this sliver of memory,

  • there's space for something called x and there's space for something called y.

  • And I, the programmer, put the numbers 1 and 2 there.

  • So that's what my computer's memory looks like in this program.

  • But when swap is called, much like I called Kate up onto the stage,

  • she technically gets her own memory space

  • for any work she's actually doing.

  • And so swap gets its own slice or, technically,

  • frame of memory, tray of memory, in some sense.

  • And in there go any of hers or our swap functions, local variables, which

  • are called what again?

  • A and b.

  • So a and b.

  • And one more?

  • Temp, a third one.

  • So swap in its stack frame, more precisely,

  • has room for three local variables, a and b and temp.

  • So what, though, happens when main calls swap?

  • Any time you pass a value, an input to a function, that receiving function

  • gets a copy of the value.

  • So as soon as you call swap, x gets copied into a, y gets copied into b.

  • And then when Kate finally used the empty cup or swap uses

  • the temporary value, some other value gets copied into temp, like 1, in order

  • to keep that value of a around.

  • So when Kate then swapped the Gatorades, and when

  • swap does its thing equivalently in code, it does actually work.

  • 1 becomes 2 and 2 becomes 1.

  • But as soon as swap returns, what happens to that memory perhaps?

  • It just gets thrown away.

  • The stack, as the name suggests, is constantly growing,

  • but then it's constantly shrinking as soon as these functions return.

  • So as soon as swap is done doing its thing, and as soon as Kate

  • walks off the stage, it's as though she was never here

  • to mutate or change x and y, because all the work she did

  • is just kind of ignored thereafter.

  • And that's because functions don't have access to each other's memory spaces.

  • They are only getting copies of what's actually passed in.

  • So what then is the implication or what is the solution here?

  • Well, let me go ahead and open up swap.c, which, as the name suggests,

  • actually does do a swap, and show how I've changed this.

  • So in main, I still declare x and y as 1 and 2.

  • I changed, though, line 13 to use ampersand, which again means what?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Not go to that address.

  • The opposite.

  • AUDIENCE: Go find.

  • DAVID MALAN: Find the address or get the address of x, get the address of y,

  • and then pass those in to the function.

  • Meanwhile, and frankly, it's understandable

  • if this is kind of visually overwhelming syntactically,

  • because of all the damn stars, but what is swap being declared now?

  • It's not taking an int.

  • It's not taking a second int.

  • It's taking the address of an int and calling it a.

  • And it's taking the address of another int and calling it b.

  • Different idea.

  • Then most confusingly of all, but it'll take time with practice for this

  • to sink in, this is implementing now what we want.

  • This is just storing in a local temporary value

  • whatever the contents of a are.

  • Because star, as you said a moment ago, albeit prematurely,

  • means go to the address in a.

  • And that's going to give you the number 1, because 1 is at that address.

  • So put 1 in temp.

  • This means go to the address in a again and put

  • at it whatever is at the address in b.

  • So take the 2 and put it at whatever a is pointing at.

  • And then the final switcheroo is go to the address in b and store

  • quite simply the number 1.

  • So what does that mean for our picture?

  • Because the syntax, frankly, does not make this all that much fun.

  • For our picture here, if we consider what's

  • actually happening underneath the hood now.

  • Let me just kind of tidy this up, and then

  • go back to a cleaner chunk of memory here.

  • Temp is going to be the same as before.

  • It's going to store just an int.

  • But a and b are fundamentally different.

  • And I don't quite know what to draw there, because I

  • need to be a little more precise.

  • So let's go back to our original example, where maybe this is byte 100,

  • maybe this is byte 101, and so forth.

  • So what goes in a, to be clear, in the correct version of swap?

  • It takes the address of an int, which would be, like, 101,

  • and then it takes the address of an int--

  • sorry, sorry-- 100, and then it takes the address

  • of the other int, which would be 101, which are not

  • the numbers we want to swap, but they're, like,

  • two treasure maps that lead to the original x and the original y

  • in someone else's frame on the stack.

  • So a function can only change another function's memory

  • if you give it a map to that or those locations.

  • And so now because of all the crazy uses of star in our code,

  • we're going to the address 100, which means go here, and grabbing its value.

  • We're going to the address in b and getting its value here,

  • moving them around so that these values don't change.

  • But when swap is called in this version, 1

  • becomes 2 and 2 becomes 1, because swap is a little more sophisticated now,

  • and it's like using these stars to follow a treasure map.

  • Go to this address, go to that address, and move those values around.

  • And as complicated, honestly, as this might feel or actually be,

  • realize that it boils down to just these two primitives--

  • get me the address of something, with ampersand,

  • and go to the address of something using star.

  • They kind of undo each other.

  • And just using those two new ingredients,

  • albeit with some strange-looking syntax, can we start to go really anywhere

  • we want in a computer's memory.

  • So if we go back now to scanf, does it perhaps

  • make more sense as to why scanf needs not x, but the address of x?

  • If we just passed an x to scanf, that would be like saying,

  • here is an integer.

  • Do whatever you want with it, because I'm not going to see the difference,

  • just like with the original no swap version.

  • But if I say, here is the address of an integer, swap--

  • sorry, scanf can go to that address and put any number it actually wants there.

  • And similarly-- let's do this.

  • scanf(1).

  • The world was so simple just a moment ago,

  • but I can break my world very quickly with this example.

  • Here is maybe an incorrect implementation of getString recall

  • that getString gets a string from the user

  • and returns the address thereto, as of today.

  • But here are four lines of code that are just a translation of the integer

  • example to strings.

  • But strings are different.

  • This is saying, in line 7, give me a pointer to a character.

  • Give me space for the address of a character.

  • This is just saying, hey, printfs colon.

  • scanf(%s)-- % feels right, because that's what printf would use--

  • and pass in s, and then print out whatever s is.

  • But there's a problem here.

  • s is indeed a pointer already, so I don't need the ampersand.

  • s is an address.

  • That's all scanf expects.

  • But if I say to scanf, here's the address

  • of a pointer which happens to be called s line 7, what is that really?

  • What is at that address?

  • Say again?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: It's not even that, because a pointer.

  • char(*s), is just the address of a character.

  • But what goes in variables by default in a program?

  • Garbage values.

  • So this is some random address is in the variable called s.

  • So it is the address, but god knows where that address is in memory.

  • So this is like giving scanf a map to some random location, some garbage

  • value.

  • And his purpose in life is to go there, wherever that is,

  • and change the contents by putting whatever characters the human has

  • typed in.

  • So this is very bad, because what I've not done

  • is build in space for the actual characters in Stelios' name or Maria's

  • name, or mine.

  • For that, what was the--

  • how can we actually get it that?

  • Well, let me open scanf 2--

  • also online to look at later--

  • and how have I mitigated this, at least in part?

  • Well, in this case, I'm saying mm-mm.

  • s is going to be an array of characters, specifically five of them.

  • And it turns out, because of the syntactic sugar, really--

  • and technically a more sophisticated feature than that--

  • I can pass in the name of an array to scanf.

  • Or any function that expects the address of something,

  • I can pass in the name of an array, and it will be treated as a pointer.

  • And so scanf will now put at that address whatever the human types in.

  • But there's a danger.

  • What could go wrong given that logic?

  • Yeah.

  • Exactly.

  • If you type in something long, something bad is going to happen.

  • When I declare s with char s bracket [5] close bracket, that means,

  • literally, give me an array in memory that

  • can hold five things, that therefore looks like this.

  • s, when passed into scanf, it's like saying,

  • OK, there's the address of that chunk of memory.

  • But if we type in Stelios' name at the keyboard,

  • and the programmer has only allocated enough memory for five

  • total characters, scanf's not going to know the difference,

  • because there's no backslash 0, there's no fanciness.

  • It's not yet a string.

  • It's a chunk of memory.

  • And so o and s and the backslash n are going

  • to end up in no man's land, memory you did not ask for.

  • Maybe you're tripping over Maria's name, or something else altogether.

  • And this is when computers crash.

  • You have an overflow of the buffer, or the chunk of memory,

  • that you've actually allocated.

  • And we can see this a little more playfully thanks to a friend of ours

  • at Stanford, Nick Parlante, who has spent quite a bit of time, I think,

  • with claymation.

  • And so in just a moment-- you're going to see the screen--

  • Nick Parlante at Stanford put together the following claymation

  • that shows us the little world of Binky, and takes a look

  • at some C lines of code and translates them into some claymation fun.

  • [VIDEO PLAYBACK]

  • - Hey Binky, wake up.

  • It's time for pointer fun.

  • - What's that?

  • Learn about pointers?

  • Oh, goodie.

  • - Well, to get started, I guess we're going to need a couple of pointers.

  • - OK.

  • This code allocates two pointers which can point to integers.

  • - OK.

  • Well, I see the two pointers, but they don't seem to be pointing to anything.

  • - That's right.

  • Initially, pointers don't point to anything.

  • The things they point to are called pointees, and setting them up

  • is a separate step.

  • - Oh, right, right.

  • I knew that.

  • The pointees are separate.

  • Er, so, how do you allocate a pointee?

  • - OK.

  • Well, this code allocates a new integer pointee,

  • and this part sets x to point to it.

  • - Hey, that looks better.

  • So make it do something.

  • - OK.

  • I'll dereference the pointer x to store the number 42 into its pointee.

  • For this trick, I'll need my magic wand of dereferencing.

  • - Your magic wand of dereferencing?

  • Uh, that-- that's great.

  • - This is what the code looks like.

  • I'll just set up the number, and--

  • [POP]

  • - Hey, look.

  • There it goes.

  • So doing a dereference on x follows the arrow

  • to access its pointee, in this case, to store 42 in there.

  • Hey, try using it to store the number 13 through the other pointer, y.

  • - OK.

  • I'll just go over here to y and get the number 13 set up,

  • and then take the wand of dereferencing and just--

  • [KLAXON]

  • Woah!

  • - Oh, hey, that didn't work.

  • Say, Binky, I don't think dereferencing y is a good idea, because, you know,

  • setting up the point is a separate step, and I don't think we ever did it.

  • - Mm.

  • Good point.

  • - Yeah, we allocated the pointer y, but we never set it to point to a pointee.

  • - Hmm.

  • Very observant.

  • - Hey, you're looking good there, Binky.

  • Can you fix it so that y points to the same pointee as x?

  • - Sure.

  • I'll use my magic wand of pointer assignment.

  • - Is that going to be a problem like before?

  • - No, this doesn't touch the pointees.

  • It just changes one pointer to point to the same thing as another.

  • - Oh, I see.

  • Now y points to the same place as x.

  • So wait, now y is fixed.

  • It has a pointee.

  • So you can try the wand of dereferencing again to send the 13 over.

  • - Uh, OK.

  • Here goes.

  • - Hey, look at that.

  • Now dereferencing works on y.

  • And because the pointers are sharing that one pointee, they both see the 13.

  • - Yeah, sharing, whatever.

  • So are we going to switch places now?

  • - Oh, look.

  • We're out of time.

  • - But--

  • - Just remember the three pointer rules.

  • Number one, the basic structure is that you have a pointer

  • and it points over to a pointee.

  • But the pointer and pointee are separate,

  • and the common error is to set up a pointer

  • but to forget to give it a pointee.

  • Number two, pointer dereferencing starts at the pointer

  • and follows its arrow over to access its pointee.

  • As we all know, this only works if there is a pointee, which kind of gets back

  • to rule number one.

  • Number three, pointer assignment takes one pointer

  • and changes it to point to the same pointee as another pointer.

  • So after the assignment, the two pointers

  • will point to the same pointee.

  • Sometimes that's called sharing.

  • And that's all there is to it, really.

  • Bye bye now.

  • [END PLAYBACK]

  • DAVID MALAN: So in a moment, we're going to transition

  • to one of our domain-specific problems, namely forensics

  • and the art of recovering information that may have accidentally

  • or deliberately been deleted.

  • But those of you in Elliott House might remember this shot of the back right

  • corner of the Elliott House dining hall, one of the undergraduate dining halls

  • here.

  • And this is where I was, like, 20 years ago, when I finally got pointers.

  • The key point being, it was not in the lecture hall,

  • and it was not the first time I sat down with a problem set related to pointers.

  • Because this topic in particular, in CS 50,

  • and specifically in this language C, is definitely tough for a lot of people.

  • And if you've totally followed everything thus far, that's great.

  • You're already ahead of where I was.

  • But more so than most any other topics in this course,

  • rest assured that if it takes a little while for this to sink in conceptually,

  • and for it to kind of come out of your fingers syntactically, that's normal.

  • Or at least if I'm normal, it's normal, because that was precisely

  • my experience.

  • And I mean it.

  • It sunk in literally in this location, when my teaching

  • fellow at the time, Nishat Meda--

  • who was a year or two older, I think a senior

  • in Elliot House, which is why we were there--

  • he was walking me through it again, maybe the second or the third time.

  • And then finally, that proverbial light bulb went off.

  • And so keep in mind, more so than most any other topic,

  • that that here too might be your experience as well.

  • So here, of course, is Zamyla, one of our senior-most staff

  • who appears in the course's walkthroughs for various problems.

  • And you'll notice that it seems to be a pretty high quality

  • photograph, or frame from a video.

  • But it seems to be the case that in Hollywood, especially--

  • you can always do what's called enhance, which

  • is the verb that, whenever you want to find out who done some crime,

  • you enhance the image, and there he or she is

  • in the glint of someone's eye or contact lens, or something like that.

  • So this affords us an opportunity, nicely enough,

  • to continue along a trajectory that we've

  • been on the past few weeks, whereby with problem set two

  • we looked at the underlying representation of strings

  • via cryptography and Vigenere and Caesar and so forth.

  • And then this current week, you're focusing on music and the underlying

  • representation of sounds.

  • And this coming week, we focus on the underlying representation

  • of imagery, of which this is of course just one example.

  • But it's worth noticing that if Zamyla is suspected of something,

  • and we decide-- or rather--

  • we've got to spin it the right way.

  • If Zamyla witnessed something, and we want to see the glint of her eye

  • who done it, this is about all you're going to see,

  • because just as in our computers, where we

  • have a finite amount of RAM or hardware, so in an image is there

  • a finite amount of information in the zeros and ones that

  • compose the GIF, or the PNG, or the JPEG,

  • or whatever the file format actually is.

  • And indeed, if you zoom up closely enough on this photograph of Zamyla,

  • you will see the so-called pixels, or dots,

  • because an image is just a bunch of rows and columns of dots, each of which

  • is colored.

  • And I'm hard-pressed to imagine the reflection of any crime being committed

  • based on the glint of someone's eye.

  • And yet, nonetheless, if we could dim the lights for a few seconds,

  • there are such claims in Hollywood as these.

  • [VIDEO PLAYBACK]

  • - He's lying.

  • About what, I don't know.

  • - So what do we know?

  • - That at 9:15, Ray Santoya was at the ATM.

  • - The question is, what was he doing at 9:16?

  • - Shooting the 9 millimeter at something.

  • Maybe he saw the sniper.

  • - Or was working with him.

  • - Wait.

  • Go back one.

  • - What do you see?

  • - Bring his face up.

  • Full screen.

  • - His glasses.

  • - There's a reflection.

  • It's [INAUDIBLE] Vita's baseball team.

  • That's their logo.

  • - And he's talking to whoever's wearing a jacket.

  • [END PLAYBACK]

  • DAVID MALAN: OK.

  • So that's not happening.

  • This is all that is in the information.

  • And so this is a wonderful opportunity, in a pretty cool-sounding and actually

  • cool world of digital forensics, for us to bridge a couple of our worlds now.

  • Up until today, we've really only had arrays

  • as our only data structure, the only way we

  • could lay out more than just individual types of memory.

  • But once we have arrays, that allowed us to start talking about strings.

  • And it turns out, strings can be represented really as just addresses.

  • And so that now that we have addresses, we

  • can kind of stitch together any kind of structures we want in memory.

  • Not everything has to be back to back to back.

  • And this means we can now start to think about how we would store information

  • permanently on disk, on your hard drive of a computer, how you

  • can solve problems more efficiently.

  • And we'll explore that by first considering how you

  • represent even information like this.

  • Well, let's try to take away some of the complexity of Zamyla's photograph here,

  • and just boil her down to this happy smiley face.

  • This is perhaps the simplest image we could create using zeros and ones,

  • because if you think of white dots as just being represented by ones,

  • and black dots as being represented by zeros-- or vice versa,

  • it doesn't really matter--

  • you could construct an image from a grid of bits, zeros and ones,

  • if you just interpret them and present them on the screen

  • according to some pattern or rule of thumb like that.

  • And here, then, is an image.

  • It's got rows-- or scan lines, as they might be called--

  • and columns that represent the individual dots, or a monitor's

  • interpretation of those zeros and ones.

  • Now, Zamyla, of course, has many more dots than just the couple of dozen

  • we saw on the screen before.

  • But it's the exact same idea, because, indeed,

  • if we zoom in even on the finest detail of Zamyla's photograph here,

  • do we actually see the actual dots.

  • But instead of just using an individual bit,

  • one and zero-- because if you've only got one bit per pixel, or per dot,

  • you can pretty much do no better than two colors,

  • black and white, red or blue, or whatever two colors you want.

  • You got to pick something.

  • And that's when you have very limited colors, or black and white.

  • So in Zamyla's case here, turns out this is a JPEG.

  • Pretty common for photographs and the like.

  • Typically as many as 24 bits are used for every dot.

  • And with 24 bits, you can express any number

  • of millions of numbers, which means you can

  • get light blue, dark blue, hot pink, purple, and any number of colors

  • in between from the rainbow, because you have

  • so much expressiveness with that many bits,

  • way more, certainly, than zeros and ones.

  • Turns out, JPEGs are a little special.

  • Any JPEG, at the end of the day, is just a file on your hard drive,

  • or an attachment in an email.

  • And a file is just a pattern of zeros and ones.

  • But what does it mean to be a JPEG?

  • What does it mean to be a Word document or an Excel file?

  • Well, generally, some human or some company

  • decided that a file that is a JPEG shell always

  • start with the same few bits, the same pattern of bits.

  • Or maybe, equivalently, a JPEG shall always end, at the end of the file,

  • with the same pattern of bits.

  • Or with something like an Excel file, there

  • is something that Microsoft decided that they decided how to represent rows,

  • how to represent columns in the file.

  • And the only way you can write software that

  • reads Excel files is if you read the documentation

  • Microsoft wrote so you know how to interpret zeros and ones.

  • Remember, everything is context-sensitive.

  • In another context, these zeros and ones might

  • be like a secret message written in ASCII,

  • if you interpret these as ASCII codes.

  • But if you opened this instead in Photoshop or something,

  • maybe those same zeros and ones are presented as an image.

  • But it turns out, then, if you open a file that is of type JPEG--

  • so something.jpeg, Zamyla.jpeg--

  • the first three bytes of that file, by definition of a JPEG,

  • per its documentation, per the Wikipedia page, are that the first three bytes--

  • or the first 8 plus 8 plus 8, 24 bits--

  • should be these three numbers in decimal.

  • 255, 216, and 255.

  • Those are going to be zeros and ones at the end of the day,

  • but it's annoying to look at zeros and ones.

  • Decimal is a little more familiar.

  • So it's these three bytes start every JPEG.

  • Now it turns out, in the world of files--

  • which, again, will be one of the domains for the coming week--

  • people don't really use decimal, just because,

  • and people definitely don't use binary, because it's

  • just way too hard for humans to wrap their minds around typically.

  • They instead use something called hexadecimal,

  • which is a little weird-looking, but you've probably

  • seen it in various contexts.

  • Hexa means 16, in this case.

  • And you have literally 16 characters in this alphabet, 0 through 9,

  • and when you can't count higher than 9, you go to a, b, c, d, e, f.

  • So 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a is 10, b is 11, dot dot dot, f is 15.

  • And so that's hexadecimal.

  • But it's the same idea.

  • If we can do the whole columns and ones place and 16s place and so forth,

  • same idea as week 0, but we have a different alphabet.

  • Now just so you've seen it, particularly for the coming week's problems,

  • know that those three bytes--

  • 255, 216, and 255--

  • in binary happen to be this.

  • Now, what's nice about hexadecimal is that each of those characters,

  • 0 through 9 and a through f, technically represent four bits.

  • Why four?

  • Well, long story short, if you have four bits, that's this many patterns.

  • 2 times 2 times 2 times 2.

  • That's 16, ergo hexadecimal.

  • So long story short, hexadecimal is just super convenient

  • because you can take 8 bits, or a byte, and separate them

  • into two hexadecimal values, each of which is four bits.

  • So with that said, how do I represent these--

  • 1111, I said it's 15, and again, if we do the week 0 math,

  • 1111 will equal 15, or f.

  • Meanwhile, 1101 will equal d, and 1000, per week 0, is 8, and so forth.

  • So this is just a very methodical way of taking decimal and presenting it

  • instead as hexadecimal.

  • And humans decided years ago that it's not always obvious

  • when you're looking at hexadecimal that it is hexadecimal.

  • So if humans decided that any time we humans write something in hexadecimal,

  • we're usually going to prefix it, whatever

  • we're writing, with 0x, just because.

  • It's like a visual cue to, hey human, here comes hexadecimal.

  • So with that said, these now are the first three bytes

  • of any JPEG in a file.

  • This is relevant, because among the problems for the coming

  • week are going to be this.

  • We are going to give you what's generally

  • thought of as a forensic image, a copy of a digital camera's memory card,

  • on which are 50 photographs, all of which

  • have somehow been accidentally or maybe deliberately deleted.

  • But it turns out, because each of those 50 photographs are all JPEGs,

  • and therefore all of them start with the same pattern of bytes,

  • if you have a mechanism for reading over that file's zeros

  • and ones, one byte at a time, you could notice, ooh, I just saw ffd8ff.

  • This is probably the start of one of the 50 lost photographs.

  • Let me actually extract that and a whole bunch of bytes

  • afterward, and save them to a temporary file.

  • And the next time I see this pattern, ooh, maybe

  • that's the second photograph that's been lost on the digital camera.

  • Let me write that out to a file.

  • And so among the coming challenges is going to be to reintroduce you to JPEGs

  • and have you recover 50 photographs that might have been

  • accidentally or deliberately deleted.

  • But there's other graphical file formats in the world,

  • and one of the earliest common ones was bitmap, BMP.

  • And as the name suggests-- it's kind of a perfect name for a graphic file,

  • because it's a bit map, a map of bits, like the ones and zeros that

  • compose Zamyla's black and white happy face just a moment ago.

  • With the bitmap file format-- you might be familiar

  • if you ran Windows XP for some time, if you grew up

  • with this loading up on your screen.

  • This was a bitmap file, called a wallpaper more generally.

  • And bitmaps clearly are supportive of color.

  • Some of them are bits per pixel.

  • And what's nice about bitmaps is that they're a relatively simple file

  • format.

  • And I say relative because you'll see there's some complexity,

  • but they're way simpler than JPEGs actually are

  • beyond those three knows characters.

  • As an aside, I thought it's always fun to take

  • a look at what this beautiful field looks like some 20 years later.

  • That is the same thing that you might have grown up with on your wallpaper

  • instead now.

  • But here is-- at first glance overwhelming, but in

  • the problem set more methodically presented,

  • what's an example of a file structure.

  • So what does it mean to be a file?

  • At the end of the day, every file on your computer is just a thing

  • containing zeros and ones.

  • But again, those zeros and ones are laid out in patterns.

  • JPEGs, for instance, start with 255, 216, 255.

  • Microsoft Word files, Excel files, anything else,

  • is going to start with different patterns as well.

  • And this is the formal way, from Microsoft's own documentation,

  • for what a bitmap file looks like.

  • And we won't dwell on the low-level details here in class,

  • but long story short, this documentation tells us that the first couple of bytes

  • are something called a word, and the next few bites

  • are something called a dword, or double word, and so forth.

  • So long story short, if you are an aspiring programmer and you

  • want to write a piece of software that reads and writes, opens and saves

  • graphics files, you have to read documentation like this

  • and understand in what order you read zeros and ones in,

  • and what order you write them out in.

  • Because if it's just a whole bunch of zeros and ones,

  • you don't know what the heck you're looking at.

  • You need to reference documentation like this to know that, oh, it's a number.

  • Oh, then it's a character.

  • Oh, then it's a whole word.

  • Oh, then it's a bunch of red pixels or green pixels or blue pixels.

  • And indeed, if you look at the bottom here,

  • what's nice enough about this file format is that once you go through some

  • of these header fields, as they're called--

  • more on that in the problem itself--

  • you get to something a little more familiar.

  • If you grew up ever hearing an expression RGB, red, green, blue--

  • using those three colors, you can combine them like waves of light,

  • or paint if you will, into any number of colors of the rainbow by mixing

  • in a little bit of red, a lot of blue, no green,

  • just like we did a few days back when we looked

  • at that weird representation of yellow in the context of a graphics program.

  • But what's in a bitmap file at the end of the day,

  • wonderfully, is a whole bunch of our RGB triples,

  • which essentially say top left pixel on the screen will be this amount of red,

  • this amount of green, this amount of blue.

  • Then next in the file is whatever the color is for the second pixel, and then

  • the third pixel, and then the fourth pixel Essentially

  • a row, a row, a row that compose the actual image.

  • And it's a little fancier than that, but that's the essence.

  • It's a whole bunch of bits that represent some color of the rainbow.

  • But unfortunately, we haven't seen any way

  • of dealing with a structure like this, because all we have arrays.

  • We have no data type called file yet--

  • until today.

  • So it turns out that there is a keyword in C called struct,

  • and as the name suggests, it gives structure to data.

  • And indeed, one of my claims at the start of today

  • was, we want to eventually have data structures in our toolkit.

  • Arrays are not sufficiently powerful to solve real interesting problems

  • real effectively.

  • So it turns out, there is no data type in C for the notion of a student.

  • There's int, and there's floats, and there's chars,

  • and there's kind of sort of strings, but apparently not even those exist.

  • Those are just char stars.

  • But wouldn't it be nice if implanting a piece of software

  • that your friends can use online, or that the registrar can

  • use to register students, you actually could declare your own data type

  • called student?

  • And indeed, you can.

  • And over the course of this problem set and others,

  • we will introduce syntax like this.

  • There's a keyword called typedef and struct

  • that together let you invent, just like in Scratch, your own puzzle

  • piece, or your own data-type here, called student in this case.

  • And as you may imagine, a student might have

  • associated with him or her a name, and a dorm, and probably other fields too.

  • But this is a way of inventing your own data type,

  • calling it something very reasonable and obvious to you,

  • and including inside of it, or encapsulating inside of it,

  • two other known data types, like two strings or two integers,

  • or 10 integers, or any number of other things.

  • And so simply by using this syntax now can

  • we build up things that aren't just arrays,

  • but have more meaningful data types associated with them, all of which

  • are going to be grouped together.

  • And among the things you'll see in problems

  • set four, moving forward in its problems,

  • is an ability to not only declare data structures like this,

  • but also read them from disk, read them from files,

  • and then write them to files as well.

  • And indeed, what you'll do in one of the examples

  • as well is manipulate bitmap files, take one as input and then resize it.

  • Make it much bigger, but scale it all proportionately.

  • Or maybe shrink it, and if you shrink it,

  • figure out what information you throw away

  • and what information you actually retain.

  • And in one third problem, still, who done it,

  • you'll be presented with an image with a whole lot

  • of random noise, a lot of red speckles, much

  • like you might have seen in a child's puzzle book years ago.

  • And only if, in the real world, if you hold up

  • one of those red cellophane pieces of plastic, can

  • you kind of see the magic image behind it.

  • You'll implement the notion of that red filter

  • to reveal who done it in the context of that example.

  • But I thought we'd end today not only with that teaser

  • but with also this reassurance that perhaps certain shows get

  • these kinds of details more right than others.

  • [VIDEO PLAYBACK]

  • - Magnify that death sphere.

  • Why is it still blurry?

  • - That's all the resolution we have.

  • Making it bigger doesn't make it clearer.

  • - It does on CSI Miami.

  • [SIGH]

  • [END PLAYBACK]

  • DAVID MALAN: All right, that's it for lecture four.

  • We will see you next time.

[MUSIC PLAYING]

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it