  • [MUSIC PLAYING]

  • [VIDEO PLAYBACK]

  • - --we know?

  • - That at 9:15, Ray Santoya was at the ATM.

  • - So the question is, what was he doing at 9:16?

  • - Shooting the nine-millimeter at something.

  • Maybe he saw the sniper.

  • - Or he was working with him.

  • - Right.

  • Go back one.

  • - What do you see?

  • - Bring his face up.

  • Full screen.

  • - His glasses.

  • - There's a reflection.

  • - That's Neuvitas baseball team.

  • That's their logo.

  • - And he's talking to whoever is wearing that jacket.

  • - We may have a witness.

  • - To both shootings.

  • [END PLAYBACK]

  • DAVID MALAN: This is CS50, and this is lecture 3, and that

  • is not how computer science works.

  • And indeed, by the end of today, we'll make

  • clear exactly what's right, what's not right about that,

  • and hopefully give you some pause any time you watch TV or movies hereafter

  • and notice these little things that all too many writers seem

  • to take for granted.

  • So recall that last time, we took a lower-level look at what compiling actually

  • is.

  • And recall that it was a few things, these four steps of pre-processing

  • and compiling and assembling and linking,

  • so that when you start with source code,

  • that might look like this code that we have written in the past,

  • you first have to preprocess it, and the first step in pre-processing was

  • converting all of those preprocessor instructions--

  • anything starting with a hash at the beginning-- to their equivalents.

  • So opening the files and effectively copying and pasting the contents

  • there so that programs and the compiler know what get_string

  • is and know what printf is.

  • The next step that came after that was actually

  • compiling, whereby compiling technically means taking that source

  • code, once it's been preprocessed, and generating

  • this very cryptic-looking stuff called assembly code.

  • And those assembly codes or assembly instructions are really what the CPU--

  • the brain of your computer-- actually understands,

  • although technically the computer understands them only in the form

  • of 0's and 1's.

  • And so when you assemble-- step three--

  • that assembly code, you actually get out those 0's and 1's.

  • But even that simplest of programs where we just prompt the user for a string

  • and then print out their name still involved a couple more files.

  • There was not only cs50.h and stdio.h at the top,

  • somewhere in the computer system there's probably files called cs50.c,

  • and in the case of stdio, printf.c, in which actually the code is

  • for those two functions, those two have to get compiled down

  • to 0's and 1's, and then we need to link everything together,

  • merging those 0's and 1's so that the computer has access to your code

  • and to printf's code and to the cs50 library's code and so forth.

  • But all of that we can just generally wrap up in the descriptor of compiling.

  • And so that's one of the looks we took last week.

  • And we also have introduced, last week and previously, a few tools.

  • And odds are, you're having as many frustrations perhaps already

  • with the p-sets as you are accomplishments

  • and sense of satisfaction.

  • And that's normal, and rest assured that the scales will eventually tip more

  • toward happiness and away from sadness, but we'll

  • give you indeed more tools today than these for actually finding

  • problems or shortcomings in your code.

  • help50, recall, helps you with what process?

  • When should you instinctively consider using help50?

  • When you see error messages on the screen.

  • Something you don't understand that's the result of some mistake you

  • probably made but you don't quite understand what the computer is telling

  • you, run help50, and then that same command and we, the staff,

  • with our code will try to understand the message for you

  • and provide you with feedback.

  • style50 does exactly that.

  • It helps you see with red and green color coding exactly what spaces should

  • be there, shouldn't be there-- it just helps you prettify

  • your code so that you can read it better and other humans can as well.

  • And then printf, which is kind of like the coarsest tool in your tool box,

  • this is just helping you see not only messages you want to see,

  • but just the values of variables.

  • You can print ints and strings, whatever you want,

  • and then you can delete those lines of printf

  • once you're confident your program's working.

  • But that gets a little tedious, and honestly, as our programs get bigger,

  • we're going to want more powerful tools than manually printing things

  • out, recompiling, rerunning-- it very quickly gets tedious.

  • And the goal of programming is not to be tedious, but to be empowering,

  • and that's where we'll step to today via this.

  • So CS50 IDE is a sort of fancier version of what

  • you've been using called CS50 Sandbox, and in turn, CS50 Lab.

  • Now recall that both of those tools, the Sandbox and the Lab,

  • have a terminal window where you can type commands,

  • they have a code editor where you can actually write your code,

  • and then they have a file browser with icons and such

  • where you can actually see your files and folders.

  • So it turns out that CS50 IDE is another tool that at first glance

  • is very, very similar, even though it's laid out a little differently,

  • but it has as many features as the Sandbox and the Lab, but some more.

  • More features that actually help you solve problems in your code

  • and even collaborate come final project time with others if you would like.

  • So this, as we'll see, is the CS50 IDE.

  • It comes with the so-called night mode so you

  • can make everything a little darker on your screen, especially if p-setting

  • at night, and let's actually take a look then

  • at what you can do with this kind of tool.

  • When you log into this tool for the very first time in the next problem set,

  • you'll see an interface that's almost the same as before.

  • The colors are a little different, the font sizes are a little different,

  • but at the bottom by default, you have your so-called terminal window,

  • though instead of the dollar sign now, you'll

  • see a little more detailed workspace, but more on that in a bit.

  • Up here you just have the code editor window,

  • nothing's really going on there.

  • And then we have the added feature of Ceiling Cat

  • in the top right-hand corner.

  • And we'll also see some other features along the way.

  • So let's actually write a program in CS50 IDE, which, to be clear,

  • is just another web-based programming environment that also gives you

  • access to your own cloud-based server.

  • It, too, is running Ubuntu Linux, which is a popular operating system that

  • is not macOS and it's not Windows.

  • But unlike the sandbox environment where you don't even log in

  • and you lose your files eventually, as you

  • may know from when your cookies are lost or something goes wrong,

  • the IDE saves everything.

  • And you'll log in with your account, and whatever

  • you put there last week is going to be there this week and next week

  • and beyond.

  • So let me go ahead up to File, New File, or I could just click this little plus

  • icon in the top right-hand corner, and let me go ahead and preemptively hit

  • Control-S or Command-S or go to File, Save--

  • you should find the interface very similar to any Mac or PC program--

  • and let me go ahead and save this file as follows.

  • I'm going to call this hello.c.

  • And it's important to mention the file extension,

  • otherwise the IDE, like the Sandbox and the Lab,

  • won't know what type of program you're writing.

  • And then let me go ahead and just write my simplest of programs.

  • So let me go ahead and include stdio.h, int main void.

  • Let me go ahead and open my curly braces, printf--

  • hello, world, backslash n, and a semi-colon.

  • So you'll notice that almost everything is the same.

  • The colors are a little different, perhaps,

  • and you might see some different assistive

  • features as you're typing your code, but the end result is the same.

  • And the color coding you just get for free because it's helping

  • draw your attention to different parts of the code.

  • Let me go ahead now and--

  • oh notice this.

  • There's one difference.

  • The IDE is a more powerful tool, but as such, it's a more manual tool

  • and it's not just going to auto-save your code for you.

  • Nice as that's been with the Sandbox, such that you'd never

  • actually had to hit Command-S or Control-S--

  • and if you were, you didn't need to be, the IDE

  • is only going to save things when you want it to so that nothing

  • will happen magically anymore.

  • So what I'm going to have to do is go back up here, File, Save, or Command-S

  • or Control-S, you'll see a little green dot

  • briefly, and now I'm back at my prompt.

  • I'm going to go ahead now and type my familiar command, make hello, Enter,

  • and you'll see pretty much the same cryptic-looking clang

  • command as before because the IDE is configured quite like the Sandbox.

  • And if I want to go ahead and run this now, how do I run this program?

  • Quick check?

  • ./hello, it's exactly the same as before.

  • ./hello, and there we have it, hello, world.

  • So long story short, the user interface thus far is a little different,

  • but functionally it's the same.

  • We're just going to now start to see some more features.

  • So what are those features?

  • And let's introduce some new capabilities that were actually

  • possible in the Sandbox, we just didn't really introduce them at the time.

  • If I click this folder icon at top left, you'll see all of my files and folders.

  • And today for lecture I have a lot of pre-made examples

  • that are already on the course's website, some of which we'll look at,

  • some of which we'll refer to the website,

  • but these are just familiar files and folders.

  • And you can see that everything in my account

  • is apparently in something called Workspace, which

  • is just a folder name, or a directory.

  • Here's my sc3 directory, which again, comes

  • from the website for today's lecture, lecture 3.

  • And then here's the file I just compiled in the program and the file

  • that I wrote, hello.c.

  • You'll notice too that there's this funky symbol here, tilde,

  • that you might not have occasion to write often in English,

  • but in Spanish and other languages you might use this character.

  • This is actually a shorthand notation for what's called your home directory.

  • In this environment, CS50 IDE, you have your own home directory, which

  • means your folder of files and other folders that you get to create,

  • you own, and that persists every time you log in-- you're not

  • going to lose the contents therein.

  • So this just means that in your home directory, a.k.a. tilde,

  • there is a folder called workspace in which I'm currently working.

  • And that's just one folder in which all of my work is going to be done,

  • because there's so many other files and folders in this cloud environment,

  • just like there are in your Mac and PC, we just generally

  • don't care what they are.

  • But notice what we can do at this terminal window besides compile

  • and run code.

  • There are other commands.

  • For instance, this blue text here, similarly to the file browser up top,

  • indicates now not just that this is my prompt per the dollar sign,

  • but that I'm in my home directory's workspace directory.

  • So that means I can be elsewhere even though I haven't

  • specified where I want to go yet.

  • And in fact, I can do this. ls stands for list,

  • it's just shorthand notation for that.

  • And now I see a textual version of my file tree, so to speak.

  • So you'll see here, sc3 is a folder, and you

  • can tell as much because there's a slash at the end of it.

  • hello.c is of course the file I wrote a moment ago.

  • And then hello in green is my program that I compiled, and the star

  • or asterisk there is just--

  • it's not the name of the file, it's just indicating to me

  • visually that that is executable.

  • That's a program I can run just so I know what's compiled

  • and what maybe is source code.

  • So when you're running ./hello, the reason all this time this has been

  • working is because in dot, your current folder, there is a file called hello,

  • and when you hit Enter, you are running that program there.

  • So if after today you go back onto CS50 Sandbox or CS50 Lab and type ls,

  • you'll see exactly the same thing as you might by the little folder

  • icon in those programs as well.

  • But suppose I want to go into a directory.

  • In macOS or Windows or even the IDE, I could, of course,

  • go to my File icon, and then per the little triangle

  • here, which might seem intuitive, you just click it

  • and you can see what's going on inside, not surprising.

  • But how do you do that textually?

  • At a command prompt, well it's not all that hard.

  • You just need to change your directory.

  • So if I do cd space sc3, Enter, nothing seems to happen quite yet

  • except that my prompt changed.

  • Here's the indication that-- this is my prompt, but to the left of it

  • you see in blue that I'm now in my home directory's workspace folder,

  • in my sc3 folder there.

  • So it's just a text-based version of the GUIs, the Graphical User Interfaces

  • that all of us have certainly come to take

  • for granted in the world of macOS and Windows thus far.

  • Well, suppose that I'm a little done with my hello program

  • and I want to delete it.

  • Well in the IDE, like in the Sandbox, you can actually go up here and you can

  • click on it, and then you can typically right-click or control-click,

  • and you'll get a whole menu of other options, one of which is Delete--

  • and feel free to tinker like that in your own environment.

  • But what about the command line?

  • If I zoom in down here and I want to remove hello, you're

  • not going to type remove because that just feels a little verbose

  • and humans decades ago decided that's too tedious to type,

  • let's just call this command rm--

  • for remove-- hello, you're going to see a somewhat cryptic prompt.

  • rm: remove regular file 'hello'?

  • This is more arcane than it needs to be, but it's just asking,

  • are you sure you want to delete 'hello'?

  • Then it's just waiting for you.

  • And here you can type y or yes or sometimes other commands too,

  • now I've confirmed that my intentions were yes.

  • If I type ls again, I-- whoops, in the wrong folder.

  • If I type ls again after removing hello,

  • now I'll

  • see just those two things-- sc3 and hello.c.

  • What if I want to make a folder?

  • Well notice this.

  • If I type at the bottom here, make directory--

  • mkdir-- test just to make a test folder, I'm

  • about to hit Enter, but watch the top left-hand corner

  • where I currently have those other files and folders, and when I hit Enter,

  • now I have a test folder.

  • So these things are identical.

  • One is graphical, one is command line, and there's even other commands

  • if I decide I don't want that.

  • rmdir is remove directory, and it just goes away

  • because it's empty and thus safe.

  • Any questions then on any of those commands

  • or just the overall layout of what it is we're looking at?

  • All right, so don't get hung up on any of those commands,

  • and the problem set and beyond will always

  • remind you of those kinds of features.

  • The point for now is just that we're in a somewhat new environment,

  • but it's fundamentally still the same, it has the same capabilities.

  • So what are other tools we looked at?

  • So you might have heard rumors about a tool called check50, and indeed,

  • this is a tool that the staff use to evaluate problem set 1 and problem set

  • 2 to evaluate the correctness of them so that we ourselves don't have to type

  • ./mario or ./caesar again and again and again to test students' code.

  • But starting this week, you, too, have access to the same program.

  • check50 is a command from the staff that checks the correctness of your code

  • just like style50 checks the style of your code.

  • And in fact, if I go back over to my IDE,

  • let's try to use this for the first time by making the same version of hello

  • that you did perhaps for your first problem set.

  • So if I go ahead and include not just stdio, but cs50.h,

  • and I go ahead and get a string from the user

  • with get_string, prompting them for their name, and then go ahead

  • and print not just hello, world, but hello, percent s comma name,

  • this I believe was the same program you yourselves probably

  • wrote, or some variant thereof.

  • So if I go ahead now and test this myself--

  • make hello, Enter, seems OK, ./hello.

  • I'm going to go ahead and type in my name, and voila, hello, David.

  • Now suppose you're feeling pretty good, you're

  • pretty confident that your code is correct,

  • and most importantly, you have tested your code yourselves.

  • It's not sufficient to rely on our tool alone

  • to test your code because it, too, might not be exhaustive.

  • So once you've tried a few inputs, not just David, but perhaps

  • Veronica's name as well, seems to work.

  • Brian's name as well, seems to work.

  • No name at all, doesn't seem to work, maybe?

  • But we'll have to look back to the problem set

  • to see if that's actually a problem.

  • Let me go ahead now and run check50.

  • check50 expects a special slug, so to speak.

  • Just a unique identifier for the problem that you want to check.

  • And you would only know this from reading a problem

  • set or documentation online.

  • I just happened to recall that the command that the staff had been using

  • to grade and evaluate hello is just cs50/2018/fall/hello.

  • And the slash is to just kind of visually distinguish those words,

  • this isn't a folder or files or anything like that in your own account.

  • So I'm going to run check50 cs50/2018/fall/hello in the same

  • directory that hello.c is in.

  • Enter.

  • It's going to go ahead and connect to GitHub, which is the backend,

  • recall, that we use for storing your code.

  • It's authenticating me now, which means what's your username and password?

  • I'm going to go ahead and use one of my test accounts.

  • And now it's prompting me for my password,

  • and I'm going to go ahead and type that in.

  • You'll notice you're seeing stars like you see bullets in a website

  • just so that someone looking over your shoulder can't see what you're typing.

  • Now I'm going to go ahead and watch the progress.

  • It's preparing, let me go ahead and zoom in.

  • Dot-dot-dot.

  • It's looking at my code, it's getting ready for submission,

  • it's now uploading it to GitHub.com, and once it's on the servers,

  • then it's going to tell CS50 server, here is so-and-so's submission,

  • go ahead and run a few automated tests on it,

  • checking therefore its correctness, and hopefully we're about to see some

  • green, happy smiley faces, and voila, yes,

  • it looks like this check50 command for this problem--

  • or slug, so to speak--

  • checked that hello.c exists, because if I forgot to write the file

  • or if I misnamed it, nothing's going to work.

  • We checked that it compiles successfully,

  • so that, too, is a happy green face.

  • Then it apparently checked--

  • what if we type in Veronica?

  • Do we see hello, Veronica?

  • Apparently yes.

  • What if we typed in another word, Brian?

  • Yes, apparently we say hello, Brian.

  • And so with high probability, we're going

  • to conclude, based on those four tests, that your code is, in fact, correct,

  • at least with respect to those inputs.

  • And there's often some more detail via URL at the bottom

  • where you can actually see more graphically just more

  • feedback on your code.

  • Of course, the first time, second time, third time maybe you run this command,

  • you might not see some green happy faces,

  • you might see some red unhappy faces or some yellow flat faces,

  • which just means we couldn't even run the checks because something else is

  • wrong.

  • But over time, this will help you feel more comfortable and more confident

  • that your code's correct before you actually use submit50 and submit.

  • Going into it you'll feel a little better or a little frustrated

  • to know in advance-- wait a minute, I'm about to submit this but nope,

  • it's not yet correct.

  • So realize it's a two-edged sword.

  • Any questions about check50 or any of these commands thus far?

  • Anything at all?

  • No?

  • All right.

  • So let's take a look at the final and most powerful

  • tool now available to you in the IDE environment.

  • Built in to CS50 IDE, which stands for Integrated Development

  • Environment, which isn't a CS50 thing-- this is a common term in industry

  • for tools that make it easier to write code,

  • it turns out that there's some other feature besides the cat over here.

  • Namely, one, you can share your workspace

  • with teaching fellows and course assistants

  • so they can perhaps help you in real time a la Google Docs, even chatting

  • with you in real time.

  • But it also provides you with what's called a debugger.

  • A debugger, as the name suggests, removes bugs--

  • or rather, helps you remove bugs from your code

  • by allowing you to not just resort to printf--

  • printing out ints and strings and whatever

  • is going on in your program-- it kind of automates

  • that very tedious process for you.

  • And it lets you walk through your code one

  • line at a time at your own comfortable pace

  • and see along the way all of the values of your variables in that program.

  • To activate this debugger, I'm going to go ahead and do the following.

  • I'm going to compile my code as always with make hello.

  • It has to compile, otherwise I might want

  • to use help50 and figure out why it's not compiling,

  • but it does seem to have compiled.

  • And now I'm going to go ahead and run debug50, space, and then

  • the name of the program I wanted to debug.

  • And the name of the program I wanted to debug at the moment

  • is the current directory's file called hello.

  • Let's assume that there's perhaps something wrong with it.

  • The first time I run this command, though, debug50

  • is not going to be happy with me because it's going to say,

  • it looks like you haven't set any breakpoints.

  • Set at least one breakpoint by clicking to the left of a line number

  • and then rerun debug50.

  • Well what is a breakpoint?

  • Well as the name kind of suggests, it allows

  • you to break or pause the running of your code at any of your lines.

  • And all this time for the past few weeks,

  • your code has been automatically line-numbered.

  • And this is useful because the most interesting line in this program,

  • once it really gets going, isn't this stuff at the top,

  • it's not int main void, right?

  • That's all copy-paste from past programs.

  • It's really the sixth line here where I actually have some logic of my own.

  • And so in CS50 IDE, what you can now do is

  • click to the left of one of these line numbers,

  • a little red light like a stop sign is going to appear saying,

  • break or pause my program on this line so

  • that I can poke around my actual code.

  • Sandbox and Lab cannot do this.

  • So now I'm going to go ahead and rerun debug50 in exactly the same way, hit

  • Enter, but now I have one breakpoint.

  • And you'll see on the right-hand side a fancier menu just popped up

  • by the cat that provides me with a bunch of features.

  • And at first glance, frankly, it's a little overwhelming

  • because there's a lot going on here, but you'll notice first,

  • and most importantly, there's some mention of my name variable.

  • I don't quite understand 0x0 or whatnot, but I do understand string.

  • And so what the debug50 program has realized is oh, on this line and below,

  • you have a variable called name.

  • It doesn't seem to have a value yet.

  • 0x0, it turns out, is just going to mean empty or null or 0.

  • But that's good, because now, when I actually execute this line,

  • hopefully it's going to take on the name David or Veronica or Brian.

  • So let's see what happens.

  • Notice that it's highlighted in yellow, line 6, which means it

  • has not yet executed this line of code.

  • My code has paused at this point because I set that breakpoint.

  • And then notice kind of like a music player up here, there's a few icons.

  • The Play button is just going to say, ah, play my program,

  • run it all the way through the end, kind of like Scratch with the green flag.

  • But more powerful is this.

  • You can step over this line, therefore executing it just once.

  • If it's a function, you can step into this line

  • and actually look inside of a function that you're using, like get_string,

  • or you can step out of another function, but more on that another time.

  • So what I'm going to do is this.

  • And the button I'm going to click most commonly when trying to understand

  • how my program is working is this--

  • Step Over.

  • So it's the second icon from the left, right next to the triangle.

  • So once I click this, watch what's going to happen,

  • even though it's a little small, on the right-hand side for my name variable.

  • Notice that I'm being prompted to type in my name because the program

  • is still running in my terminal window, but when I hit Enter now,

  • providing my own name, automatically you see on the right-hand side

  • that this name variable has a value now of, quote-unquote,

  • "David" of type string.

  • There's this 0x1083010-- more on that later, just a little cryptic,

  • but I didn't have to use printf now, I can actually see what's going on.

  • Now you can see that line 7 is highlighted,

  • because I set a breakpoint above it, so now I'm on the second line

  • because I just stepped over the first.

  • Let me go ahead and click Step Over again, and you'll

  • see that in my terminal window, hello, David just got executed.

  • And now if I just keep going, it's going to go ahead and run to the end

  • and close the debugger.

  • So not all that useful for this program because frankly, I'm

  • pretty sure this is correct, but the power of debug50 and a debugger more

  • generally is that it lets you, whether you're less comfy or more comfy,

  • walk through your own code at your pace just like a TF or a CA might say, OK,

  • what is this line doing?

  • What is this line doing?

  • You don't have to resort to printf, you can just very methodically

  • walk through your code and find that damn bug that's been bothering you

  • for minutes or even hours.

  • So henceforth, any time you have a bug in your code that is compiling

  • but it's just logically incorrect-- the pyramid in Mario isn't quite right,

  • your encryption of Caesar isn't quite right, or something else,

  • your first instinct now should be, let me compile it, run debug50 on it,

  • and just step through the code, setting a breakpoint wherever I want,

  • so you focus on just a few lines, not the whole thing--

  • like I just did--

  • and see if you can figure out logically when a value is not what you expected,

  • then oh--

  • go ahead and just click Resume, fix the bug, and retry.

  • Such a powerful tool.

  • Any questions?

  • Yeah?

  • What is it?

  • AUDIENCE: What does it look like when there is a bug?

  • DAVID MALAN: What does it look like when there is a bug?

  • So the debugger won't find your bugs and it won't show you your bugs, per se.

  • It's going to let you see what line is executing,

  • it's going to let you see what's outputting,

  • it's going to let you take input, but all it's

  • going to do on that right-hand side is just show

  • you the values of things along the way.

  • It's up to you to infer from that information what

  • it is that's going wrong, just like if you're using printf in past weeks

  • to see what's going on in your program.

  • Other questions?

  • And let me save this too.

  • It is so easy to get into the habit, especially when so many things have

  • been new over the past few weeks of just saying, ah,

  • this is just yet another thing to learn.

  • This is hands down the kind of tool that if you

  • spend a few extra minutes this week and next week just using it,

  • get a little more comfortable with it, it

  • will save you potentially hours in the long run,

  • because all the time you've been spending manually

  • trying to fix your bugs or posting questions online

  • trying to understand things, this is a tool

  • that if you invest those minutes upfront will just

  • help you understand everything going on inside of your program,

  • and will absolutely over the next few weeks save you more and more time.

  • All right, any questions? Yeah?

  • AUDIENCE: So you have a for loop that ran [INAUDIBLE] times,

  • [INAUDIBLE] separate break statements so you don't have to [INAUDIBLE].

  • DAVID MALAN: Ah, good question.

  • If you have something like a for loop or a while loop, something

  • that's happening a lot, can you set a breakpoint in such a way

  • that it only breaks so that you don't have to walk through it 100 times

  • just to see that value?

  • Short answer, yes.

  • And let me defer to section and online resources for just a few

  • of these features, but one, you can actually watch values,

  • and you can have what's called a watch expression.

  • You can say show me this value only when x is greater than 50

  • or something like that.

  • Or you yourself can just add some lines of code.

  • You could add a, if x equals-equals 50, then print out something,

  • and you can set a breakpoint on that new, if temporary line,

  • so there's a couple of ways to do that.

  • Good question to anticipate.

  • Yeah?

  • Behind.

  • AUDIENCE: If you run debug50, aren't you adding

  • another argument with the [INAUDIBLE] in your main method at line 4?

  • DAVID MALAN: Really good question.

  • If you're running debug50, aren't you adding

  • another argument-- argv-- per our discussion last week of command line

  • arguments?

  • Short answer, no, because debug50 corrects for that,

  • so you don't have to worry about that.

  • It will not shift things over numerically.

  • Really good thought.

  • Other questions?

  • All right, so with that said, let's now take some training wheels off.

  • So the only reason I bought these training wheels years ago

  • is to make this very dramatic point of now taking the training wheels off

  • today.

  • OK, so what does this mean?

  • Well worth the trip to Target.

  • So what does this mean?

  • For the past few weeks, we have been using a whole bunch

  • of functions from CS50's library.

  • All of these were meant to just make it pretty easy, relatively speaking,

  • in the first few weeks to get input from the user.

  • Because it turns out, as we'll see today,

  • it's actually a kind of a pain in the neck to get input from users in C,

  • and frankly, even in other languages, reliably.

  • Because you'll recall that get_string and get_int and all of these functions

  • take on the burden of like re-prompting the user if they don't actually

  • give you an int or don't give you a float

  • or don't give you a char that you're expecting, they'll re-prompt,

  • they're using a while loop or a do-while loop or the like,

  • so there's just a lot of error detection built into these functions.

  • But, most importantly-- and most misleadingly,

  • has been the last one on this list.

  • Recall that we introduced a couple weeks ago now the notion of a string.

  • And a string is in English what?

  • An array of characters, good.

  • It's a sequence of characters, and we learned last week that a sequence can

  • be implemented in an array, which is just a chunk of memory

  • back-to-back-to-back-to-back.

  • So string, though, is not quite like any of those other data types.

  • It turns out that it's not quite like int or char or even bool or float,

  • and we can start to see that now as follows.

  • I'm going to go ahead and go into the IDE today--

  • and henceforth we're going to just start using the IDE,

  • but you're welcome to keep using the Sandbox for quick and dirty programs,

  • but for anything you want to keep around,

  • your instinct should now be to open your IDE.

  • I'm going to go ahead and create a new file,

  • and I'm going to call it compare0.c for my first example of comparing things.

  • And I'm going to go ahead and whip up a relatively short program

  • that you would hope would work right out of the box.

  • So I'm going to go ahead and include the familiar cs50.h.

  • I'm going to go include stdio.h.

  • I'm going to go ahead and do int main void.

  • I'm going to go ahead and in here--

  • let me get a variable called i using get_int from the user,

  • and just prompt them for i.

  • Let me go ahead then and prompt the user for another get_int.

  • We'll call it j and get that from them.

  • And then let's just compare these things.

  • So if i equals-equals j, then go ahead and print out

  • with printf same and a new line.

  • Else go ahead and print out the opposite, which is different.

  • So the only place I think I could have screwed up, perhaps,

  • is if I did this, which is kind of reasonable if you

  • come in knowing what an equal sign is.

  • But again, in code, we typically need two equal signs

  • because that compares two values.

  • So I didn't make that mistake, I'm feeling pretty good about this.

  • Let me save it with Command-S or Control-S or via File,

  • Save; go to my prompt and run make compare0.

  • Good, everything compiled.

  • And let me go ahead and run compare0, Enter, and I'll type in 50,

  • and I'll type in 50, and they do seem to be the same.

  • Let me go ahead and do that again, let's type in 42 and 13,

  • and they are different.

  • And I should probably test a few more, maybe some negative values, maybe some

  • 0's, positive values and the like, but I'm

  • feeling pretty good about the correctness of this code.

  • All right.

  • So let's change this program a bit.

  • Let me go ahead and create another file, which

  • I can do with the little green plus or via File, New File.

  • I'm going to go ahead and save this one as compare1.c.

  • And for the moment I'm going to go ahead and just paste in that code

  • from before, but I'm going to make some changes now.

  • I'm going to go ahead and rename and retype my data types as strings.

  • So give me a string called s, and we'll prompt the user

  • for that using get_string, then I'm going

  • to go ahead and change this one to string t,

  • and I'm going to go ahead and get get_string.

  • I, of course, need to now compare s and t, not i and j.

  • And s is a common variable name for a string. t just comes after s,

  • so that's pretty reasonable too, but I should of course update that as well.

  • And so I think everything's now the same logically.

  • I just changed my data types and my variable names.

  • So I've saved this.

  • Let me go ahead and run make compare1.

  • Good, everything's correct.

  • Let me go ahead and do ./compare1.

  • Let me go ahead and type in Brian and Veronica.

  • And of course, those are different.

  • Now let me go ahead and type in David, let me type in David again,

  • and those of course are different?

  • Huh.

  • Maybe it's because I just hit the Spacebar or something.

  • So let's try Erin.

  • Her name's a little shorter.

  • Hmm.

  • OK, let's try-- oh, what's her name?

  • TJ.

  • OK, even shorter, perfect.

  • TJ, can't go wrong.

  • Different.

  • I mean, what is going on?

  • Let's just say i, i.

  • Different?

  • So where's the logical bug in this program?

  • What is it that's going on?

  • Yeah, what do you think?

  • AUDIENCE: Is it comparing integer values?

  • DAVID MALAN: Is it comparing integer values?

  • Well maybe.

  • I mean, thus far when we've used equal-equals

  • we've probably used it mostly for comparing integers,

  • so maybe I'm just misusing it, sure.

  • Other thoughts?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Oh, that's a big word that we'll get to in just a little bit.

  • But correct, correct-- but for very similar reasons.

  • So something's going on logically involving comparison,

  • because I'm using equal-equal, but maybe I'm using it for the wrong data types?

  • I mean, it's clearly broken for strings.

  • So why might that actually be?

  • Well it turns out that strings don't actually exist.

  • So a string that we know is just a sequence of characters

  • or an array of characters is not an actual data type.

  • int is, float is, double is, long is, bool is, and even more

  • are actual data types.

  • String is kind of a little white lie we've

  • been telling for a few weeks that's implemented only in the CS50 library.

  • Now the word string is super common in programming.

  • Like every programmer out there will know what you mean when you say string.

  • That is not a CS50 word, but our use of it in C is CS50-specific.

  • Because in that file called cs50.h, in addition

  • to declaring functions like get_string and get_int and get_float

  • and a bunch of other things, we also have a special line that says,

  • create a data type called string.

  • But what does it actually do or what does it actually mean?

  • Well let's go ahead and consider what might be going on underneath the hood

  • here.

  • So if I go ahead and draw the program that we just

  • ran, that program compare1 gets a string s from the user,

  • then gets a string t from the user, and then compares them.

  • So we know from last week what a string is, it's just an array.

  • So when I run that first line of code and get a string from the user--

  • for instance, Brian, I'm going to go ahead and see a B-R-I-A-N,

  • which we know from last week to actually be an array of memory that might look

  • pictorially like this-- and this, too, is a bit of a white lie,

  • there's something else.

  • AUDIENCE: The null.

  • DAVID MALAN: Yeah, the null character, so to speak, N-U-L,

  • which we typically just write with a backslash 0, which is just all 0 bits.

  • And it turns out, you might recall from the debugger earlier, you saw this--

  • that's the even more cryptic way of expressing the null character,

  • backslash 0.

  • Just different programs display it in different ways.

  • So when I get_string and type in Brian, this is what's allocated in memory.

  • And when I type Veronica, I can see a V-E-R-O-N-I-C-A.

  • I'm going to get that right preemptively.

  • Backslash 0.

  • That, too, is a chunk of memory, which I'll draw like this.

  • 1, 2, and split these up into individual characters or bytes.

  • And recall from last time that these bytes just come from my memory,

  • and that memory just has a bunch of bytes in it, maybe millions or even

  • billions these days.

  • And so honestly, if you just have that many things,

  • any human or computer can certainly number them.

  • Like this is byte 1, 2, 3, 4.

  • So let's just assume for the sake of discussion

  • that out of context of my computer's hardware,

  • Brian just ended up at location 100, and location 101, and 102, 103, 104, 105.

  • So this is the 100th byte in my computer,

  • this is the 105th byte in my computer, and Brian

  • is using that many characters in total.

  • Veronica, she ended up somewhere else.

  • Maybe she ended up farther away, just because, at location 900, 901, 902, 903,

  • 904, 905, 906-- a lot more memory, 907, and 908--

  • but you can see even more visually now that the length of Brian's name--

  • strlen of Brian is what?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: I hear five and I hear six.

  • The length of Brian's name--

  • Brian, how long is your name?

  • AUDIENCE: Five.

  • DAVID MALAN: OK, it is definitively five characters, that

  • is the length of Brian's name, but you have

  • to appreciate that in the computer, Brian's five-character name does indeed

  • take up six bytes.

  • So both answers are kind of correct, but the length of the string henceforth

  • is always the number of actual characters.

  • The amount of space it takes up is that plus 1 for the null character.

  • So you can actually see why Brian's name takes up six bytes in this picture

  • rather than just the actual length, which is five.
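
  • The difference between a string's length and the space it takes up can be sketched as follows. bytes_needed is a hypothetical helper, not a library function, and it assumes s is a properly terminated string.

```c
#include <stddef.h>
#include <string.h>

// Bytes needed to store s: its characters plus one for the terminating '\0'.
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}
```

  • For Brian, strlen gives 5 and bytes_needed gives 6; for Veronica, they give 8 and 9.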

  • So when you call get_string now, and when you call

  • get_string and get another string--

  • Brian and Veronica respectively, what is actually being handed back?

  • A couple weeks ago, Erin came up and she kind of like

  • handed me back a string, a student's name from the audience.

  • On that piece of paper we thought was the student's name.

  • But it's not.

  • It turns out that when a function returns a value,

  • it can pretty much only return 1 byte or maybe 2 or 4 bytes.

  • It can't return an arbitrary number of bytes, like six for Brian or 1, 2, 3,

  • 4, 5, 6, 7, 8, 9-- it cannot return 9 bytes for Veronica.

  • And if you even type a whole paragraph or page of text,

  • it can't return all of that text, it can only return a single value.

  • So to your instinct earlier, what might actually

  • be getting returned by get_string when the human has

  • typed in a name like Brian or Veronica?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: The memory location.

  • Indeed, an integer, or as you called it, a pointer,

  • which we'll introduce more formally in just a moment.

  • So when get_string returns "Brian," quote-unquote,

  • it's actually not returning B-R-I-A-N backslash 0, it is just returning 100.

  • And when get_string returns Veronica, it's not returning her name,

  • it's returning 900.

  • And so if you realize that now, when you do does

  • s equal-equal t, what question more mundanely are you actually asking?

  • Yeah.

  • Memory location and memory location-- does 100 equal 900?

  • And obviously not.

  • And so that is why Brian's name, Veronica's name,

  • my name, TJ's name-- every word I typed in was of course different,

  • because each input was ending up at a different location in memory.

  • And even if I typed the same word like David twice, one David was going here,

  • one David was going somewhere else, they were ending up

  • at different memory locations.

  • Maybe 100, maybe 900, maybe something else,

  • but they were ending up in different locations in memory.

  • So equal-equals does compare values, but dammit

  • if it isn't comparing the wrong values.
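
  • What equal-equals is doing here can be sketched in a few lines. same_address and report are hypothetical names, not functions from the lecture or the CS50 library, and the const qualifiers are an addition that only promises the functions won't modify the strings.

```c
#include <stdbool.h>
#include <stdio.h>

// Compares the two pointers themselves, that is, the addresses,
// not the characters stored at those addresses.
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Prints where each string starts in memory and what == concludes.
void report(const char *s, const char *t)
{
    printf("s starts at %p, t starts at %p\n", (void *) s, (void *) t);
    printf("%s\n", same_address(s, t) ? "Same" : "Different");
}
```

  • Given two separate arrays that both hold David, such as char a[] = "David"; and char b[] = "David";, report(a, b) prints Different, because the two copies live at different addresses even though the characters match.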

  • Yeah?

  • AUDIENCE: Well what if you use some char*s?

  • DAVID MALAN: Ah, so we'll come back to that.

  • Let me come back to that in just a moment.

  • char* is actually intricately related.

  • More on that in a moment.

  • Yeah?

  • AUDIENCE: If you add two integers in memory--

  • DAVID MALAN: Uh huh?

  • AUDIENCE: Wouldn't they be in different places in memory?

  • So you would return--

  • so you need a different value.

  • DAVID MALAN: OK, really good question.

  • So wait a minute, this same logic that I'm returning the address of something

  • surely applies to integers as well or floating point values as well?

  • Because if I type in the number 50 like I

  • did earlier, that, too, is somewhere in memory-- like a box in memory,

  • and that, too, has an address somewhere in memory,

  • but it turns out, for reasons that you just alluded to, actually,

  • ints are returned as their values.

  • Chars are returned as their values.

  • Bools are returned as their values.

  • Floats are returned as their values.

  • Strings are different.

  • Strings are returned by their address.

  • And those addresses, it turns out, are ultimately going to be called

  • char*'s, which we'll see in just a moment.

  • So how do we go about then fixing this fundamentally?

  • Like even if you have no idea how to code this yet, just intuitively,

  • if I do actually want to delete--

  • if I do actually want to compare--

  • sorry.

  • OK.

  • If I do want to go ahead and compare Brian and Veronica for equality,

  • what do I want to do intuitively?

  • I can't just compare their addresses.

  • What do I need to do?

  • Isolate the characters and then do what with them?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Good.

  • Yeah, good instincts.

  • Use a for loop, use a while loop-- any kind of looping structure.

  • And intuitively, compare the first characters,

  • and if they're different, well then we know we don't have to go any further.

  • B is not a V, so surely these names are different.

  • But what about in my case?

  • If it was David and David, you would compare the first two.

  • D and D are the same.

  • Compare the second two, A and A are the same.

  • V and V, I and I, D and D, and then what am I going to hit last?

  • Null character.

  • And should I keep going beyond the null character?

  • No.

  • So this is the beauty of that super simple design for a string.

  • Insofar as strings are identified by their starting address, just the byte

  • at which they start, you still need to know

  • how long they are, because otherwise how do you know where one word begins and ends

  • and another word begins?

  • And so the simple decision we made last week-- as did humans decades ago--

  • to terminate all strings with backslash 0 or all 0's is a super handy trick,

  • so that if I tell you that Brian starts at 100,

  • you can infer that he ends where?

  • At byte number 105 or 104, if you will, however you want to think about it,

  • because all you need to do in linear time,

  • if you will, left to right, is check-- backslash 0, backslash 0-- ah!

  • Backslash 0, now I know how long Brian's name is.

  • So let's consider for a moment this program called string length.

  • How does strlen actually work?

  • When you pass to strlen, a variable containing a string, like Brian,

  • what is strlen probably doing?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Exactly.

  • It's looking for that null character's address

  • and subtracting the start address from the end address,

  • figuring out what the difference is, and actually returning

  • that as the total count.

  • And more mechanically, we'll see in a moment,

  • it's probably doing exactly the same thing I did,

  • which is, is this backslash 0?

  • Is this backslash 0?

  • Is this, is this, is this?

  • I asked that question five times before I saw backslash 0.

  • strlen is just a function some human wrote years ago

  • that probably just has a simple for loop and an if condition,

  • and then that's it.

  • Because that person understood before we even

  • did how strings are actually implemented.
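
  • That is-this-backslash-0 loop might look roughly like the sketch below. string_length is a hypothetical stand-in; the real strlen in your C library may well be implemented differently, and this version assumes s really does end in a null character.

```c
#include <stddef.h>

// Counts characters from the start of s until the terminating '\0'.
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

  • For Brian, the loop asks its question at B, R, I, A, and N, stops at the backslash 0, and returns 5.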

  • Any questions then?

  • All right, so let's actually implement this.

  • Let me go ahead and into my editor here, and make one other example here

  • that I'm going to call compare2.

  • I'm going to go ahead and do include cs50.h and include stdio.h,

  • and then I'm going to do int main void, and I'm

  • going to quickly now grab my code from before where I got strings

  • and I compared them, but I have to obviously fix that comparison.

  • So here's my code from before.

  • I'm going to do this the right way.

  • I'm going to call a function called compare_strings passing in s and t.

  • Because as you proposed, we need to do some logic.

  • We don't have to pass it to a function, but we could.

  • We could just do a for loop here, but I'm

  • going to go ahead and implement compare_strings as follows.

  • If I want to write a function that returns a yes/no answer, what data type

  • should it return?

  • A bool.

  • So we've not necessarily done this yet, but you

  • can return a bool just like you can an int or a char or something else.

  • I'm going to call this function compare_strings.

  • It's going to take in one string called a and another string called b,

  • but I could call those anything I want.

  • And now what's the easiest thing to check?

  • If I pass two strings, a and b, or Brian and Veronica,

  • what's the easiest question you can ask and just immediately say, nope,

  • these are different?

  • String length, right?

  • Like if the B-R-I-A-N is not of the same length as Veronica's name,

  • we don't need to do any logic whatsoever beyond that,

  • we can just quit and say false.

  • So let me just do that.

  • If the strlen of a does not equal the strlen of b, you know what?

  • Let's just go ahead and return false and get out of here.

  • OK, but now, if we get past that gateway, so to speak,

  • that check, that question, that Boolean expression,

  • now I have to compare things character by character by character.

  • So I can do this in a bunch of ways, but I like the suggestion of a for loop.

  • So for int i at 0, n for efficiency-- actually,

  • let's do i is less than the string length--

  • should I do the string length of a or b?

  • And it doesn't matter, right?

  • So let's go with a.

  • And frankly, had I been smart early on, I

  • could have stored the value in a variable and then reused it,

  • but we'll just keep going ahead for now.

  • Then i plus-plus, but I remember from last time-- this is correct,

  • but this is not good design.

  • Why?

  • Yeah, I keep calling strlen again and again, because remember, in a for loop,

  • this condition is checked again and again

  • and again-- you're just wasting your own time.

  • So let me go ahead and actually do this.

  • n or any variable equals the strlen of a, then just compare i against n,

  • because now i is getting incremented, but n is never changing.

  • So now let me go ahead and implement this for loop.

  • So if-- how about the i-th character of a does not equal the i-th character

  • of b, I can immediately conclude--

  • nope, these strings can't be the same, because some letter, like a B,

  • is not the same as another, like a V, or whatever letter we're actually

  • comparing.

  • And then I think that's it.

  • If I get through these gauntlets of questions--

  • are your lengths different?

  • Are your characters different?

  • And I still haven't said false, what should I return by default?

  • Yeah.

  • Like if you make it through all of those questions and all is well,

  • then D-A-V-I-D must indeed equal D-A-V-I-D or whatever the user actually

  • typed in.

  • Now I'm not quite done yet.

  • When I've implemented a function or a helper function

  • like this, because it's helping me do my work,

  • what else do I have to add to the file?

  • Oh?

  • AUDIENCE: I've got a logical question.

  • DAVID MALAN: Sure.

  • AUDIENCE: In a computer, couldn't you just type in David with a capital D

  • and then david with a lowercase d, you're going to run [INAUDIBLE],

  • they're not going to sync because your first character's not

  • the same character.

  • DAVID MALAN: Correct.

  • So this is a feature, not a bug at the moment.

  • My program at the moment is case-sensitive.

  • If I type in DAVID in all caps, that is a different string

  • I claim for now than david in all lowercase.

  • If you want to tolerate uppercase and lowercase,

  • you're going to have to add more logic.

  • But for now that's a design decision that I intend.

  • All right.

  • What else do I need to add to the program?

  • Yeah, the prototype at top.

  • You can literally copy and paste-- this is the only time copy and paste is

  • probably a legitimate thing to do--

  • at the top, and then semi-colon-- don't re-implement it.

  • But I do need one other header file.

  • I'm using a function that's not in cs50.h or in stdio.h.

  • String length?

  • Where was string length?

  • Yeah, string.h.

  • So I just need this, include string.h, save.

  • Now this I think is correct.

  • We'll see if I eat the word in a moment.

  • But realize that if you're writing this code yourself,

  • like this is not a natural thing to be writing a program in office hours

  • or at home in your dorm and just getting it right the first time.

  • This is after like 20 years of doing this, so realize we happen to be--

  • and I also have a cheat sheet right here--

  • we happen to be doing this correctly often,

  • but realize that's not going to be the common case.

  • So with that reassurance in mind, let's see

  • if I have to now take all that back. make compare2.

  • OK-- phew.

  • 20 years worked out.

  • So now I'm going to go ahead and ./compare2.

  • Let's type in Brian, let's type in Veronica.

  • Those are indeed still different hopefully.

  • Now let's try myself, David and David.

  • Phew!

  • Those are the same.

  • And to your point, David capitalized and david in all lowercase,

  • different, but that's what I expect now.

  • Any questions on compare2?

  • Yeah?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: OK.

  • AUDIENCE: [INAUDIBLE] string in the program and in general.

  • DAVID MALAN: OK.

  • AUDIENCE: Would that still work [INAUDIBLE]

  • DAVID MALAN: If you were to hard code the strings?

  • Short answer, yes, that would still work.

  • If you for whatever reason did not do this using get_string,

  • but you did David, and here, for instance, David, that would work too.

  • And whatever your error is, if you can recreate it, just let us know.

  • AUDIENCE: It seems to be like a string that would be increased

  • for a set that was [INAUDIBLE] only?

  • And it was having issues in the little [INAUDIBLE].

  • DAVID MALAN: I'd have to see it to be sure, but happy to chat after.

  • All right, so let's see if we can't now clean this

  • up just a little bit as follows.

  • Let me go ahead here and reveal what it is that's actually going on.

  • So indeed, there is no such thing as a string.

  • And indeed, as you pointed out a moment ago,

  • it actually goes by a different name.

  • String is just a synonym for what's called a char*.

  • Now what does that even mean?

  • So char is the same as it's always been.

  • It's a single character.

  • Star in a program written in C could of course mean multiplication,

  • we have seen that.

  • This is another use of the star.

  • Whenever you see it after a data type like char,

  • this means that the data type in question is not just a char,

  • it's the address of a char.

  • So the star just means the address of whatever the data type is to the left,

  • and this is, as you pointed out earlier, what

  • we're going to start calling a pointer.

  • A pointer is, for all intents and purposes, an address.

  • It's just a buzzword to describe an address.

  • This data type here, char*, means I want a variable that doesn't store a char,

  • it stores the address of a char.

  • The number 100, the number 900.

  • But that address is just going to be called a pointer.

  • A pointer variable is a variable that stores the address of something.

  • A char or even other data types as well.
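
  • As a small illustration, here is a sketch in which a variable of type char * holds an address and the char at that address is read back. first_char and show are hypothetical names, and the const is an addition that only says the functions won't change the string.

```c
#include <stdio.h>

// s holds an address; s[0] is the char stored at that address.
char first_char(const char *s)
{
    return s[0];
}

// Prints the address stored in s and the char that lives there.
void show(const char *s)
{
    printf("address %p holds '%c'\n", (void *) s, first_char(s));
}
```

  • The address printed will differ from run to run and machine to machine; the 100 and 900 in the picture are just convenient stand-ins.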

  • So with that in mind, let me actually quickly create compare3.c, paste this

  • in, and save it as compare3.c, and let me take off, if you will,

  • those training wheels.

  • It turns out that when you get a string with get_string,

  • it doesn't return a string, per se, because again,

  • that word doesn't exist in C, it actually returns a char*.

  • And when I call it again here and return another string, it, too,

  • returns a char*.

  • Now technically the star can have spaces around it.

  • Some people write it like this, but the sort of right way to do it

  • or the default way should just be to put the star next to the variable name

  • for clarity.

  • So I have to make a few other changes.

  • This should change too, because there is no more string as of today.

  • I'm going to change this to a char*; and then I also need to change it here,

  • char*; and then here, char*; and that is actually it.

  • And honestly, the only reason we didn't introduce this like two weeks ago

  • is because it just looks cryptic.

  • Like no one wants to program the first time they're ever touching a keyboard

  • and writing code and see char* and need to worry about what that means,

  • it's just a string conceptually.

  • But the only change I technically need to make to take those training wheels

  • off is just change all mentions of string as data types to char*.

  • And that just means that you know what-- a?

  • Yes it's a string, but more technically it's the address of a string.

  • Or more precisely, it is the address of the first byte of the string,

  • like 100 for Brian or 900 for Veronica, and I'm not even

  • going to tell you where the string ends because you, the programmer,

  • can figure that out by calling strlen or just by using a loop

  • and figuring out where that backslash 0 actually is.

  • So that is enough information to pass it around.
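
  • Put together, the comparison function with string swapped for char * might look like the sketch below. It follows the steps described above but is a reconstruction, not the course's actual compare3.c; main and the get_string calls are left out, and stdbool.h is included directly on the assumption that cs50.h is not.

```c
#include <stdbool.h>
#include <string.h>

// Returns true if a and b hold exactly the same characters.
bool compare_strings(char *a, char *b)
{
    // Strings of different lengths cannot be the same.
    if (strlen(a) != strlen(b))
    {
        return false;
    }

    // Compare character by character; n is computed once, not on every pass.
    for (int i = 0, n = strlen(a); i < n; i++)
    {
        if (a[i] != b[i])
        {
            return false;
        }
    }

    // Same length and no mismatches.
    return true;
}
```

  • The function only ever receives two addresses; strlen and the loop find the end of each string by looking for the backslash 0.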

  • So if I go ahead now and compile this, make compare3,

  • and then I go ahead and do ./compare3, let's go ahead and type in Brian

  • and Veronica, those are indeed still different.

  • Now let me go ahead and type in David and David, those are in fact the same.

  • So the training wheels are off, there is no such thing as string,

  • henceforth it's a char*.

  • Let's go ahead and take a quick break here for five minutes,

  • and we'll come back and dive in more.

  • All right.

  • So we are back, and let's go ahead and simplify this now,

  • as our tendency has been.

  • It's kind of a bunch of code, but I think

  • we can make this a little tighter.

  • But rather than type this one out manually,

  • let me go ahead and just open one of our pre-made examples

  • from today, which is all on the course's website, called compare4.

  • And you'll see in compare4, that's it.

  • I only have a main function this time.

  • I've gotten rid of my compare_strings function because you know what?

  • I seem to be using something instead.

  • What function did I apparently deploy?

  • Yeah, S-T-R-C-M-P, or as some would pronounce it,

  • just str compare or strcmp.

  • So this, like strlen, also succinctly named,

  • is just a function that's actually declared

  • in one of our familiar libraries up top, string.h,

  • and it turns out if you look in the man page, so to speak,

  • by typing man strcmp, or if you go to CS50 reference and actually

  • look at the less comfortable description of the function there,

  • this is just a function whose sole purpose in life

  • is to compare strings for you.

  • But it's a little different in behavior because it's

  • a little fancier than the one I just wrote.

  • Let me zoom in on this, and you'll see that line 14 here, I'm

  • not quite treating it in the same way.

  • My logic is ever so slightly different.

  • What am I actually checking for in my Boolean expression this time?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Yeah, which is a little weird.

  • I'm checking explicitly-- if strcmp's return value equal-equal to 0.

  • Before I just said, if compare_strings s comma

  • t, because I was expecting back a bool-- true or false. strcmp, kind of weird,

  • acts the opposite way.

  • It turns out that strcmp doesn't return true and false.

  • If you read its documentation, it returns 0 if the strings are equal,

  • but super conveniently, it returns a negative value

  • if s is supposed to come before t, and it returns a positive value

  • if s is supposed to come after t alphabetically.

  • So it turns out that you can use strcmp not just to compare for equality,

  • but inequality--

  • less than or equal--

  • less than or greater than, so to speak, alphabetically,

  • or in ASCII order, so to speak.

  • It will actually compare character by character the ASCII values,

  • and that will make sure that B comes after A,

  • and C comes after B, and so forth.

  • So you can actually use strcmp to like sort a dictionary,

  • or to sort the contacts in your iPhone or your Android phone.

  • So long story short, this is a function we can use,

  • we don't have to reinvent this wheel, and thus, we have no more code

  • even after this.

  • We just have to use it correctly, and there, the documentation

  • is your friend.
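
  • A short sketch of using strcmp's return value: 0 means equal, a negative value means the first string comes first in ASCII order, and a positive value means it comes later. strings_equal and comes_before are hypothetical wrappers, not part of string.h.

```c
#include <stdbool.h>
#include <string.h>

// True if s and t contain the same characters.
bool strings_equal(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}

// True if s sorts before t in ASCII order.
bool comes_before(const char *s, const char *t)
{
    return strcmp(s, t) < 0;
}
```

  • Because the comparison is by ASCII value, every uppercase letter sorts before every lowercase one, so comes_before("Zoe", "adam") is true even though a dictionary would order them the other way.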

  • So if I run this program it's going to work exactly the same way,

  • but let me go ahead and point out some flaws.

  • It turns out all this time, I've been a little lazy with my error checking--

  • checking for errors.

  • There's a whole bunch of things that can go wrong in week 1 of CS50

  • that we just kind of turn a blind eye to, because it would just

  • bloat our code, make it longer and sort of less interesting and fun to write

  • and less comprehensible.

  • But today, now that we know what's actually going on,

  • we can begin to ask some additional questions

  • and make our code stronger, more robust so

  • that nothing does, in fact, go wrong.

  • Turns out, if you read the documentation for get_string in the man page

  • or in CS50 reference, turns out get_string

  • does return a string-- uh, not really.

  • It returns the address of a string.

  • Uh, not really.

  • It returns the address of the first byte of a string, technically.

  • But if something goes wrong, it returns a special character called null.

  • Not to be confused with NUL, it returns a special address called null--

  • left hand wasn't talking to right hand decades ago.

  • So null, N-U-L-L, just means the address 0, which nothing should ever live at.

  • It's just a bogus, invalid address.

  • Insofar as get_string returns the address of a string in memory,

  • like 100 for Brian or 900 for Veronica, if get_string ever

  • runs into a problem and just something goes wrong with the computer,

  • if it ever returns 0, specifically 0, a.k.a.

  • null-- N-U-L-L, then you can detect that something has gone wrong.

  • So to do that, and it's going to get a little tedious,

  • but it's nonetheless the right thing to do,

  • I need to be a little more defensive.

  • If s equals-equals null, otherwise known as 0, otherwise known as 0x0,

  • but I'll write it conventionally like this,

  • I'm going to go ahead and return 1 as my exit code.

  • If t equals-equals null, I'm going to go ahead and return 1 as my exit code,

  • or I could return 2 or 3--

  • I just need to return some value to signal to the computer

  • that something went wrong, but by default we'll

  • just return 1 whenever something goes wrong, but if all went well,

  • I'm going to go ahead and return 0.

  • So recall again from last week, and we didn't spend a huge amount of time

  • on this--

  • main itself can return values.

  • By default, ever since week 1, if you don't return anything,

  • main is automatically and secretly returning 0 for you because 0 is good.

  • The reason for 0 is because there's only one 0 in the world, obviously,

  • but there is an infinite number to the left

  • and there's an infinite number to the right, negative and positive.

  • That's great, because as you've already experienced in the past few weeks,

  • it feels like there's an infinite number of things that can go wrong when you're

  • writing even the shortest of programs.

  • So that means we have a lot of numbers we can assign to error codes,

  • so to speak.

  • Now I don't really care what the error codes are,

  • so I'm just going to adopt the human convention at the moment--

  • if anything goes wrong, return anything other than 0.

  • And so I'm going to return 1 up here, but if nothing goes wrong, return 0.

  • The point here is that by adding these three lines here and these three

  • lines here, I'm going to avoid what's called

  • a segmentation fault or segfault. Did any of you

  • encounter this cryptic error?

  • OK.

  • So a decent number of you, and you probably had no idea what that meant,

  • but starting today you will a bit more, and in the weeks to come,

  • you'll understand even more.

  • Segmentation fault means you touched memory you should not have.

  • Or something went wrong and you did not detect it.

  • It's kind of a catch-all phrase for memory-related problems.

  • This helps ward off those kinds of errors.

  • It's not the only way, but it's one such way.

  • So starting today with problem set programs and anything

  • you write in the course, you always want to be thinking about,

  • even if you go back and add it later, could this go wrong?

  • Could this go wrong?

  • Could this go wrong?

  • And just add some additional ifs and else-ifs

  • and handle those situations so that your program doesn't just crash on you

  • or segfault or surprise someone who's actually using it.

  • All right, let's take a look at one final example,

  • because frankly this is a little tedious.

  • I'm going to go ahead and open up--

  • and this file can be found in compare5.c.

  • Let me go ahead and save this so that we have it-- compare5.c.

  • I'm going to make one final comparison example.

  • I'm going to save this as compare6.c.

  • Turns out that humans like their succinctness.

  • And null, because it is technically the 0 address,

  • you can actually be a little clever.

  • If not s and if not t is a sufficient way to express those same things.

  • Because what does the bang do?

  • The exclamation point in code if you recall?

  • It inverts something.

  • So like if this is saying, if s is not 0, a.k.a., if s not null, or rather--

  • if-- now I'm getting confused.

  • Yes.

  • If I had just said, if s, then it's a valid address

  • and I should go on with my business.

  • But if it's not s or if s is null, I want

  • to go ahead and return 1 because there's an error, and down here too.

  • So any time you're checking whether something equals null,

  • you can make it more succinct by just saying if not s; if it's null,

  • return 1.

  • If it's null, return 1.

  • It's just syntactic shorthand.
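
To make the shorthand concrete, here is a small sketch with two hypothetical helpers, `is_missing_verbose` and `is_missing_short`, that are meant to behave identically. The names are made up for illustration; only the `== NULL` and `!` forms come from the lecture.

```c
#include <stddef.h>

// Verbose form: compares the pointer against NULL explicitly
int is_missing_verbose(const char *s)
{
    if (s == NULL)
    {
        return 1;
    }
    return 0;
}

// Shorthand form: !s is true exactly when s is the null (0) address
int is_missing_short(const char *s)
{
    if (!s)
    {
        return 1;
    }
    return 0;
}
```

Which form to use is a matter of taste; the explicit comparison is arguably easier to read while the idea is still new.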

  • Phew!

  • I had to think about that one.

  • Any questions?

  • AUDIENCE: Why does [INAUDIBLE] will store some [INAUDIBLE]

  • DAVID MALAN: Correct.

  • You are storing an address, but if that address is 0.

  • Saying if it's not 0, 0 is like false, so not false means true,

  • and so it has the effect of inverting the logic.

  • That's all.

  • Anytime you use a bang or exclamation point, it changes a 0 to non-0--

  • AUDIENCE: [INAUDIBLE], but even--

  • I don't understand why [INAUDIBLE] implies that it's [INAUDIBLE]..

  • DAVID MALAN: So you can think about it this way.

  • If s-- previously we had this.

  • If s equals-equals null is like saying if s literally equals 0.

  • And you can kind of think of that informally as

  • if s doesn't have a valid pointer--

  • 0 is not a valid pointer, it's not a valid address by definition.

  • 100 is valid, 900 is valid, 0 is not valid just by a human convention.

  • So this is like saying, if s does not have a value that's valid.

  • So the way to succinctly say that, if not s,

  • and it's just shorthand for that is another way to think about it.

  • All right, so let's take a look at a very different program,

  • but that reveals the same kind of issue as follows.

  • I'm going to go ahead and open up an example called

  • copy0, whose purpose in life hopefully is to copy a string.

  • So notice that in my program here, which I

  • wrote in advance, I'm getting a string from the user on line 11,

  • and I'm storing it in a string called s.

  • I could change this to char* now, but we know what it is.

  • And I'm going to go ahead and copy the string's address from s into t.

  • And then I'm going to say, if the length of t is greater than 0,

  • then go ahead and just capitalize the first character.

  • So it's a little cryptic, but you might have

  • done something kind of like this with Caesar and with recent string

  • manipulation.

  • This is just making sure, do I have at least one character?

  • And if so, first character is t bracket 0, as you recall.

  • toupper is a function in ctype.h from last week

  • that just capitalizes this letter.

  • So this one line of code, 19, just capitalizes the first letter

  • in t, that's it.

  • And then at the very end we just print out what s is and print out what t is.

  • That's all.

  • So this program just copies s into t, capitalizes t, and that's it.

  • So let me go ahead and make copy0.

  • This is in our code from today.

  • So I'm going to do cd src3, because I already wrote it in that directory.

  • make copy0.

  • Went well. ./copy0.

  • Let's go ahead and type in tj again in lowercase.

  • Enter.

  • Huh.

  • TJ, TJ-- both are capitalized.

  • All right, maybe it's just a weird thing with initials.

  • So let's just do Veronica, all lowercase.

  • Huh, that's definitely capital.

  • Let's do even more obvious difference, Brian where

  • the B's really going to look different.

  • Yet I'm only capitalizing t.

  • Well let's consider what's actually going on here.

  • In this case, when I'm getting a string from the user, s and t, and I type in,

  • for instance, brian in all lowercase, backslash 0, this, of course,

  • is just an array underneath the hood.

  • This is taking up six bytes here.

  • And when I store in s, s is a string.

  • So you know what?

  • We didn't do this before.

  • Let me actually create a variable, a chunk of memory for s and call it s.

  • And suppose Brian is just where he was before--

  • 100, 101, 102, 103, 104, and 105.

  • So if I do s equals get_string and get_string returns Brian,

  • what do I write in the box called s?

  • Yeah, just 100, right?

  • This is all that's been going on all this time

  • even though we didn't talk about it at this level.

  • And actually, it turns out-- pointer actually can be used pictorially.

  • If you actually prefer to think about a pointer as being an address

  • or like kind of a map that leads you somewhere, another way a human

  • would typically draw a pointer-- because honestly,

  • who really cares that Brian is at address 100?

  • Like that is way too low level, that's week 0 stuff.

  • He's just pointing there.

  • So s is a pointer to that chunk of memory.

  • It happens to be 100, whatever, the arrow is how you would literally

  • point at the chunk of memory if you were drawing this on some notes.

  • So that, too, is correct.

  • So the problem arises here with that line of code.

  • When I actually try to copy s and store in t, think about what's going on.

  • The right-hand side is just s's value, which happens to be 100.

  • The left-hand side is just saying, hey computer, give me

  • another variable of type string, and call it t.

  • So that's like saying, hey, computer, give me another chunk of memory,

  • call it t, and then store s in it.

  • But what does it mean to store s?

  • Well what is s's value at this point in time?

  • It's the pointer to Brian, or it's technically--

  • I'll write both just for thoroughness-- it's literally the number 100.

  • So if you do t equals s, that is like saying put 100 there too,

  • and pictorially that's like saying this.

  • So at this point in the story, when I copy s into t,

  • the computer took me literally.

  • It did copy s into t, but what is s?

  • It's just the address.

  • It is not B-R-I-A-N backslash 0, it's just the address.

  • So when I then say, t bracket 0 gets toupper--

  • so let's look at this line of code.

  • The one line of code here that's highlighted,

  • when I say go to the 0th character of t and store

  • the uppercase version of that same character, you just follow the arrows.

  • If you ever played chutes and ladders as a kid,

  • you just kind of follow the arrow, see where you end up.

  • t bracket 0 is this location here, because again,

  • if this is a chunk of memory, per last week it's an array,

  • so you can also think of this as being bracket 0, this is bracket 1,

  • this is bracket 2, and so forth.

  • So it's just an array.

  • So t bracket 0 is lowercase b, and toupper of lowercase b,

  • of course, changes this little b to a B. But now

  • both s and t are still pointing at the same chunk of memory,

  • so of course s and t are both going to be Brian capitalized,

  • or TJ too in my first example.
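
The bug can be reproduced in a few lines. The sketch below assumes a hypothetical helper, `capitalize_through_alias`, that works on a writable array rather than on input from get_string; the assignment `t = s` and the toupper call are the parts taken from the lecture.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: capitalizes the first character by way of a second pointer.
// s must point to writable memory (an array, not a string literal).
void capitalize_through_alias(char *s)
{
    // This copies the address stored in s, not the characters
    char *t = s;

    if (strlen(t) > 0)
    {
        // t[0] and s[0] are the same byte, so the change shows up through both
        t[0] = (char) toupper((unsigned char) t[0]);
    }
}
```

Calling it on an array that holds `brian` changes that array itself, because there was only ever one chunk of memory.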

  • Any questions then on what we just did and why that happens?

  • All right, so intuitively what's the fix?

  • Doesn't matter if you've no idea how to code it,

  • like what do we have to do to fundamentally copy a string, not

  • an address?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Create a new what?

  • AUDIENCE: Basically create the [INAUDIBLE]..

  • DAVID MALAN: Yeah.

  • Create the same string in a new chunk of memory.

  • What I really need to do is allocate or give myself

  • a bunch of more memory that's just as big as Brian,

  • including his backslash 0.

  • And then logically I just need to copy every character into that.

  • So if I go back to my original when it was a lowercase b,

  • I need to make a copy logically by using a for loop or a while loop

  • or whatever you prefer--

  • B-R-I-A-N backslash 0, so that when I copy the string and then store it in t,

  • it's not actually copying literally s.

  • And let's suppose that he ends up at location 300 just arbitrarily--

  • just making up easy numbers.

  • t now stores 300, points here.

  • So when I execute this line in this version of the story, t bracket 0

  • gets toupper, what am I actually doing?

  • I'm following a different arrow this time

  • because I gave myself a different chunk of memory, capitalizing this Brian,

  • thereby hopefully fixing the bug, albeit verbally only.

  • So how do we do this in code?

  • We need to do exactly that.

  • We need to give ourselves some more memory,

  • so let's introduce one other feature of C. In copy1.c,

  • we see the solution to this problem.

  • Notice at the top I'm doing things a little lower level-- oop, surprise.

  • Notice in this version of the code, copy1.c,

  • see I've started off almost the same, but just to be super clear,

  • I'm just using char*.

  • I don't want any magic, so there's no string,

  • there's no training wheels here.

  • But this logically is the exact same as before--

  • plus the error-checking.

  • This line is new.

  • And it looks a little funky, but let's see what's going on.

  • And this line of code here, what am I doing?

  • The left-hand side, that's shorter, let's start with the easier one.

  • Char* t, just in layman's terms, what does that expression do? char*?

  • Hey computer, do what?

  • What's that?

  • AUDIENCE: [INAUDIBLE]

  • DAVID MALAN: Not quite yet.

  • Different formulation.

  • Hey computer, give me--

  • not quite.

  • Be more precise?

  • AUDIENCE: An array?

  • DAVID MALAN: Not quite an array, just this part.

  • So let me hide all this.

  • If the star wasn't there--

  • I can't really do this very well.

  • So this-- yeah?

  • AUDIENCE: [INAUDIBLE] character?

  • DAVID MALAN: Good, I'll take that.

  • So hey computer, give me a pointer to a character.

  • Or even more low level, hey computer, give me

  • a chunk of memory in which I can store the address of a character.

  • I mean, it is that mundane.

  • Draw a box on the screen, call it s-- or rather,

  • call it t, but just give me space for a pointer, as you said.

  • So that's all that's doing.

  • It's drawing a box on the screen and calling it t, and it's currently empty.

  • Now let's look at the scarier part on the right-hand side.

  • malloc, new function today.

  • Stands for memory allocate.

  • It's very cryptic-sounding, but it just means give me a chunk of memory.

  • It says exactly what you said in functional terms.

  • Then it just needs you to answer one question--

  • OK, how much memory do you want?

  • How many bytes do you want?

  • And now maybe the math, even though cryptic at first glance, makes sense.

  • Get the string length of s, add 1, and then multiply it

  • by the size of a character.

  • And we've not seen this before. sizeof literally does that.

  • It tells you how many bytes is a char.

  • Happens to be 1, and in fact, that's defined.

  • So if we simplify this in C, the char is always 1 byte,

  • so this is equivalent to just multiplying by 1.

  • And obviously mathematically that's a waste of time,

  • so we can whittle this down to be even simpler.

  • I was just being thorough.

  • So now, hey computer, allocate me this many bytes of memory.

  • Why is it plus 1?

  • AUDIENCE: You need the null character.

  • DAVID MALAN: I need that null character.

  • Brian is 1, 2, 3, 4, 5 as he said, but I need the sixth for his null character,

  • and I just know that's going to be there.
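
The size arithmetic on its own can be sketched like this. `bytes_needed` is a hypothetical name used only for illustration; the expression inside it is the one passed to malloc in the lecture.

```c
#include <string.h>

// Hypothetical helper: bytes required to hold a copy of s, including the '\0'
size_t bytes_needed(const char *s)
{
    // sizeof(char) is 1 by definition in C, so the multiplication is optional
    return (strlen(s) + 1) * sizeof(char);
}
```

For a five-letter name this works out to six bytes: five letters plus the terminator.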

  • So at this point in the story, what has happened?

  • All that malloc does is it gives me this box of memory

  • containing room for as many bytes as are in Brian's name.

  • But it doesn't fill them just yet.

  • Now I need to logically fill those bytes with Brian's actual name.

  • So if we scroll down to my for loop here,

  • we can actually copy the string into that space.

  • And it's a little long, the expression, but nothing new here.

  • Initialize i to 0, n to the length of s, i is less than or equal to n--

  • we'll come back to that, i++.

  • So it's just a pretty standard for loop.

  • Then copy the i-th character of s into the i-th character of t.

  • The only thing that's making me a little nervous honestly is this thing here.

  • Like I feel like every time we do less than or equal to,

  • we create a bug like last week.

  • But this is correct, why?

  • Why do I want to go up to and through the length of this?

  • AUDIENCE: Is it the null character that adds--

  • DAVID MALAN: Exactly.

  • Because of the null character.

  • I actually don't want to stop at the strlen of s, so I could change this.

  • If you're just more comfortable using less than, because you just

  • got your mind wrapped around why we do that in the first place, that's fine,

  • we just need to do this instead.

  • So this is mathematically-- if you go to strlen plus 1, the same thing

  • as not doing that math but just going one step further.

  • Just whatever you want to think about it is fine.

  • However you want to think about it is fine.

  • OK, and then lastly, just a quick check, is the length

  • of t at least one or more characters?

  • Because otherwise there's nothing to capitalize, and if so,

  • go ahead and do it.

  • So if I now run this example, make-- oop, let me save it.

  • make copy1, that compiled.

  • ./copy1, now let's type in tj, tj in lowercase comes back,

  • but now t is capitalized.

  • And let's go ahead and do Brian's name in all lowercase, only one of them

  • is now capitalized.
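
Put together, the allocate-then-copy approach might look like the sketch below. It is an approximation rather than the actual copy1.c: `copy_string` is a hypothetical helper, and the NULL checks are added in the spirit of the earlier error-checking discussion.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s, or NULL on failure.
// The caller is responsible for calling free on the result.
char *copy_string(const char *s)
{
    if (s == NULL)
    {
        return NULL;
    }

    // Room for every character plus the terminating '\0'
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // <= rather than < so that the '\0' at index n is copied too
    for (size_t i = 0, n = strlen(s); i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

Because t now points at its own chunk of memory, capitalizing `t[0]` leaves the original string untouched.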

  • So does that make sense what's now happened?

  • All right.

  • So where can we go with this?

  • Well it turns out-- let me open up one final example here,

  • because honestly, that's incredibly tedious,

  • and no one's ever going to want to copy strings if you

  • have to go through all of that work.

  • Turns out that string copy exists.

  • So when in doubt, check the man page.

  • When in doubt, check CS50 reference.

  • Does the function exist somewhere related

  • to some keywords you have in mind?

  • Like string copy, see if something comes back.

  • And indeed, we've had strlen, we've had strcmp, we now have strcpy,

  • and if you read the documentation, this is deliberately reversed like this.

  • The destination is this variable, the source or the origin string

  • is this one, and it copies from one end to the other,

  • and then I don't need that for loop.

  • It just saves me a few lines of code.
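
With strcpy the loop disappears. In this sketch, `copy_with_strcpy` is a hypothetical wrapper name; `strcpy(destination, source)` is the standard library call from string.h.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s, or NULL if malloc fails.
// The caller is responsible for calling free on the result.
char *copy_with_strcpy(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // Destination first, source second; strcpy copies the '\0' as well
    strcpy(t, s);
    return t;
}
```

Note that strcpy does not allocate anything itself, so the destination still has to be big enough before the call.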

  • All right.

  • So let's take off one other detail here.

  • Oh, and you'll notice, actually, let me make one fix, one fix here.

  • It turns out that what I'm doing here is a little lazy.

  • It turns out that malloc does have an opposite.

  • So anytime you allocate memory, technically

  • you should also be freeing that memory.

  • And so C allows you to ask the computer for as much memory as you want,

  • but if you never give it back, have you ever experienced on your own Mac or PC,

  • like after your computer's been running a while

  • or using some new or bloated program like a browser,

  • it gets slower and slower and slower?

  • And in the worst case it just freezes or hangs or something?

  • It is quite possible that that program simply-- was made by humans,

  • of course--

  • just has a memory leak.

  • So some human wrote one or more lines of code that uses malloc

  • or some equivalent in another language that just kept allocating memory

  • for the user's input.

  • You're visiting one web page, two web pages,

  • that requires memory whatever the program is.

  • And if that human never calls the opposite of allocate-- deallocate,

  • otherwise known as free, you're never giving the memory back

  • to the operating system.

  • So it gets slower and slower because it's running lower and lower and lower

  • on memory, and it might have to move some things around

  • to make room for things, that's what's called a memory leak.

  • And so indeed, in this program, I should actually improve this a little bit.

  • If I go back into this version here and line 18, recall,

  • I allocated this memory just to make my copy,

  • the very last thing I should actually do in this program

  • is this line here-- free.

  • You don't have to tell the computer how many bytes you want to free,

  • it will remember for you so long as you just pass in the pointer--

  • the variable that's storing the address of the chunk of memory

  • that you allocated.
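
A sketch of the allocate, use, free pattern is below. `capitalizing_changes` is a hypothetical helper invented for this example: it makes a temporary copy, capitalizes it, compares it with the original, and hands the memory back before returning.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns 1 if capitalizing the first letter of s would
// change it, 0 if not, and -1 if memory could not be allocated.
int capitalizing_changes(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return -1;
    }

    strcpy(t, s);
    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    int changed = (strcmp(s, t) != 0);

    // Give the memory back; free needs only the pointer, not the size
    free(t);
    return changed;
}
```

Every path that gets past the malloc check reaches the call to free, which is the habit that avoids a leak.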

  • All right.

  • So let's now see why we've been using get_string,

  • since it's not just to kind of simplify the code,

  • it's also to defend against some very easy problems.

  • Here is a program called scanf0--

  • scan formatted text, another arcane-sounding function,

  • but it's pretty straightforward.

  • This program simply gets an int from the user using scanf.

  • Up until now for the past three weeks, you've used get_int.

  • So this is an alternative to get_int that you could

  • have started using a few weeks ago.

  • Give me an int called x, print out x colon whatever--

  • that's just the prompt to the user.

  • scanf %i, &x, whatever that is, and then print out x's value using %i.

  • So what's going on here?

  • Now today we can actually start to wrap our minds around what get_int actually

  • does.

  • This is effectively get_int.

  • If you actually look at the source code for get_int, it's a little fancier.

  • But in essence, what get_int does is it declares a variable called x,

  • and it doesn't put anything there, because that's

  • supposed to come from you, the human.

  • It then prompts you for whatever string you pass to get_int,

  • so those are the first two lines.

  • And this is the only weird-looking one.

  • Scanf is like the opposite of printf.

  • You still use a formatted string-- %s, %i, %f or whatever,

  • but you're not going to output this, you're going to input this from

  • the human's keyboard.

  • And &x is the opposite of--

  • is the special symbol in C that says, go ahead and get me the address of x.

  • So don't pass in x, give me the address of x.

  • Now why is that?

  • We'll see, but this is the way where you can tell the computer,

  • I've made a variable for you called x, here is where it is.

  • It's a treasure map that leads you to x, go put a value here for me.

  • And so the end result is that we do, in fact, end up getting an int.

  • If I do make scanf0, and then ./scanf0, I'll type in 42, all right?

  • It's not an interesting program, it just spits back out what I got,

  • but that's literally all that get_int, of course,

  • is doing if you then print out the value.
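
The role of the address can be shown with sscanf, the standard library's sibling of scanf that reads from a string instead of the keyboard. It is used here only so the sketch can run without typed input, and `parse_int` is a hypothetical helper name.

```c
#include <stdio.h>

// Hypothetical helper: tries to read an int out of text and store it at x.
// Returns 1 on success and 0 otherwise.
int parse_int(const char *text, int *x)
{
    // x is already an address here; a caller typically passes &some_int
    return sscanf(text, "%i", x) == 1;
}
```

A caller would declare `int x;` and pass `&x`, just as the scanf line in the lecture does. Checking the return value is how you find out whether a number was actually read.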

  • So if I stipulate this is correct, this is how you get an int from the user,

  • but honestly, the reason we don't do this in week 1 of the course is like,

  • my God, we just took the fun out of even getting a simple number from the user

  • by using these lines of code and whoever knows

  • what this symbol is-- we don't want you to think about that,

  • we want you to just get an int.

  • But today those training wheels are off, but we're

  • going to run into a problem super fast.

  • Let's try the same thing with a string.

  • If I were to do this, you would think that the result is the same.

  • Or let's just do it as char*.

  • But there's going to be one tweak.

  • If I go ahead and give myself space for the address of a character,

  • I don't need to use the ampersand now, because scanf

  • does need to be told where the chunk of memory is,

  • but it's already an address, so I don't need the ampersand here.

  • Recall earlier, I declared int x, which was just an int.

  • &x gets the address of that int.

  • Here, I'm saying from the get-go, get me the address of a char.

  • I don't need the ampersand cause I already have the address of a char

  • by definition of that star symbol.

  • So what's going on here?

  • Let me see now.

  • If I run scanf1, what happens?

  • So make scanf1 and--

  • oh, let's see.

  • Here's a warning I'm getting.

  • Variable s is uninitialized when used here.

  • All right, that's fine.

  • It wants me to initialize it because this is a very common mistake.

  • Those of you who alluded to segmentation faults

  • earlier might have encountered something similar in spirit to this.

  • So let me initialize s to null, and that squelches that warning.

  • Let me go ahead and run scanf1.

  • All right, here we go, TJ.

  • Hmm.

  • That is not your name, but OK.

  • It didn't crash at least, it's just a little weird.

  • David.

  • Null, OK, that's a little weird.

  • Let's go ahead and do this again.

  • Let's type in a really long name.

  • Enter.

  • Dammit, that didn't work.

  • So let's try an even longer name.

  • I'm hitting paste a lot.

  • OK-- dammit.

  • Too many times.

  • Too many times.

  • Command not found, that's definitely not a command.

  • Wow, OK.

  • Well that's interesting.

  • Oh, there it is.

  • Null, same thing.

  • OK, so what's actually going on?

  • Well null, which is all lowercase here, which

  • is this kind of an aesthetic thing, well it's not working.

  • It's not working.

  • Well what am I actually doing?

  • In that first line of code, when I say give me s to be a char*,

  • otherwise known as a string, all that's doing is allocating this.

  • And it's technically the size of a pointer.

  • A pointer, we never mentioned this before, but now we can.

  • Turns out it is 64 bits or 8 bytes.

  • 8 bits is 1 byte, so a pointer is by definition on many computers these

  • days-- most of your Macs, most of your PCs, the IDE, the Sandbox, the Lab--

  • is 64-bit.

  • So that just means there's 64 bits here, but we initialized it to null,

  • so that just means there's 64 0's here, dot-dot-dot.
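
If you want to check the pointer size on your own machine, sizeof answers the question. The two function names below are hypothetical; `CHAR_BIT` comes from limits.h and is the number of bits in a byte.

```c
#include <limits.h>
#include <stddef.h>

// Hypothetical helper: size of a pointer to char, in bytes
size_t pointer_bytes(void)
{
    return sizeof(char *);
}

// Hypothetical helper: the same size expressed in bits
size_t pointer_bits(void)
{
    return sizeof(char *) * CHAR_BIT;
}
```

On a typical 64-bit system these would report 8 and 64, but the values depend on the platform, so treat those numbers as an expectation rather than a guarantee.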

  • But when I get a string using scanf, what

  • I'm telling the computer to do with this line of code here,

  • notice, is hey computer, go to that address and put a string there.

  • So what's actually happening?

  • It turns out that there's just not enough room to type in TJ.

  • There's not enough room--

  • that's a bit of a white lie, because we could fit you in 64 bits,

  • but there's not enough room to type in the long sentence or paragraph of text

  • I did, right?

  • What did we not do?

  • We didn't allocate any space over here.

  • All we allocated space for was the address.

  • And so every time I use scanf saying, get me a string and put it here,

  • there's nowhere to put it.

  • And so the value just very defensively says, no, like no,

  • cannot store this anywhere for you.

  • So I actually need to be a little smarter about this.

  • I actually need to get myself some space so that I can actually store something

  • in the right place.

  • Let's do that.

  • Let me go ahead and create a new program.

  • I'm going to go ahead and call this scanf2.

  • We need a little secret code to remind me of that.

  • Oh, wrong file name.

  • So I'm going to go ahead and create a file called scanf2.

  • scanf2.c.

  • And I'm going to quickly recreate this stdio.h, int main void,

  • and then down here I'm going to go ahead and-- you know what?

  • Instead of a string s, which I know today to be a char* s,

  • what is this string really?

  • Well you said it earlier.

  • What is this string?

  • It's an array of characters.

  • Let me take you literally.

  • Just give me an array of let's say five characters.

  • The D-A-V-I-D, or one more, that's fine, just enough for my backslash 0.

  • Let me just create a string-- really low level,

  • but this time give myself the chunk of memory.

  • I don't want just the address of a character,

  • I want the actual characters themselves.

  • Let me go ahead and just prompt the human for their string with s,

  • just like before.

  • Then let me call scanf and get a string from the user using %s and then pass

  • in s.

  • And here's a little trick.

  • It turns out that because a string is really just an array,

  • but a string is also just a pointer, you can actually treat

  • an array as though it is a pointer--

  • an address.

  • And so even though this is a char array, this is OK.

  • This is the equivalent in this context to being just the address of a string.

  • Because strings are arrays, arrays can be treated as pointers as of now.
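
Here is a small sketch of an array being used where a pointer is expected. Both function names are hypothetical; the point is only that passing `s` hands over the address of its first character.

```c
#include <string.h>

// Takes the address of the first character of a string
size_t count_characters(const char *p)
{
    return strlen(p);
}

// Hypothetical demo: passes an array where a pointer is expected
size_t array_as_pointer_demo(void)
{
    char s[6] = "David"; // five letters plus the '\0'

    // s is converted to &s[0] automatically when it is passed
    return count_characters(s);
}
```

Writing `count_characters(&s[0])` would mean the same thing; the array name is just the more common spelling.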

  • And then let me go ahead and just print out whatever the human typed in.

  • S is actually this.

  • Pass in s;, save.

  • Yeah?

  • AUDIENCE: So [INAUDIBLE] char*?

  • DAVID MALAN: At this point it would be redundant to do char*,

  • because I literally want for this story six characters.

  • I want space, rather, for six characters.

  • So this is kind of week 2 stuff now, there's no pointers involved.

  • But again, just showing the equivalence of these ideas for now.

  • So if I now go into this, and this is in my other directory at the moment,

  • make scanf2, Enter, ./scanf2, s is going to type in--

  • I'll type in my name, I know I can fit that, we're back in business.

  • Like now it's working because I didn't just create the address for a string,

  • I created the space for the string.

  • But let me get a little dangerous--

  • David Malan?

  • OK, that kind of worked out OK.

  • David Malan or some really long other name?

  • OK, that worked out too.

  • Let me go ahead and run it again.

  • Let me try that really long string again, see what happens.

  • I know this didn't work very well last time.

  • All right, done.

  • Ooh, OK.

  • So now I'm in the club of those of you who have had segmentation faults.

  • So let's understand what's going on here.

  • Segmentation fault a moment ago I claimed

  • was touching a segment, a chunk of memory that's not your own.

  • So what just happened?

  • Well with this simple program, I told the computer, hey computer,

  • give me room for six characters, give me six bytes.

  • With the scanf line, I'm telling the computer, put the following user

  • input at that location, in that array of characters.

  • D-A-V-I-D backslash 0 fit.

  • David Malan didn't really, but it didn't seem to be a huge deal.

  • David Malan or some really long other name, also didn't crash the computer.

  • But that's because unbeknownst to us, usually when you ask for six bytes,

  • the computer is kind of sort of-- it's giving you a few extras.

  • It's not safe to use them, but it gives you enough

  • that you're not going to necessarily see a problem like a segmentation fault.

  • But it only allocates a few extra bytes typically,

  • so if you really keep pasting in long, long, long, long lines of text,

  • eventually you're going to exceed not only those six

  • bytes, but well past the special--

  • the secret bytes that you got back that you shouldn't be using anyway,

  • and at that point the computer just gives up and says,

  • you are touching memory you shouldn't, a.k.a.

  • segmentation fault.
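
One common way to stay inside a six-byte array is to give %s a maximum width. This goes a step beyond what the lecture shows, so take it as an aside: the sketch uses sscanf (which reads from a string, so it runs without keyboard input) and a hypothetical helper named `read_short_word`.

```c
#include <stdio.h>

// Hypothetical helper: reads one word from input into a 6-byte array.
// Stores at most 5 characters plus the '\0'. Returns 1 if a word was stored.
int read_short_word(const char *input, char s[6])
{
    // The 5 caps how many characters %s may store; the '\0' is added after them
    return sscanf(input, "%5s", s) == 1;
}
```

Longer input is simply cut off at five characters instead of spilling into memory that does not belong to the array. The same width works with scanf itself.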

  • AUDIENCE: [INAUDIBLE] if the computer gives you

  • a few extra bytes, then why isn't it printing any of the other stuff?

  • After you said [INAUDIBLE] it just printed David.

  • DAVID MALAN: Really good question.

  • So even though I'm getting these sort of extra bytes,

  • why am I not seeing them after D-A-V-I-D?

  • I'm probably getting lucky.

  • Long story short, when you first run a program,

  • much of the memory that your program has access to is by default initialized

  • to 0's.

  • 0 is the same thing as backslash 0, and so I'm getting lucky.

  • When I had D-A-V-I-D and then excess space in that array,

  • a lot of them are initialized as 0's already,

  • and the string is getting secretly terminated for me.

  • Or the better answer is, it's undefined behavior.

  • Like you should not touch memory that is not your own.

  • What happens after that is your risk alone.

  • But that's a conjecture as to why that's happening.

  • All right, so what is the fundamental feature that get_int

  • is providing for us?

  • All of this time get_int has actually been dealing

  • with all of this headache for us.

  • I mean honestly, even I'm getting bored like thinking about, talking

  • about how you just get a damn string from the user,

  • because you need to figure out, well how many bytes do you need?

  • And what if the human types in one more byte than you were expecting?

  • Then you need to do a switcheroo and get more memory.

  • get_string is doing all of this headache for us.

  • And that's not to say you need to use it forever,

  • they are indeed training wheels, but that's

  • just because when you're using C or a lot of programming languages,

  • the computer will only do what you tell it to do.

  • And it turns out that even asking the user for input,

  • if you don't know how many characters he or she is

  • going to type in from the get-go, you have to deal with it.

  • And so underneath the hood-- and you're welcome to take a look at the source

  • code for CS50's library, which I'll post on the home page later today,

  • it turns out that the way we're doing get_string is taking baby steps.

  • We literally like get one character at a time

  • from the user, kind of building the road as we go.

  • And if we don't have enough space, we ask the computer,

  • give me some more bytes so I can get more bytes,

  • and we just get one character at a time so

  • that we can handle the user maliciously or accidentally typing in way

  • more input than we actually expect.
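
The baby-steps idea above can be sketched as follows. This is a minimal illustration, not the CS50 library's actual source: the name `read_line`, the starting capacity of 16 bytes, and the doubling strategy are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not CS50's get_string): reads one line from `in`,
// one character at a time, asking for more memory whenever the buffer fills.
// Returns a malloc'd string the caller must free, or NULL if memory runs out
// or the input ended before any character was read.
char *read_line(FILE *in)
{
    size_t capacity = 16;   // bytes currently allocated (arbitrary start)
    size_t length = 0;      // characters stored so far
    char *buffer = malloc(capacity);
    if (buffer == NULL)
    {
        return NULL;
    }

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Always keep one byte in reserve for the terminating '\0'.
        if (length + 1 == capacity)
        {
            char *bigger = realloc(buffer, capacity * 2);
            if (bigger == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity *= 2;
        }
        buffer[length++] = (char) c;
    }

    // Input ended before a single character arrived: nothing to return.
    if (c == EOF && length == 0)
    {
        free(buffer);
        return NULL;
    }

    buffer[length] = '\0';
    return buffer;
}
```

Calling `read_line(stdin)` would read whatever the human types up to Enter, however long it is, and the caller would `free` the result when done.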

  • So let's contextualize all of this then.

  • Recall that we've been drawing these pictures the past couple of weeks.

  • Let's just make this super clear as to what's been going on.

  • This is a memory module in a computer.

  • It's just a green board, it's way blown out of scale here,

  • it's easily like yea big inside of your Mac or PC laptop or desktop,

  • though it can vary in size.

  • One of these black chips is the actual memory or the bytes

  • to which we've been referring.

  • And if we zoom in on that, recall that I proposed last week

  • that you can just think about this as like a grid, an array.

  • And it doesn't have to be rectangular, this is just an artist's rendition,

  • but each of those squares represents, we claimed, a byte.

  • And each of those bytes can be addressed in some way with a number.

  • And that number is just its location, otherwise known as an address.

  • We can actually see this, it turns out, as follows.

  • Let me go ahead and open up this example here.

  • Or actually, you know, let's just write this one from scratch.

  • Let me write a program called addresses.c.

  • And that's going to use our old friends, the CS50 library and stdio.h and int

  • main void.

  • And let me go ahead and just do this.

  • I'm going to go ahead and get a string--

  • you know what?

  • No more string. char* from the user, get_string, ask the user for s.

  • And we get another string, a.k.a.

  • char*, get_string, call it t from the user.

  • And then, I want to print out not the strings, which I used to do like this,

  • printing out s.

  • I want to print out the pointer that s really is, that is the address.

  • Turns out %p for pointer will print out not the string at that memory location,

  • it will print the actual memory location for you of s.

  • And I can do the same thing here, %p, backslash n, paste in t.

  • And just so I know which is which, let me just prefix it

  • with some text-- s colon and t colon.
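
The printing step just described might look roughly like this. The lecture's addresses.c gets s and t from get_string; to keep this sketch free of the CS50 library, the two strings are passed in instead, and the function name `print_addresses` is an assumption for illustration.

```c
#include <stdio.h>

// Prints where each string lives in memory rather than its characters.
// Returns the number of characters printed, or -1 if printing failed.
int print_addresses(const char *s, const char *t)
{
    // %p expects a void *, so each pointer is cast explicitly.
    int n1 = printf("s: %p\n", (void *) s);
    int n2 = printf("t: %p\n", (void *) t);
    return (n1 < 0 || n2 < 0) ? -1 : n1 + n2;
}
```

A call such as `print_addresses("Brian", "Veronica")` prints two addresses; the exact values, and even how %p formats them, vary from system to system and from run to run.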

  • Let me go ahead now down here and do make addresses.

  • Oh, I messed up, missed a semi-colon.

  • Let me do this again.

  • make addresses.

  • And get rid of this.

  • That compiled OK, ./addresses, and here we go.

  • Let's type in-- let's do Brian and Veronica like before.

  • Enter.

  • And this is a little funky, but it turns out the IDE and your Macs

  • and your PCs have a lot of memory.

  • So this is the address.

  • It's not quite as small as 100, it's not quite as small as 900.

  • It's actually kind of big.

  • It's 2331010 with this weird 0x.

  • Well it turns out, this is just a human convention.

  • In week 0 we talked about decimal and all of us

  • grew up with decimal, 10 digits from 0 to 9.

  • Talked a little bit about binary 0's and 1's.

  • Turns out there's an infinite number of base systems--

  • decimal/dec, binary/bi are just two of those infinite number of possibilities.

  • Turns out there's another one that's super common called hexadecimal.

  • Hexa meaning 16 in this case.

  • So base-16 actually has 16 letters in its alphabet.

  • 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f.

  • So it turns out that base systems that need to count higher than 10 characters

  • just start using letters of the alphabet by convention.

  • Humans just decided this.

  • So we're getting just numbers in this case,

  • but if these addresses were even bigger, we

  • might actually see some alphabetical letters between a and f there.

  • And frankly I don't know what address this is,

  • but Google's usually pretty good at this stuff,

  • so let me actually open up another browser window.

  • And let me just paste this in.

  • Come on, Google.

  • Come on.

  • So Google is your friend when it comes to this stuff,

  • or any number of calculators.

  • 0x2331010 in decimal please.

  • And Google has translated that.

  • So Brian, I kind of underbid a bit earlier.

  • He is not at address location 0, he's actually

  • in the 36 millionth byte inside of my computer

  • right now, location 36,900,880.

  • So a little higher address than 100.

  • And then Veronica, if you really want to get into the weeds here,

  • we can say "in decimal," let Google translate that for us.

  • She's at location 36,900,944.

  • Why?

  • Who cares?

  • The computer is managing all of this for us, but when get_string used malloc,

  • these are literally the numbers that were being returned saying,

  • you may use this chunk of memory.

  • And why did humans use hexadecimal?

  • Like it's just slightly more compact to say 0x2331050 than 36900944--

  • like you just save a few digits, so it's just conventional.

  • That's all, there's no magic there.
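
The conversion Google did above can also be done in C itself. A minimal sketch, assuming a hypothetical wrapper named `hex_to_decimal` around the standard library's `strtoul`:

```c
#include <stdlib.h>

// Converts a hexadecimal string such as "0x2331010" to its numeric value.
unsigned long hex_to_decimal(const char *hex)
{
    // Base 16 tells strtoul to read hexadecimal; a leading "0x" is accepted.
    return strtoul(hex, NULL, 16);
}
```

Here `hex_to_decimal("0x2331010")` gives 36900880 and `hex_to_decimal("0x2331050")` gives 36900944, the two locations mentioned above, which are 64 bytes apart.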

  • But, recall earlier.

  • Do you recall that when I had the debugger open earlier,

  • you saw next to my name variable a value that was cryptically 0x0?

  • Then there was another value that I don't recall--

  • 0x-something?

  • That was just the numeric address of my name in hexadecimal.

  • And 0x0 is just the technical address being used by null.

  • Yeah?

  • AUDIENCE: You said the address printed out was [INAUDIBLE] x of the variable s

  • and--

  • DAVID MALAN: Sorry, could you say that again?

  • AUDIENCE: You said the address printed out on the screen was an x,

  • but x is [INAUDIBLE]

  • DAVID MALAN: Ah, I should've clarified.

  • 0x, humans years ago decided anytime you see anything

  • with 0x, that means whatever comes next is hexadecimal.

  • Just the convention.

  • It's also common too that if it starts with a 0, it's octal, which is base-8.

  • If you see a lowercase b at the end, it means binary.

  • So humans have just come up with symbology

  • so as to kind of communicate this to readers, that's all.

  • Not part of the value.

  • So turns out that we can actually do this math ourselves.

  • And we won't really get into the weeds of this

  • because it's not a particularly useful life

  • skill, to be able to convert to various base systems,

  • but let's just do one example so that we've seen it.

  • Just to make clear that there's no magic here,

  • it's just a different way of thinking about numbers versus grade school.

  • So if back in the day we had three decimal numbers--

  • 255, 216, and then another 255, if we rewound to week 0,

  • we could go through the math of converting that to binary.

  • And even if it might take you a little while, this is the binary equivalent.

  • And frankly, the first and last are kind of easy.

  • 255 is kind of a special value because with 8 bits, all of which

  • are 1, that's what gives you 255.

  • So the only hard one is actually this.

  • But who cares about the math today.

  • We know from weeks ago that we can do this if we really tried.

  • But notice that bytes are eight bits, and of course, eight is a pair of four,

  • if you will.

  • Well what's really nice about hexadecimal is that it starts at 0

  • and ends at f.

  • And that's 0, 1, 2, 3, 4, 5, 6, 7, 8, 9--

  • wait-- yes, that's 10.

  • OK.

  • And then a, b, c, d, e, f.

  • I just held up 16 fingers in total, hence, hexadecimal.

  • What's nice about base-16 is that how many bits do I need to count from 0 up

  • to--

  • one, two, three, four--

  • 15?

  • Just 4, right?

  • So if I have all 0 bits, that's 0.

  • And if I have 4 1-bits, that's--

  • let's see.

  • This is an 8 plus 4 plus 2 plus 1 gives me 15.

  • So long story short, hexadecimal's super convenient because 0 through f

  • maps wonderfully cleanly to 4 bits.

  • So it's just a nice way of thinking about the world not in units of 8

  • but in 4 instead.

  • So all I did here was I took my values and I just

  • added a little bit of whitespace to make clear

  • that 8 bits is like a pair of 4 bits.

  • It turns out now that 1 1 1 1 is f for the reasons I enumerated earlier.

  • All 1's is f, otherwise known as 15.

  • All 1's is again f, otherwise known as 15.

  • If we did the math, 1 1 0 1 is d, 1 0 0 0 is 8, and then all 1's is f and f.

  • So long story short, there is a way to convert from decimal

  • to binary, to hexadecimal, to any number of other base systems.

  • It all just boils down to what digits you care about.

  • And the way you write this, to your question earlier,

  • is by human convention.

  • Not just FFD8FF, but 0xFF 0xD8 0xFF just because.

  • Then it's clear to the user what it is.
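
Because each hexadecimal digit maps to exactly 4 bits, converting a byte is just a matter of splitting it in half. A small sketch of that idea, with the function name `byte_to_hex` chosen here for illustration:

```c
// Writes the two hexadecimal digits of `byte` into `out`,
// which must have room for 3 chars: two digits plus '\0'.
void byte_to_hex(unsigned char byte, char out[3])
{
    const char digits[] = "0123456789abcdef";
    out[0] = digits[byte >> 4];     // high 4 bits, e.g. 1101 -> d
    out[1] = digits[byte & 0x0f];   // low 4 bits,  e.g. 1000 -> 8
    out[2] = '\0';
}
```

Applied to the three values from the example, 255 becomes "ff", 216 becomes "d8", and 255 becomes "ff" again.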

  • So a little levity now.

  • I'm sorry to do this to you, but now you will all hopefully

  • understand this famous comic.

  • OK, welcome to that club of people who understand things like this.

  • So let's now stumble upon just one last problem,

  • and we'll take it home by putting it into the context

  • of a very sexy field of forensics where all of these building blocks

  • will come into play.

  • But first let's start with a problem.

  • Suppose I want to implement a function here called swap whose purpose in life

  • is just to swap two values, a and b.

  • I just want to do a switcheroo.

  • Let's first do this with a sort of mid-lecture snack for at least

  • one person.

  • Would anyone be up for--

  • OK, that was fast.

  • Volunteering, come on up.

  • What's your name?

  • Kelly, all right.

  • Thank you for volunteering so suddenly.

  • Kelly, David, nice to meet you.

  • OK, so very simple task at hand.

  • I have here two empty cups, and we have some orange juice.

  • OK, put this in here.

  • And we've got some milk over here.

  • That should stand out, very different colors.

  • OK, I would just like you, Kelly, if you could, swap those two values.

  • Orange goes into milk, milk goes into orange please.

  • That is cheating, OK?

  • No, I mean literally the cups.

  • I put them in the wrong cup, I prefer my milk

  • in the other cup and my orange juice in the other cup, I'm sorry.

  • AUDIENCE: Pour it back in.

  • DAVID MALAN: No, that is not available to you, OK?

  • [LAUGHTER]

  • OK, so you're struggling.

  • Why are you struggling?

  • KELLY: Because I'm going to mix them.

  • And then it won't be the same.

  • DAVID MALAN: Right.

  • So I mean obviously, this is kind of a losing proposition.

  • You can't really do this.

  • What would make this easier for you besides putting them back

  • in the bottles?

  • KELLY: Having another container.

  • DAVID MALAN: Yeah.

  • So you need like a temporary storage space for this.

  • You know, let me--

  • Tara, can we get some more cups over here?

  • Ah, this will make it easier.

  • OK, so if I get you some temporary space--

  • here you go-- could you solve the problem now please?

  • Ah, very nice.

  • A little contamination, but that's OK.

  • But I need that temporary cup back for Tara.

  • Yeah, OK.

  • Thank you.

  • All right, a round of applause if we could for Kelly here.

  • [APPLAUSE]

  • Well here we go.

  • I'm guessing you don't want warm milk, but orange juice?

  • OK.

  • Thank you so much.

  • All right, so what's the point here?

  • This is pretty easy.

  • Like once you have some temporary storage

  • space-- a variable, if you will, like it's no problem to swap two values.

  • So let me go ahead and do that as follows.

  • I'm going to go ahead and just implement this swap function

  • exactly as Kelly ultimately just implemented it.

  • If the goal is to swap a and b, I can't just do a complete switcheroo,

  • it seems.

  • I need to put one of those values, like the milk, in another container,

  • and then swap and then swap.

  • So it takes three steps, not just one.

  • All right, so I could call this extra variable or cup

  • that Tara gave us anything we want-- tmp.

  • So I'm just going to put a in tmp.

  • Then I'm going to put b in a, because a is now empty.

  • Then I'm going to put tmp in b, and then I don't really

  • care what happens to tmp-- indeed, it's just still sitting there,

  • but the job is now done.

  • So let's go ahead and see this program in action, because obviously this

  • should be pretty straightforward.

  • So let me go ahead and open up this program

  • in the context of a main function so we can actually run it.

  • In this code here, I'm going to demonstrate it as follows.

  • Here's my main function.

  • I'm going to call variable x, give it 1, call variable y,

  • give it 2, go ahead and just print out just for a quick sanity check--

  • x is this, y is that.

  • Then I'm going to call this super simple swap function, x, y.

  • Then I'm going to print the exact same thing-- x is this, y is that,

  • just so I can see in those variables--

  • I could also use debug50, but this is meant to be a complete solution,

  • I want to see it on the screen.

  • Here is swap.

  • I copy-pasted that from before.

  • This feels like a no-brainer, super straightforward,

  • let's go into my directory and compile this program, which, slight spoiler,

  • noswap is the name.

  • ./noswap.

  • Oof.

  • Let's zoom in.

  • Nope, that is not what I intended, right?

  • I really intended milk to become OJ, OJ to become milk,

  • or x become y, y become x, this doesn't seem to work.

  • And again, the only magic is this one call to swap.

  • All right, maybe it just works some of the time.

  • So nope, nope-- OK.

  • Now it's time for the debugger.

  • I don't understand what's going on in my program,

  • printf is not really illuminating here.

  • So let me go ahead and run debug50 ./noswap.

  • The little debugging panels get opened on the side,

  • but wait, I need a breakpoint.

  • I'm going to start a breakpoint at the very top, the first line I care about.

  • I don't really care about all the stuff at the super top.

  • Now I'm going to go ahead and rerun debug50 ./noswap, all right?

  • Now I see over here, the first line 9 is highlighted.

  • Notice on the right-hand side, and this perhaps

  • answers by example your question earlier.

  • x and y, conveniently, just happened to be initialized to 0--

  • not by me, I shouldn't necessarily trust this in all contexts,

  • but that's why they had values.

  • They're otherwise known as garbage values, but I got lucky with 0's here.

  • Let me go ahead and step over that line, and if you watch, albeit small,

  • on the right-hand side, x should suddenly take on a value of 1.

  • And if I step over one more line, y should take on a value of 2.

  • OK, so I'm pretty confident the program is thus far correct.

  • I'm going to go ahead and step over printf.

  • And notice the blue terminal window, I see one output.

  • Now things get interesting.

  • If I continue stepping over lines, it's just going to finish running

  • and that's not enough.

  • So notice this time I'm going to hover over this third icon, Step Into.

  • Now I can kind of go down the rabbit hole,

  • so to speak, and go into the swap function, and notice,

  • the debugger jumps into that other function.

  • So here now, the context changed.

  • My local variables are now a, b, and tmp, and this is really weird.

  • a is 1, b is 2, as expected, because I passed in x, y.

  • And in the context of this function I'm just calling them a, b because.

  • But why is tmp 32,767?

  • It's just because it can't be trusted, it's a garbage value.

  • If you just give yourself a temporary value, who knows what's in there?

  • We got lucky and Tara did not have anything in this cup,

  • but it could have had a garbage value, maybe it had some Pepsi,

  • and then we would have had to replace that value somehow.

  • So to be clear, when you declare variables in a program,

  • quite often they have garbage values, just bogus values--

  • the 0's and 1's that are there underneath the hood in that chip,

  • but that you didn't set yourself.

  • But that's OK, because I'm explicitly in this next line setting tmp equal to a.

  • So it doesn't matter what its original weird value was, so if I click Next,

  • tmp is now 1, a.k.a.

  • a.

  • Now notice a is going to become b if you watch the right-hand side.

  • Now I seem to have a is 2, b is 2, which is a little worrisome but not as bad,

  • because I have that separate variable tmp, so I still have the one around.

  • So now b is about to become 1, and I've done the switcheroo.

  • OK, at this point in the story, line 22, my code seems correct.

  • b has become a, a has become b, and the values are swapped--

  • and the debugger is confirming that for me visually.

  • So now, let's do a step and--

  • dammit.

  • Lost.

  • What is going on?

  • Intuitively?

  • Even if you've never seen or done this before, like clearly there's a bug.

  • What is that bug?

  • What must be happening?

  • Yeah?

  • AUDIENCE: [INAUDIBLE] a new value [INAUDIBLE]

  • doesn't have the same address for the first one?

  • DAVID MALAN: Yeah.

  • What seems to be happening here is yes, you're passing in x and y

  • and calling it a and b, but a and b would seem to be copies of x and y.

  • And I am very successfully, very correctly swapping a and b,

  • but because they're copies, it has no effect on the original x and y.

  • So our metaphor here of juice isn't quite apt

  • because I didn't pass Kelly copies of the OJ and milk,

  • I handed her the actual OJ and milk and she was able to change the values.

  • But in the context of C and code, when you pass arguments to a function,

  • you're passing copies of those arguments to the function.
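
The version that fails can be sketched like this. In the lecture the function is simply called swap; it is renamed `swap_by_value` here, an assumption made so the name itself says what goes wrong.

```c
// Broken on purpose: a and b are copies of whatever the caller passed in.
void swap_by_value(int a, int b)
{
    int tmp = a;   // tmp holds the copy's old value
    a = b;         // only the copies change...
    b = tmp;       // ...and both vanish when the function returns
}
```

If main sets x to 1 and y to 2 and then calls `swap_by_value(x, y)`, x is still 1 and y is still 2 afterward, even though the swap inside the function was carried out correctly.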

  • So intuitively, what is the solution?

  • We clearly cannot pass from one function to another copies of the values if we

  • expect the function swap, or a.k.a.

  • Kelly, to make some useful change for us.

  • What do we have to pass to the function or to Kelly instead?

  • The addresses of those values, right?

  • I told her where the milk and OJ were.

  • I didn't give her copies of them, I told her, here's the milk,

  • here's the OJ, swap those.

  • In this version of the code, I've just said,

  • here's a copy of x, here's a copy of y, you can call them a and b-- um-mmm.

  • We need to now use the ampersand or something like that to pass in a map,

  • if you will.

  • The treasure map to those values so that swap can change the original values.

  • And the way we do this is a little weird-looking,

  • but all we're going to have to do is make a little addition here

  • that looks as follows.

  • It's got to look like this instead.

  • So this is the broken version.

  • Or broken in that it doesn't have the effect we intend even though it works.

  • This is what we need to do instead, and it's the last piece

  • of new symbology for today.

  • We've seen star in a couple of different places

  • before, now we're using it in one final context.

  • When you specify a star here and here in the arguments to a function, that

  • is just the way you tell the computer, I'm

  • expecting not an int, but the address of an int.

  • I'm expecting not an int here, but the address of an int.

  • So two pointers, two addresses of integers.

  • Down here, tmp is still just an int.

  • I don't need to over think tmp, that's just an empty cup.

  • Give me an integer called tmp from week 1.

  • But, what do I want to store in tmp?

  • Both a and b in this version are addresses.

  • Do I want to remember the address a and the address b?

  • No, I want to remember the volume of OJ, the volume of milk,

  • I want to remember 1 and 2, I don't care where in memory they are.

  • So star in this context, when there's no mention of a data type,

  • there's just a star and a variable name.

  • That variable is a pointer and it's not multiplication,

  • there's no math going on.

  • That star is the dereference operator that says, go to this address

  • and get the value there.

  • So if this address a is at location, I don't know, 100 like Brian was,

  • and this address b is at location 900 like Veronica was,

  • *a means go to the 100th byte in memory and get me that value, which is 1.

  • This means, down here, go to the address b, get me that value at address 900,

  • which is 2.

  • And go ahead and store 1 in tmp.

  • Go ahead and go to that address and put whatever's

  • at b's address-- so get that address and put it over-- get that address,

  • get the value, and put it over at that address by dereferencing.

  • And then lastly, go to b in memory, like over there, put the tmp value there.

  • So whereas ampersand in our previous example means,

  • tell me what the address is of a variable, star is the opposite.

  • When you have an address, it says, go to that address.

  • Follow the treasure map, X marks the spot at that location in memory,

  • and get at its value.
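
Put together, the pointer-based version described above looks roughly like this:

```c
// a and b are addresses of ints; *a and *b are the ints stored there.
void swap(int *a, int *b)
{
    int tmp = *a;   // go to address a, remember the value found there
    *a = *b;        // go to address b, copy its value over to address a
    *b = tmp;       // put the remembered value at address b
}
```

The caller hands over addresses with the ampersand, as in `swap(&x, &y)`. With x as 1 and y as 2 beforehand, x is 2 and y is 1 afterward.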

  • So what is the net effect here?

  • If I actually now open up not this example, but swap.c--

  • spoiler, this one is going to actually work.

  • If I open up swap.c, we're going to see now the following instead.

  • The code is almost the same, except that I pasted it

  • in this new green version of the function.

  • And notice here, this had a change.

  • Why am I typing in &x now and &y instead of just x and y?

  • AUDIENCE: [INAUDIBLE] address [INAUDIBLE] functions [INAUDIBLE]..

  • DAVID MALAN: Exactly.

  • The swap function now, the new improved version

  • is expecting two addresses-- stars.

  • Each star, a.k.a. pointers, not just values.

  • So this means I know x and y are actually integers from week 1.

  • Now I need the address of x and the address of y

  • so that swap can follow those treasure maps,

  • so to speak, and go to those addresses.

  • So now, when I run this program, this is more like the metaphor with Kelly

  • where I told her where the milk and OJ were.

  • Now swap can go to those locations as follows. make swap.

  • Let me go ahead and then do ./swap, Enter--

  • ah!

  • Now it seems to be working.

  • And we can see as much even with the debugger.

  • Even though it doesn't seem to be buggy, I can still use debug50

  • to see and understand my program, if not obvious-- oh,

  • I still need a breakpoint.

  • Let's set a breakpoint as before.

  • Let's rerun debug50.

  • The right-hand panel will open automatically for me.

  • And let's go ahead and see, if I start stepping over this,

  • now I see that x is 1, y is 2, printf prints as much on the screen.

  • Now I'm going to go ahead and step into swap,

  • and now notice, it's a little weird-looking,

  • because now a is an address and b is an address,

  • but tmp is still an int with a garbage value, but I can fix that.

  • Now tmp is 1, but notice, a and b's values are not changing,

  • but what is clearly changing per the code?

  • So notice, this is weird and cryptic.

  • a is this 0x value.

  • That's a big hexadecimal address, like that is where in memory a is.

  • But you know what?

  • If I click the little triangle, I can kind of follow that pointer

  • and go to it.

  • The debugger is smart like that.

  • So *a, go to a is 2; and *b at the moment is 2, but if I keep going,

  • now I've done a switcheroo, and you can see that these values have changed.

  • And again, we don't care what these addresses are,

  • I don't care what the actual addresses are.

  • I do care that it gives me this functionality, because now when

  • I return up here in print, now the values have indeed

  • changed as I expected this whole time.

  • All right.

  • That was complex, but hopefully clear as to why it now works even though we've

  • made this code look more cryptic.

  • If not, any questions are welcome.

  • Yeah?

  • AUDIENCE: Is that from the spot where [INAUDIBLE]

  • DAVID MALAN: Uh huh.

  • AUDIENCE: [INAUDIBLE] the star [INAUDIBLE] pointers?

  • DAVID MALAN: Good question.

  • Do we really need to have these ampersands here because we already

  • have the stars here?

  • Short answer, yes, for symmetry.

  • This is telling the function what to expect on the way in;

  • this is what's telling the computer actually what to send in.

  • So what are the actual inputs to that function?

  • It has to be symmetric.

  • Yeah?

  • AUDIENCE: [INAUDIBLE] value is swapping addresses.

  • DAVID MALAN: We are swapping what is at the addresses.

  • AUDIENCE: So what if you change the address of [INAUDIBLE]

  • DAVID MALAN: OK.

  • AUDIENCE: And would we swap the addresses saying 2 is at 200 and 1

  • is at [INAUDIBLE] that could change.

  • DAVID MALAN: Short answer, you cannot for the following reason.

  • So technically, when you do &x and &y, these are converted to the address

  • of x, the address of y.

  • Technically swap is getting copies of something, C has not changed.

  • But swap is now getting copies of the address

  • of x, copies of the address of y, calling them a and b.

  • So sure, you could swap the addresses, but for the same reasons as before,

  • it's going to have no fundamental effect.

  • The difference here is because I'm passing in a map, so to speak,

  • to x and y, their addresses.

  • And again, an address is like--

  • we are at 45 Quincy Street I think right now--

  • Cambridge, Massachusetts 02138, USA.

  • That uniquely identifies the building.

  • These 0x hexadecimal numbers uniquely identify locations in memory.

  • So this is like saying now, get me the address of x, get me the address of y,

  • and I'm technically passing in copies of those addresses, but it doesn't matter,

  • because now with the star notation, I'm saying go to those addresses

  • and swap who is physically in this building and some other.
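
To make the answer to that question concrete, here is a sketch of what swapping the addresses themselves would look like. The function name `swap_addresses` is hypothetical; the point is only that a and b are still copies, this time copies of addresses.

```c
// Swaps the two local copies of the addresses, never following either one.
void swap_addresses(int *a, int *b)
{
    int *tmp = a;   // remember the first address
    a = b;          // the local copy a now points at the second int
    b = tmp;        // the local copy b now points at the first int
}
```

After `swap_addresses(&x, &y)`, x and y hold exactly what they held before, because nothing ever went to those addresses to change what is stored there.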

  • All right.

  • So let's just put this now into the context of what else

  • your computer actually has, just so that you've

  • seen some nomenclature around this computer's memory.

  • So this is the chip with a grid laid out on top of it

  • just to communicate that there's bytes here, and we could number them.

  • But let's think about this now more abstractly,

  • and let me just reveal that it turns out that the computer treats

  • different bytes, different squares in different ways just by convention.

  • It turns out that in your computer's memory--

  • and this is all just an artist's representation--

  • at the top of that chip of memory, so to speak,

  • is the so-called text of your program.

  • This is a fancy and non-obvious way of saying

  • the 0's and 1's that your code has been compiled into.

  • The text of a program is the code you wrote in binary,

  • that's where it's loaded into memory.

  • So in macOS and Windows, you double-click an icon,

  • that program is loaded into memory I said last week.

  • It's literally loaded into the top of your computer's memory conceptually.

  • What else?

  • Well the heap is the fancy name given to the chunk of memory in which memory

  • is coming from when you call malloc.

  • So when I called malloc earlier to get a bunch of space for some characters,

  • it was just coming from this big open area called the heap.

  • And that's what get_string is using and other functions as well.

  • Well it turns out that the reason for the problem we just ran into

  • is because the bottom part of memory is what's called the stack.

  • The stack is the area of memory that functions use when they are called.

  • And this is actually relevant to that very simple noswap example as follows.

  • If we now assume that anytime you call a function, the memory it uses

  • comes from the bottom of that big block of memory,

  • where you can draw that, for instance, here on the screen,

  • because it turns out that anytime you call a function, that function gets

  • a slice of its own memory.

  • So for instance, main is always the first function a program calls,

  • and so it gets the first slice of memory at the bottom of the screen here.

  • And so if main had two variables x and y, that's like saying,

  • OK, give me a chunk of memory called x and put the value 1 in it;

  • give me another chunk of memory, call it y, put a value in it here.

  • But remember, from the first noswap example, the swap function was called.

  • This is a stack in the literal sense.

  • You go into a dining hall, a cafeteria, one tray for food, goes on another,

  • goes on another, goes on another so that the humans can take it

  • and put food and plates on it.

  • Well similarly in this model, when you call a function,

  • it gets its own slice of memory, but literally above, conceptually,

  • the existing frame on the stack.

  • So this is the swap function's own chunk of memory,

  • and it, too, gets some space.

  • It gets some space for a variable called a.

  • It gets some space for a variable called b.

  • And guess what goes inside of those in that first example?

  • A copy of x and a copy of y.

  • And you know what?

  • It had a temp variable, so that's got to have some space here.

  • So I'll call this tmp.

  • And recall that I set tmp equal to a, so that got 1.

  • And then what happened?

  • Well then I did what--

  • what did I?

  • Let me get this right.

  • We had a gets b.

  • So what happened there?

  • So in this example here, a gets the value b, so that changed.

  • And then what happens here, b got the value of tmp, so that changed.

  • So swap was working in the sense that it was swapping values,

  • but the problem is, when a function returns, this chunk of memory that it

  • was previously using gets reclaimed so that someone else can now use it,

  • another function.

  • So we did all that hard work in noswap, and we did it correctly,

  • we just did it in the wrong place.
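
One way to check the claim that a function's parameters live in their own slice of memory is to compare addresses. This is an illustrative sketch; the name `shares_storage` is an assumption, not something from the lecture.

```c
#include <stdbool.h>

// `a` is a copy of the caller's value, stored in this function's own frame.
// `original` is the address of the caller's variable.
// Returns true only if the copy and the original occupy the same memory.
bool shares_storage(int a, const int *original)
{
    return &a == original;
}
```

With `int x = 1;` in main, `shares_storage(x, &x)` returns false: the copy sits in a different location from x, which is why changing the copy leaves x alone.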

  • So by contrast, this next example that we did, which was swap.c,

  • just treated the memory a little bit differently.

  • Main this time still had two variables called x, and this was a 1,

  • and then another one called y, and this was a 2.

  • And then when swap was called this time, it again

  • had a variable called a and a variable called

  • b, but what was stored in a and b?

  • Well now they're addresses.

  • And I don't know what it is, but let me just arbitrarily say that this

  • is location 100, this is location--

  • let's say 104.

  • But it could be anything, we just don't care at this point,

  • it would have 0x technically if the computer were showing us.

  • What's going in a here is 100, what's going in b here is 104.

  • And those are the addresses of x and y, and the code

  • we had using all of those new stars was saying,

  • go to address 100 and store whatever is at address 100 in tmp.

  • Then go to the address that's in b, or 104,

  • and store whatever is there at the location in a.

  • Then it was saying, go get that tmp value, by the way,

  • and go ahead and put that here, so that now we did

  • different work in a different place.

  • So now when swap is done running, it doesn't

  • matter if its memory disappears because it has now mutated or changed

  • the other memory

  • that it was passed, just like Kelly changed or mutated the cups

  • I actually pointed her at rather than copies thereof.

  • Now as an aside, there's other chunks of memory that are actually used.

  • If you have global variables in a program,

  • turns out that in between the text and the heap

  • memory are your global variables, whether they're initialized with values,

  • as would happen with the equal sign, or not initialized with values,

  • but we don't care too much about that for today's purposes.

  • And if you've ever heard of environment variables, which

  • we will when we get to web programming, they, too,

  • are stored elsewhere in memory.

  • But the most interesting chunks of memory

  • are stack and heap, as in this case here.

  • But unfortunately it's so easy for things to go awry--

  • I mean, some of you experienced segmentation faults already,

  • and let's consider why that might happen.

  • So here's a contrived example of code that is by design buggy,

  • but let's just talk it through in English what these lines are doing.

  • This line here, int *x, is saying, hey, computer,

  • give me a variable that will store the address of an integer.

  • So give me a pointer to an int is the more casual way of saying it.

  • Hey computer, give me another variable that's

  • going to store the address of an int and call it y.

  • So x and y, that's it.

  • This line is new-ish.

  • Hey computer, allocate enough space that will fit an int.

  • So sizeof int is the new syntax we saw earlier for just figuring out

  • how many bytes is an int.

  • Odds are this is going to come back as 4 bytes, or 32 bits, on most computers.

  • So this just says, hey computer, give me 4 bytes of memory

  • and store that in this location.

  • Or rather, store the address of that memory in this variable, x.

  • So maybe it's going to say, OK, here's four bytes at location 100,

  • or here's four bytes at location 900.

  • Or wherever, we don't care, we're just remembering that address in x.

  • *x says, go to that address--

  • 100 or 900, whatever it is, put the number 42 there.

  • This next line says, go to the address in y and put the unlucky number-- hint,

  • hint--

  • 13 there.

  • Well what is the address in y?

  • I haven't allocated it yet.

  • What's the address in x?

  • It's wherever malloc told me to use space.

  • That's safe, that was like 100, 900, whatever the value was,

  • but did I allocate space for y?

  • So what kind of value does it contain, so to speak?

  • A garbage value.

  • Maybe it's 0, maybe it's like 32,000-- we don't know,

  • because if you don't specify the value, it

  • is not safe to trust it or do anything with it.

  • This is going to give me probably one of those segmentation faults.

  • And indeed, if I run a program like this,

  • I'm quite likely going to see exactly that kind of problem.

  • It's perhaps better, though, to see this in a way that

  • will paint a more memorable picture, and for that, we thought we'd take--

  • in our 10 minutes remaining, use a few of these minutes

  • to take a look at something our friends at Stanford

  • put together with a bit of claymation.

  • It's about three minutes long, well worth it

  • to paint a picture of exactly what goes wrong

  • when you don't use memory correctly.

  • If you could dim the lights.

  • [VIDEO PLAYBACK]

  • [MUSIC PLAYING]

  • - Hey, Binky.

  • Wake up!

  • It's time for pointer fun!

  • - What's that?

  • Learn about pointers?

  • Oh goody!

  • - Well to get started, I guess we're going to need a couple of pointers.

  • - OK.

  • This code allocates two pointers which can point to integers.

  • - OK.

  • Well I see the two pointers, but they don't seem to be pointing to anything.

  • - That's right.

  • Initially pointers don't point to anything.

  • The things they point to are called pointees,

  • and setting them up is a separate step.

  • - Oh right, right.

  • I knew that.

  • The pointees are separate.

  • So how do you allocate a pointee?

  • - OK.

  • Well this code allocates a new integer pointee,

  • and this part sets x to point to it.

  • - Hey, that looks better.

  • So make it do something.

  • - OK.

  • I'll dereference the pointer x to store the number 42 into its pointee.

  • For this trick, I'll need my magic wand of dereferencing.

  • - Your magic wand of dereferencing?

  • That-- that's great.

  • - This is what the code looks like.

  • I'll just set up the number and--

  • [POP]

  • - Hey look!

  • There it goes.

  • So doing a dereference on x follows the arrow to access its pointee.

  • In this case, to store 42 in there.

  • Hey, try using it to store the number 13 through the other pointer, y.

  • - OK.

  • I'll just go over here to y and get the number 13 set up,

  • and then take the wand of dereferencing and just--

  • [BUZZING] whoa!

  • - Oh hey, that didn't work.

  • Say, Binky, I don't think dereferencing y is a good idea,

  • cause setting up the pointee is a separate step

  • and I don't think we ever did it.

  • - Mmm, good point.

  • - Yeah.

  • We allocated the pointer y, but we never set it to point to a pointee.

  • - Mmm, very observant.

  • - Hey, you're looking good there, Binky.

  • Can you fix it so that y points to the same pointee as x?

  • - Sure.

  • I'll use my magic wand of pointer assignment.

  • - Is that going to be a problem like before?

  • - No, this doesn't touch the pointees.

  • It just changes one pointer to point to the same thing as another.

  • - Oh, I see.

  • Now y points to the same place as x.

  • So wait, now y is fixed.

  • It has a pointee.

  • So you can try the wand of dereferencing again to send the 13 over.

  • - OK.

  • Here goes.

  • - Hey, look at that.

  • Now dereferencing works on y.

  • And because the pointers are sharing that one pointee, they both see the 13.

  • - Yeah, sharing, whatever.

  • So are we going to switch places now?

  • - Oh look, we're out of time.

  • - But--

  • [END PLAYBACK]

  • DAVID MALAN: All right.

  • So hopefully that puts a little more visual behind some of these ideas,

  • but let's now contextualize this in a domain that's perhaps

  • more familiar in a couple of ways.

  • So one, some of you might already know, especially

  • if you've had prior programming experience, of a very popular website

  • called Stack Overflow where lots of programmers

  • post questions and hopefully answers to common technical problems.

  • If you ever wondered why it's called Stack Overflow,

  • it turns out it reduces to this picture here.

  • This was not a mistake that I drew one arrow from the heap pointing down,

  • and one arrow from the stack growing up.

  • As you malloc, malloc, malloc more and more space,

  • it starts up here, so to speak, and you just get more and more space

  • that's going this direction.

  • But the more functions you call-- function after function

  • after function after a function, each of them

  • gets its own slice or frame of memory, that, too, is growing up.

  • So this feels like a pretty bad design, but honestly, it's not really avoidable

  • because if you have a finite amount of memory,

  • they can't avoid each other forever.

  • And so there's this fundamental risk of overflowing the stack,

  • or even overflowing the heap in the reverse direction.

  • So Stack Overflow is an allusion to, for instance, calling too

  • many-- many, many, many, many, many, many, many, many functions,

  • so many that the stack overlaps other chunks or segments of memory,

  • thereby inducing a segmentation fault, and heap overflow

  • is in the reverse direction, and these are more

  • generally known as buffer overflows, and we'll see more of these in the weeks

  • to come.

  • But now that we have the ability to discuss pointers,

  • let's introduce one final feature and then a familiar face.

  • So it turns out that you can actually come up with your own custom variables

  • kind of like we did with string, but even more sophisticated than that.

  • For instance, if I wanted to implement a program that

  • involves multiple students, I might do something like this.

  • Ask the user what is the enrollment in a class, then go ahead

  • and give myself an array of strings,

  • a.k.a. char*s today, of that size, and then I could also have another array of dorms.

  • And I could have two arrays, one for the students' names,

  • one for the students' dorms, and I can keep track of other things.

  • Another array for emails, another array for phone numbers--

  • but this gets messy quickly, because you can imagine,

  • if I need names and dorms and emails and phones,

  • that starts to become a lot of copy-paste.

  • And I just have this design where I have lots and lots of arrays

  • where each bracket location-- like bracket 0, bracket 1

  • presumably refers to the same student across all of these arrays, like mmm!

  • Messy, messy, messy design.

  • So with a wave of my hand, let me actually

  • fix that immediate problem out of the gate by introducing a new feature.

  • I can invent my own data types.

  • Let me just go ahead and declare an array

  • called students with this many students, but of data type student.

  • C comes with float, bool, char, int, not string, and definitely not student.

  • So you can make your own custom data types,

  • and you can put them in your own header files, which we've not done either.

  • But I can look, and you'll see more of this in the next problem set.

  • So not to worry if this feels quite brief,

  • it's just meant to be a teaser here.

  • And struct.h shows how you declare or define your own type.

  • The keyword is literally typedef struct for structure, or data structure

  • to be more complete.

  • The name of the data structure comes at the end after some curly braces.

  • And then inside the curly braces you just specify,

  • well what do you want a student to have?

  • I want them to have a name, a dorm, maybe a phone number, maybe

  • an email address, anything I want.

  • I can just add here.

  • So that now in my actual code, I can have an array of actual students,

  • and I can just access them with this new notation like this.

  • You know that you can index into an array with bracket notation.

  • What you didn't know until now, perhaps, is that if at that location

  • is a structure, a.k.a. struct,

  • you can get at the name, the dorm, or the phone, or the email,

  • or anything else there just by using a dot-notation, which is

  • our last piece of new syntax for today.

  • Everything else is the same.

  • I can write a program that says so and so is in such and such a dorm

  • by just saying get the i-th student's name and the i-th student's dorm.

  • And I can be even fancier, and if I don't want to just print those values,

  • I can even, now that I understand pointers--

  • or I've seen pointers and we'll soon understand them

  • by way of problem sets and practice, I can actually do this.

  • This is just a little sneak preview of a line of code

  • that uses a new function called fopen.

  • fopen means file open, and it takes in the name of the file to open.

  • You might know of CSV files, they're like simple spreadsheets,

  • comma separated values.

  • And quote-unquote "w" means write.

  • So this says open the file called students.csv in write mode,

  • so I can write to this file.

  • Because in this example, as you'll see in the days to come,

  • I want to write out to a file.

  • But it turns out to use files, I need to know what a pointer is,

  • and it's a little weird that it's all caps,

  • but there is a data type in C called FILE, and you work with it through a pointer.

  • So long story short, what you're going to see in the next problem set

  • as we explore the world of forensics is the ability

  • using pointers and a few new functions to open files and get back

  • the address of that file in memory so that you can go to that address,

  • change the contents of a file, and save it back out.

  • All of us take for granted these days that you can go to File, Open and File,

  • Save, but what's actually happening, pointers are involved,

  • stuff's getting loaded into memory, and the computer

  • is dereferencing or going to those addresses

  • and changing what's at those locations in memory.

  • Now why might you want to do this?

  • Well here, of course, is Zamila-- you might

  • recall from some of the problem sets and the walkthroughs.

  • Turns out we could try to enhance this picture of her by zooming in,

  • and here's about as much fidelity as there is in her eyes.

  • Like I do not see the glint of any criminal's logo

  • on his or her jacket in the glint of Zamila's eyes.

  • If you zoom in on an image, and an image, recall, from week 0

  • is just a grid of pixels or dots, that's all you get.

  • And you can maybe smooth it out a little bit or clean up the colors,

  • but you can't just "enhance," quote-unquote,

  • and see more of the glint in Zamila's eye,

  • because an image at the end of the day is just a bitmap, a map--

  • top-down, left-right-- of pixels.

  • For instance, here's a smiley face.

  • If you kind of take a step back, you can kind of see a black smiley

  • face against a white backdrop.

  • And if we just decide as humans, let's represent white dots

  • with 1's and black dots with 0's, this might be what's in the file,

  • this is what the human sees.

  • So if we have the ability to open that from a file, store it in memory,

  • and then using pointers go to those locations in memory,

  • we can even change the smiley face to an unhappy face, for instance, or color it

  • or do any number of things to it.

  • Now at quick glance, there's a lot going on in files,

  • because what a file is is a set of conventions that humans decided

  • on where humans years ago just decided in a bitmap file,

  • BMP file-- so an older but still popular file format for images, humans

  • just decided that, like, we're going to put a bunch of special values

  • at the first bytes of the file, then some more

  • special values, then the actual RGB pixels in the rest of the file.

  • So this is meant to look cryptic at first glance,

  • and the next homework assignment will walk you through this,

  • but all it is is a convention of what the 0's and 1's mean

  • in these different locations.

  • And indeed, the challenge ahead is going to be to do a number of things.

  • One is to first and foremost figure out--

  • who done it?

  • A sort of murder mystery in which there's a clue hidden in an image,

  • but an image that's a little noisy and you're

  • going to have to figure out what secret message is in the image

  • by loading that image in, tweaking it, putting a sort of red filter

  • on top of it and seeing the secret message, but all digitally; two,

  • actually resizing images and taking this many pixels in this big

  • of a smiley face or something else and making it bigger,

  • or if more comfortable, making it even smaller

  • and figuring out how to make that work out;

  • and then lastly, we've been taking some photographs of all CS50 staff

  • in Cambridge and New Haven.

  • Unfortunately we accidentally corrupted or lost the memory card,

  • but we made a forensic image of it, a copy of all of the 0's and 1's with all

  • of the staff photos, and we're going to need

  • you to write code that actually recovers all of the JPEGs

  • or photographs from that digital card by opening a file,

  • reading in those 0's and 1's, understanding what they are

  • and where they are, and just writing them

  • back out to disk using functions we'll introduce you to in the problem

  • set itself.

  • But of course, all of this takes for granted that we can do this,

  • and you can only do so much.

  • And indeed, this week is as much about solving those problems

  • as it is realizing the limitations of computers,

  • and so we thought we'd end with the final few seconds of this very

  • real example from Futurama.

  • [VIDEO PLAYBACK]

  • - Magnify that death sphere.

  • Why is it still blurry?

  • - That's all the resolution we have.

  • Making it bigger doesn't make it clearer.

  • - It does on CSI Miami.

  • - Ugh.

  • [END PLAYBACK]

  • DAVID MALAN: And that's it for CS50, we'll see you next time.

  • [APPLAUSE]

[MUSIC PLAYING]
