
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

CHARLES LEISERSON: So today, we're going to talk about assembly language and computer architecture. It's interesting that these days, most software courses don't bother to talk about these things. And the reason is that, as much as possible, people have been insulated from performance considerations when writing their software. But if you want to write fast code, you have to know what is going on underneath so you can exploit the strengths of the architecture. And the best interface that we have to that is the assembly language. So that's what we're going to talk about today.

So when you take a particular piece of code like fib here, to compile it you run it through Clang, as I'm sure you're familiar with at this point. And what it produces is binary machine language that the computer's hardware is programmed to interpret and execute. It looks at the bits as instructions, as opposed to as data, and it executes them. And that's what we see when we execute.
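
For reference, here is a minimal sketch of the kind of recursive fib routine being discussed; the slide's exact code isn't reproduced in the transcript, so the signature and the small test in main are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    /* fib.c -- the running example: a naive recursive Fibonacci.
     * Building it, e.g. `clang fib.c -o fib`, produces the binary machine
     * language that the hardware interprets and executes. */
    int64_t fib(int64_t n) {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }

    int main(void) {
        printf("fib(10) = %lld\n", (long long)fib(10));
        return 0;
    }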

This process is not one step. There are actually four stages to compilation: preprocessing, compiling (sorry for the redundancy, that's sort of a bad name conflict, but that's what they call it), assembling, and linking. So I want to take us through those stages.

So the first thing that happens is you go through a preprocessing stage. And you can invoke that with Clang manually. So you can say, for example, if you do clang -E, that will run the preprocessor and nothing else. And you can take a look at the output there and see how all your macros got expanded and such before the compilation actually goes through.
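
A small, hedged illustration of that step (the macro and file name are made up for this note, not taken from the lecture): running the preprocessor alone shows the macro already expanded in the output.

    /* square.c -- run `clang -E square.c` to see the output of the
     * preprocessing stage only: the SQUARE macro below is expanded
     * textually to ((n + 1) * (n + 1)), and no compilation happens. */
    #define SQUARE(x) ((x) * (x))

    int next_squared(int n) {
        return SQUARE(n + 1);
    }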

Then you compile it, and that produces assembly code. So assembly is a mnemonic structure of the machine code that makes it more human readable than the machine code itself would be. And once again, you can produce the assembly yourself with clang -S. And then finally, penultimately maybe, you can assemble that assembly language code to produce an object file.

And since we like to have separate compilations, you don't have to compile everything as one big monolithic hunk. Then there's typically a linking stage to produce the final executable. And for that we are using ld for the most part. We're actually using the gold linker, but ld is the command that calls it.
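
As a sketch of how those stages and separate compilation fit together, here is a second translation unit that calls fib; the file names and the build commands in the comment are illustrative assumptions (assuming fib lives in its own fib.c without a main), not commands shown in the lecture.

    /* main.c -- a separate translation unit that calls fib from fib.c.
     * An illustrative build, stage by stage:
     *
     *   clang -E fib.c             # preprocess only (output to stdout)
     *   clang -S fib.c             # compile: produces assembly in fib.s
     *   clang -c fib.s -o fib.o    # assemble: produces the object file fib.o
     *   clang -c main.c -o main.o  # same stages for this file, in one command
     *   clang fib.o main.o -o fib  # link the objects (clang invokes ld)
     */
    #include <stdint.h>
    #include <stdio.h>

    int64_t fib(int64_t n);   /* defined in fib.c, resolved at link time */

    int main(void) {
        printf("fib(10) = %lld\n", (long long)fib(10));
        return 0;
    }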

So let's go through each of those steps and see what's going on. So first, the preprocessing is really straightforward, so I'm not going to do that. That's just a textual substitution.

The next stage is the source code to assembly code. So when we do clang -S, we get this symbolic representation. And it looks something like this, where we have some labels on the side, and we have some operations, and we have some directives. And then we have a lot of gibberish, which won't seem like so much gibberish after you've played with it a little bit, but to begin with it looks kind of like gibberish.

From there, we assemble that assembly code, and that produces the binary. And once again, you can invoke it just by running Clang. Clang will recognize that it doesn't have a C file or a C++ file. It says, oh, goodness, I've got an assembly language file, and it will produce the binary.

Now, the other thing that turns out to be the case is that assembly and machine code are really very similar in structure. Things like the opcodes, which are the things that are here in blue or purple, whatever that color is, like these guys, those correspond to specific bit patterns over here in the machine code. And these are the addresses and the registers that we're operating on, the operands. Those correspond to other bit codes over there. It's not exactly one to one, but it's pretty close to one to one, compared to if you had C and you looked at the binary, which is way, way different.

So one of the things it turns out you can do is, if you have the machine code, and especially if the machine code was produced with so-called debug symbols (that is, it was compiled with -g), you can use this program called objdump, which will produce a disassembly of the machine code. So it will tell you, OK, here's what the mnemonic, more human-readable code is, the assembly code, from the binary.
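
A rough sketch of that workflow; the file name and the particular flags are my own choices, not commands shown in the lecture.

    /* disasm_demo.c -- compile with debug symbols, then disassemble:
     *
     *   clang -g -O1 -c disasm_demo.c -o disasm_demo.o
     *   objdump -d disasm_demo.o   # disassembly of the machine code
     *   objdump -S disasm_demo.o   # with -g present, interleaves the C source
     */
    int scale(int x) {
        return 3 * x + 1;   /* small enough that the disassembly is easy to read */
    }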

And that's really useful, especially if you're trying to do things like... well, let's see, why do we bother looking at the assembly? So why would you want to look at the assembly of your program? Does anybody have some ideas? Yeah.

AUDIENCE: [INAUDIBLE] made or not.

CHARLES LEISERSON: Yeah, you can see whether certain optimizations are made or not. Other reasons? Everybody is going to say that one. OK. Another one is... well, let's see, here are some reasons.

The assembly reveals what the compiler did and did not do, because you can see exactly what the assembly is that is going to be executed as machine code.
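
For instance (this loop and the flags are my own illustration, not the lecture's), compiling a simple loop at a high optimization level and reading the generated .s file shows whether the compiler vectorized it or left it scalar.

    /* sum.c -- run `clang -O3 -S sum.c` and read sum.s to see whether the
     * loop below was turned into packed SIMD instructions or left as scalar
     * code; that is exactly the kind of thing only the assembly reveals. */
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }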

The second reason, which turns out to happen more often than you would think, is that, hey, guess what, the compiler is a piece of software. It has bugs. So your code isn't operating correctly. Oh, goodness, what's going on? Maybe the compiler made an error. And we have certainly found that, especially when you start using some of the less frequently used features of a compiler. You may discover, oh, it's actually not that well broken in.

And as it mentions here, a bug may only have an effect when compiling at -O3, but if you compile at -O0 or -O1, everything works out just fine. So then it says, gee, somewhere in the optimizations, they did an optimization wrong. So one of the first principles of optimization is do it right, and then the second is make it fast. And sometimes the compiler doesn't do that.

It's also the case that sometimes you cannot write code that produces the assembly that you want. And in that case, you can actually write the assembly by hand.

Now, it used to be many years ago, many, many years ago, that a lot of software was written in assembly. In fact, at my first job out of college, I spent about half the time programming in assembly language. And it's not as bad as you would think. But it certainly is easier to have high-level languages, that's for sure. You get a lot more done a lot quicker.

And the last reason is reverse engineering. You can figure out what a program does when you only have access to its binary. So, for example, take the matrix multiplication example that I gave on day 1. You know, we had the overall outer structure, but in the inner loop, we could not match the Intel Math Kernel Library code. So what did we do? We didn't have the source for it. We looked to see what it was doing. We said, oh, is that what they're doing? And then we were able to do it ourselves without having to get the source from them. So we reverse engineered what they did. So all those are good reasons.