Name: CPUs Are Out of Order - Computerphile
Uploaded: 2020-03-27T16:58:36.000Z
Duration: 15 min 9 s

Things we've talked about, like Spectre and Meltdown, rely on some of the more advanced ways that the CPU operates.

It's probably worth diving down and actually looking at how a CPU executes the code we write.

I mean, we've touched on this before: we did a video on pipelining, we did a video on caching. But let's delve down and see

what happens below the surface when we actually get our CPU to execute our code.

Let's look at what happens with a line of code that we might want to run. Let's take a line of code that takes a variable;

let's take a line of code that's gonna add up A plus B plus C plus D

times E. So I've written this in this sort of C-like language.

So we're gonna do this calculation. Now, as I'm sure most of us are aware,

when we take that, put it into our C compiler and run it, it gets converted into the machine code that the CPU executes.

So we take that line of code, and then we have to

convert that into the machine code, and then the CPU

executes that machine code. So a program like this would end up looking like this, and I'm going to use ARM assembly here

just because I know it better than anything else. For the first instruction, we would load the value from memory of A

into a register; let's pick R0. We've got 14 or so of them,

but some of them get used for different things that we don't really use. So we load the value of A

into R0. The next thing we want to do is add that to the value of B,

and we have to make sure we get the operator precedence right. So we can load the value of B into a register,

and we might as well do D and E as well, so load R3 with D, and R4 with E. Then we start

adding these things up and multiplying them to produce

the actual result we want. Now, we've got to make sure we get the precedence right,

but we could either start by adding A and B together, then add on C, and then

multiply D and E and add them together, or we could do that one first.

I'm just going to start going from left to right. As long as the math is right,

we'll get the right result. So we'll add together A and B.
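To make the precedence point concrete, here is a small Python sketch. The values are made up purely for illustration (none of them come from the video). It checks that going left to right, or doing the multiplication first, gives the same answer as long as D times E is computed before it is added on.

```python
# Illustrative values only; any integers would do.
a, b, c, d, e = 1, 2, 3, 4, 5

# What the C-like line means: multiplication binds tighter than addition.
expected = a + b + c + d * e

# Left to right: add a and b, add on c, then multiply d and e and add that on.
left_to_right = ((a + b) + c) + (d * e)

# Do the multiplication first, then the additions.
multiply_first = (d * e) + ((a + b) + c)

# Ignoring precedence (add everything, then multiply) gives a different answer.
ignoring_precedence = (a + b + c + d) * e

print(expected, left_to_right, multiply_first, ignoring_precedence)
```

With these values the first three results agree and the last one does not, which is why the multiplication has to be done separately before it is added on.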

I've put those two values in R0 and R1, and we need to store the result somewhere.

We aren't going to need the value of A again after this, so we'll reuse the register R0.

So this is adding them together and storing the result in R0. So we've now added A and B together.

We want to add on C, and so we could do the same thing: add

the value in R0, which is now, because of this instruction, A plus B, and we want to add on the value in R2, storing the result in

R0. Now we need to do the multiplication,

and we need to do that separately before we add it on, so we get the right result. So we'll multiply,

and let's say we've got an ARM2 chip here, so we've got the multiply instruction there.

And we need to put the result somewhere, so let's use R5: D, which we put in R3, and E, in R4.

And then we want to add the result of that onto the value

in R0, and now R0 contains the result of A plus B plus C plus D times E. And then we store that back to memory.

So that line of code there, that one line of C code, would become what? 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

different lines of assembler, and I've numbered them because I'm going to

refer to them at different times, so we can say instruction 1, instruction 5, etc. to refer to the different ones.
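The ten numbered instructions can be sketched as a tiny interpreter in Python. This is a reconstruction from the spoken description, not the listing written on the paper: the mnemonics LDR, STR, ADD and MUL are the usual ARM ones, B in R1 and C in R2 are inferred from how the registers are used, the store target `x` is a placeholder name, and the memory values are invented.

```python
# Each entry: (instruction number, opcode, destination, sources).
# "x" as the store target is a hypothetical name.
PROGRAM = [
    (1, "LDR", "R0", ("a",)),        # R0 = a
    (2, "LDR", "R1", ("b",)),        # R1 = b
    (3, "LDR", "R2", ("c",)),        # R2 = c
    (4, "LDR", "R3", ("d",)),        # R3 = d
    (5, "LDR", "R4", ("e",)),        # R4 = e
    (6, "ADD", "R0", ("R0", "R1")),  # R0 = a + b
    (7, "ADD", "R0", ("R0", "R2")),  # R0 = a + b + c
    (8, "MUL", "R5", ("R3", "R4")),  # R5 = d * e
    (9, "ADD", "R0", ("R0", "R5")),  # R0 = a + b + c + d * e
    (10, "STR", "x", ("R0",)),       # x = R0
]


def run(program, memory):
    """Execute the instructions strictly in list order; return final memory."""
    regs = {}
    memory = dict(memory)  # work on a copy so the caller's dict is untouched
    for _num, op, dest, srcs in program:
        if op == "LDR":
            regs[dest] = memory[srcs[0]]
        elif op == "STR":
            memory[dest] = regs[srcs[0]]
        elif op == "ADD":
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "MUL":
            regs[dest] = regs[srcs[0]] * regs[srcs[1]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# Invented input values for illustration.
result = run(PROGRAM, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
print(result["x"])
```

Running it in list order mirrors a CPU that simply executes instruction 1, then 2, then 3 and so on.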

We might expect that our CPU will just execute instruction 1, then instruction 2, instruction 3, instruction 4, 5 and so on, in order,

to generate the result, and some CPUs do in fact work exactly like that. But actually, if you think about

what the CPU is doing and what these instructions are actually doing, you might think: well, actually,

when I get this first one, I've got to go and access memory, and

as we talked about in the caching video many years ago (cache is perhaps an old-fashioned English word,

but it basically just means a small place where we can store things, so you might use it to store your hidden treasure or

your food for winter), on a modern CPU it'll probably take, say, around 200

nanoseconds to actually go and get the value out of your main memory and load it into the register. Now, of course,

if these are already cached, in the same bit of memory, then you may find that these all execute very quickly.

We don't know that. But this isn't the only way we could write this program, because if we take this instruction here, instruction 6,

where we do the add of R0 and R1 to add up A and B, well, we've got those two values here.

They're already in the registers at this point in the program,

so there's nothing to stop us moving this instruction up there,

and it would still have exactly the same effect. So instruction 6 could be moved to be between instructions 2 and 3.

And then we do the next instruction, which is the same as instruction 3 here: load

R2, comma, the value in memory that's representing the variable C. It has exactly the same effect; we've just moved that

instruction earlier. So you could rewrite this program in various different ways.
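Here is a hedged Python sketch of why that move is safe. It assumes the same reconstructed ten-instruction program (register choices for B and C inferred, store target `x` hypothetical, input values invented), and it only tracks registers, which is enough here because the instruction being moved never touches memory.

```python
PROGRAM = [
    (1, "LDR", "R0", ("a",)), (2, "LDR", "R1", ("b",)),
    (3, "LDR", "R2", ("c",)), (4, "LDR", "R3", ("d",)),
    (5, "LDR", "R4", ("e",)),
    (6, "ADD", "R0", ("R0", "R1")), (7, "ADD", "R0", ("R0", "R2")),
    (8, "MUL", "R5", ("R3", "R4")), (9, "ADD", "R0", ("R0", "R5")),
    (10, "STR", "x", ("R0",)),  # "x" is a placeholder name
]


def reads_writes(instr):
    """Return (registers read, registers written) by one instruction."""
    _num, op, dest, srcs = instr
    if op == "LDR":
        return set(), {dest}       # reads memory, writes a register
    if op == "STR":
        return set(srcs), set()    # reads a register, writes memory
    return set(srcs), {dest}       # ADD / MUL


def independent(first, second):
    """True if the two instructions can swap places without changing results."""
    r1, w1 = reads_writes(first)
    r2, w2 = reads_writes(second)
    return not (w1 & r2 or r1 & w2 or w1 & w2)


def run(program, memory):
    regs, memory = {}, dict(memory)
    for _num, op, dest, srcs in program:
        if op == "LDR":
            regs[dest] = memory[srcs[0]]
        elif op == "STR":
            memory[dest] = regs[srcs[0]]
        elif op == "ADD":
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
        else:  # MUL
            regs[dest] = regs[srcs[0]] * regs[srcs[1]]
    return memory


add_ab = PROGRAM[5]  # instruction 6: ADD R0, R0, R1

# Instruction 6 has to hop over instructions 5, 4 and 3 to land after 2.
can_pass = [independent(PROGRAM[i], add_ab) for i in (4, 3, 2)]
# It cannot go any earlier: instruction 2 produces the R1 that it reads.
blocked_by_2 = not independent(PROGRAM[1], add_ab)

reordered = PROGRAM[:2] + [add_ab] + PROGRAM[2:5] + PROGRAM[6:]
order = [instr[0] for instr in reordered]

memory = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}  # invented values
same_result = run(PROGRAM, memory) == run(reordered, memory)
print(order, can_pass, blocked_by_2, same_result)
```

The check is deliberately conservative: two instructions are treated as swappable only if neither one reads or writes a register the other writes.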

When we think about how a CPU is designed, you will have

various different, what might best be termed, execution units within there. Now, one of them is what's generally referred to as the

ALU, or the arithmetic and logic unit, and that's the bit of your CPU that does

addition, it does subtraction, it does sort of logical operators, AND, OR and so on.

But you also have other bits inside there,

and one of the bits you'll often have in a modern CPU is a part of your CPU that handles loading and storing

values from memory. Sometimes they interact, sometimes they don't.

Assuming that they are separate parts of the CPU, if we look back at our instructions here: we execute instruction 1,

it uses the load/store unit to get a value from memory. We execute instruction 2,

it uses the load/store unit to get a value from memory. Instruction 3,

it uses the load/store unit to get a value from memory. 4 uses the load/store unit to get a value from memory.

5 uses the load/store unit to get a value from memory. 6 uses the ALU, and so do 7, 8 and 9,

before instruction 10 uses the load/store unit. So we've got a pretty sequential series: the first 5

instructions all execute using the load/store part of the CPU, the next four instructions execute using the ALU, and

the final instruction again uses the load/store unit. But as we said, we can reorder that

into this version here, using instructions W, X, Y and Z to

differentiate them. We execute the first instruction: instruction W uses the load/store unit, instruction X

uses the load/store unit, instruction Y uses the ALU, instruction Z uses the load/store unit.
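The unit-by-unit walk-through can be written down as a small lookup in Python. This is a sketch under assumptions: the opcodes for the ten instructions are reconstructed from the description (five loads, three adds and a multiply, one store), and W, X, Y, Z are taken to be load, load, add, load, matching the reordering described earlier.

```python
# Which execution unit handles which kind of instruction.
UNIT_FOR_OPCODE = {
    "LDR": "load/store",
    "STR": "load/store",
    "ADD": "ALU",
    "MUL": "ALU",
}

# Opcodes of instructions 1 to 10 in their original order (reconstructed).
ORIGINAL = ["LDR", "LDR", "LDR", "LDR", "LDR",
            "ADD", "ADD", "MUL", "ADD", "STR"]

# The start of the reordered version, labelled W to Z as in the text.
REORDERED_START = {"W": "LDR", "X": "LDR", "Y": "ADD", "Z": "LDR"}

original_units = [UNIT_FOR_OPCODE[op] for op in ORIGINAL]
reordered_units = {label: UNIT_FOR_OPCODE[op]
                   for label, op in REORDERED_START.items()}

for number, unit in enumerate(original_units, start=1):
    print(f"instruction {number:2}: {unit}")
for label, unit in reordered_units.items():
    print(f"instruction {label}: {unit}")
```

In the original order the two units are used in long separate runs, while in the reordered version the ALU instruction sits between two load/store instructions.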

Okay, what difference does that make? Well, let's think about what's happening. When we're using the load/store unit,

the ALU isn't being used; that part of the CPU is just sitting there not being used. And

when we're using the ALU, the load/store unit's sitting there not being used. That's what we saw there.

But does that have to be the case? Could we actually design it so that,

while the load/store unit, say, is being used, we can run instructions on the ALU part as well? You can probably guess the answer is yes.

I'll turn the paper round and I'm going to draw

this as a sort of timeline. So these are our two units, and we've got time running along this side as well.
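A timeline like that can be approximated with a toy model in Python. Everything numeric here is an assumption for illustration: a memory access is taken to occupy the load/store unit for 3 time units and an ALU operation for 1, and an instruction is allowed to start as soon as its unit is free and its input registers are ready. The program is the same reconstruction as before, with `x` as a placeholder store target. It is a sketch of the idea of two units working side by side, not a description of any particular CPU.

```python
LATENCY = {"load/store": 3, "ALU": 1}  # assumed, in arbitrary time units
UNIT = {"LDR": "load/store", "STR": "load/store", "ADD": "ALU", "MUL": "ALU"}

PROGRAM = [
    (1, "LDR", "R0", ("a",)), (2, "LDR", "R1", ("b",)),
    (3, "LDR", "R2", ("c",)), (4, "LDR", "R3", ("d",)),
    (5, "LDR", "R4", ("e",)),
    (6, "ADD", "R0", ("R0", "R1")), (7, "ADD", "R0", ("R0", "R2")),
    (8, "MUL", "R5", ("R3", "R4")), (9, "ADD", "R0", ("R0", "R5")),
    (10, "STR", "x", ("R0",)),
]


def one_at_a_time(program):
    """Total time if only one unit is ever busy: just add up the latencies."""
    return sum(LATENCY[UNIT[op]] for _num, op, _dest, _srcs in program)


def overlapped(program):
    """Timeline when the two units may be busy at the same time.

    Only true dependencies (waiting for a value to be produced) are
    tracked, which is sufficient for this particular program.
    """
    unit_free = {"load/store": 0, "ALU": 0}  # when each unit is next idle
    ready = {}                               # register -> time its value exists
    timeline = []
    for num, op, dest, srcs in program:
        unit = UNIT[op]
        # Memory operands such as "a" are treated as available at time 0.
        inputs_ready = max((ready.get(s, 0) for s in srcs), default=0)
        start = max(unit_free[unit], inputs_ready)
        end = start + LATENCY[unit]
        unit_free[unit] = end
        if op != "STR":
            ready[dest] = end
        timeline.append((num, unit, start, end))
    return timeline


timeline = overlapped(PROGRAM)
finish = max(end for _num, _unit, _start, end in timeline)

for num, unit, start, end in timeline:
    print(f"{num:2} {unit:10} {' ' * start}{'#' * (end - start)}")
print("one at a time:", one_at_a_time(PROGRAM), "overlapped:", finish)
```

With these assumed latencies the strict one-at-a-time version takes 22 time units and the overlapped one finishes at 20, because instructions 6 and 7 run on the ALU while later loads are still in the load/store unit.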

I'm using the computer for our paper in a
