A2 Basic 2 Folder Collection
After playing the video, you can click or select the word to look it up in the dictionary.
Loading...
Report Subtitle Errors
Things we talked about a spectrum meltdown and they rely on some of the more advanced ways that the CPU operates
It's probably worth diving down and actually looking at how a CPU actually executes the code be right
I mean, we've touched on this before we did a video on pipelining we did a video on caching, but also delve down and see
What happens below the surface when we actually get our CPU to execute our code?
Let's start by having a simple example:
A line of code that we might want to look at what happens. Let's take a line of code that takes a variable
Let's take a line of code. It's gonna add up A plus B plus C plus D
Times e so I've written this in this sort of see like language
So we're gonna do this calculation now as I'm sure most of us are aware
When we take that and put into our C compiler run it it gets converted into the machine code that the CPU executes
so we take that client of code, and then we'd have to
Convert that into the machine code, and then the CPU
Executes that machine code so a program like this would end up looking and I'm going to use arm assembly here
Just because I know it better than the anything else perhaps for the first instruction. We would load the value for memory of a
Into registers, let's pick our zero. We've got 14 or so of them
We can use the 16 of them
but some of them get used for different things that we don't really use so although the value of a
Into our zero next thing we want to do is you want to add that to the value of B
Then after make sure we'll get the operator precedence right so we can load the value of B into a register
So let's loading the value of C
Here into another register
And we might as well do D. And E. As well so load or three come on D. And
We'll load our four
With E as well, and now we can start
Adding these things up multiplying them to produce
The actual result we want now we're going to make sure we get the precedence right
But we could either start by adding a and B together then add on C. And then
Multiply D. And E and have them together or we could do that one first
I'm just going to start going from left to right as long as the math is right
We'll get the right result so we'll add together a and B now
I put those two values in r0 and r1 and we need to store the results somewhere
We are going to need the value of a again after this, so we'll reuse the register R
0 so we're saying put into R 0
the value of R 0
plus R
1 so this is adding together storing the result in R. 0 so we now added a and B together
We want to add on C. And so we could do the same thing add
to R 0
The value in R. 0 which is now because of this instruction a plus B want to add on the value in R
2 there's now about a plus B plus C in
Our 0 now we need to do the multiplication
And we need to do that separately before we add it on so we get the right result so we'll multiply
And we'll see we've got an arm, too cheap here, so we've got the multiply instruction there
And we need to put the results on whether it's use our 5 D. Which we put in R. 3 and E
Which we put in R?
4 and then we want to add the result of that onto the value
In our 0 and now our 0 contains the result of a plus B. Plus C plus D times E. And
We could then store that
back into X
So that line of code there at one line of C code would become what 1 2 3 4 5 6 7 8 9 10
different lines an assembler and I've numbered them because I'm going to
Refer to them at different times so we can say searching one instruction 5 etc to refer to the different ones now
We might expect that our CPU will just xu instruction 1 the new instruction 2 instruction 3 instruction 405 and so on in order
To generate the result and some cpus do in fact work exactly like that, but actually if you think about
What the cpus and what these descriptions are actually doing you might think well actually?
when I get this first one I've got to go an access memory and
As we talked about in the caching video many years ago, cache is perhaps a an old-fashioned English word
but it basically just means a small place where we can store things so you might use it to store your hidden treasure if you're
a pirate or to store
Your food for winter on a modern CPU probably say around 200
Nanoseconds to actually go and get the value out of your main memory and load it into the register now of course
If these are already cached in the same bit of memory, then you may find that these all execute very quickly
We don't know that this isn't the only way we could write this program because if we take this instruction here instruction 6
Where we do the add of r0 and r1 to add up a and B. Well. We've got those two values here
They're already in the registers at this point in the program
So there's nothing to stop us moving this instruction up there
and it would still have exactly the same effect so instruction 6 could be moved to me between instructions 2 & 3
And then we do the next instruction which was the same as instruction 3 here?
which would be LDR
R to come of the values in memory that's representing the letter the variable see how exactly the same effect. We just moved that
Instruction earlier so you could rewrite this program in various different ways now
Why is that interesting?
well when we think about how a CPU is designed and that you will have
various different what impress be termed execution units within there now one of them is what's generally referred to as the
ALU or the arithmetic and logic unit and that's the bit of your CPU that does
Addition it does subtraction it does sort of logical operators and or and so on
But you also have other bits inside there
And one of the bits you'll often have in a modern CPU is it part of your CPU that handles loading and storing
Values from memory sometimes interact sometimes they don't now
Assuming that they are separate parts of the CPU if we look back at our instructions here. We execute instruction 1
It uses a load store. You need to get a value for memory we execute instruction 2
It uses the load store unit to get a value for memory instruction 3
It uses a load store unit to get a value for memory for uses the load store unit to get a value for memory
5 uses the load store unit to get a value for a memory 6
changes and uses the ALU as 2 7 8 & 9
before insertion turn uses the load store unit so we've got a pretty sequential series the first 5
instructions all execute using the load store part of the CPU the next four instructions execute using the ALU and
The final instruction again uses the load store unit but as we said we can reorder that
into this version here using instructions w x y and z
Differentiate them and we execute the first instruction instruction w uses a load store unit instruction X
Uses a load store unit instruction Y uses the ALU restrictions ed uses the load store unit
Okay, what difference does that make well let's think about what's happening when we're using the load store unit
the ALU isn't being used that part of the CV is just sitting there not being used and
When we're using the ALU the load store units sitting there not being used, that's what we saw there
But does that have to be the case could we actually design it, and you probably guess the answer is that yes?
We can so that
While the load store unit say is being used that we can run the instructions on the ALU part as well
I'd turn the paper round and I'm going to draw
This as a sort of timeline so these are our two units and we've got time running along this side as well
I'm using the computer for our paper in a
Radically different orientation, but never mind, so we're going to execute the instructions
On here and the first thing that happens is that we execute instruction W
No problem
That's going to take certain amount of time to actually that's using the load store unit to execute it
These are being fetched and decoded and sort of executed by the different execution units we then execute the next instruction
which is
X and we couldn't execute this any earlier because the load store unit was being used to execute that one so no difference than what?
We had before we're using this one after the other we now come to
execute
The add instruction now we can't execute this any earlier than this point in time
Because this depends on the value of registers r0 and r1 which aren't set?
until this point so we need those two values so we can start doing
instruction why here now actually
It's an ADD it's not going to take as long as fetching things from memory because it's all inside the CPU so we can use
A smaller box and we can put instruction Y there and this depends on the value being fetched from there
And I'm just going to show this as an arrow here, but the next instruction load
r2 comma C
well
I doesn't depend on anything except the value in there Marie and our load/store units not being used
So if we build our CPU right? There's nothing to start that
Instruction being executed at the same time and that means that actually when we come to the next instruction
Which would be which will be the best instruction to execute next in this example. Let's go back to our program
We've executed instructions one to six and three already
That's w x y&z we've rewritten the mass let's put instruction seven here
What was instruction seven and this is now going to become?
I'm gonna have to use it's gonna become instruction a I'll hopefully remember to say instruction a but
You can guess the colonics are referring to a on its own is probably the variable if not is probably the instruction so we can
now execute
instruction a and again instruction a depends on two things
It depends on the value of R. 0 which is going to come from this instruction so we have to have that ready
But it also depends on the value of R
2 which is coming from this instruction so we have to have that ready as well so it can actually happen any point before
This point in time so this would be the LDR R 2 comma dot and this is the add R
0 and this is the next add, but again we can start trying to leave more the instructions because I okay well
That's what instruction for here at the same time. We'll call this instruction B
And so on we put that at that point we can execute
Instruction B at the same time as we do way and I'm really confusing myself with pens here and so again
We've saved some time because rather than having to execute that in the same thing we can do these two things at the same time
Now to be able to do this we need these instructions need to execute on different execution units we couldn't for example
Execute to add instructions at the same time because we haven't got to Al use well, though
There's no reason why you can build a CPU with two Al use if you look at modern
CPU designs from Intel AMD arm and cetera they all have often have multiple Al used or allow you to do just that
but because the different types we can execute them at the same time and the reason we can do that is because
They don't depend
on the results of one to work out the other so they're working on different things and they're using different parts of the CPU and
The CPU that enables you to do this is what's known as a superscalar
CPU because it can run multiple instructions at the same time will you continue doing this and we'd end up we execute instruction
B then we've got to execute instruction C
instruction D
uses a
Multiply and actually on a CPU probably got a separate execution unit which does and multiplies because you can actually do them faster that way
So you have a multiply unit as well so we can execute that multiply D up there
We think well okay?
Can we do the other at the same time well no because we need the result of that as well so we can then execute
the ad down here before finally, and it just fits on the paper like that so we can actually squash things up and
we're going to save some time because if you think about it you have the original order of the program and
Here's one. I made earlier
All right, or as in I'm just about to draw and Shawn will do some very clever
Cutting so even if we had a superscalar processor. We've only got one load store unit we've only got one
Al you really got one multiply unit we wouldn't have any opportunities with this program
To run two instructions at the same time so this version of the program would
Still take ten instructions this one still takes ten instructions, but with a superscalar processor we have the opportunity to sort of
execute two instructions at the same time because they use different bits of the CPU now you need to design the CPU to allow that
but that enables us to
Speed things up a little bit because while this is working to get the value for memory. We can execute some more
instructions
Now that's all very well and superscalar processors started to appear in
the mid 90s things like the six eight thousand and sixty the
Pentium I think was superscalar
But they require the code to be written in a way
That enables this to happen so this program wouldn't have been able to do anything
This one would but as we said when we were developing this we could work out which
Instructions we could move around to get that speed up based on
What those instructions depended on so this instruction?
We said what what six became why only depended on the values of R
0 and R 1 which has been set by instructions 1 & 2 so we can move that earlier
Without affecting anything in our program because it only depended on those 2 values
so we can either do this in the compiler or by hand if you write in the assembly yourself like we just did here or
It's possible to let the CPU work it out, and so what a modern CPU does what's called an out of water CPU is
Reorders the instructions without
supposedly breaking the rules of
What each instruction does so it'll still execute it as if it was written like this?
And it won't change break any of the rules of that
but it will say well hang on it will spot that this instruction could happen earlier and
So move it earlier to gain some of that parallelism in fact then execute them together at the same time
And that works generally get well
But as we saw with things like Spector and meltdown if you allow things to happen too far earlier and start doing what's called speculative
Evaluation where you say okay?
I've got the stuff. I need to execute it now
I don't if I need the result but I might do so I'll execute it anyway, and then if I need it
I've already done it and if I don't need it while I was still waiting for this to come in anyway
So it doesn't matter that I've done it. I've not lost
Any time well
Then it's turned out that you can have sort of side channels where you can sort of see that that's happened or not
Which is caused a few issues with computing?
It goes along here like this
Intersects the curve somewhere else flips over and it's over here, so this is for G
Now we won't look at any more right the edge of a formula for this is just
mathematics to do with lines and the tangent of this curve
It's actually not very complicated the point is that what we're doing is by multiplying G
    You must  Log in  to get the function.
Tip: Click on the article or the word in the subtitle to get translation quickly!

Loading…

CPUs Are Out of Order - Computerphile

2 Folder Collection
林宜悉 published on March 28, 2020
More Recommended Videos
  1. 1. Search word

    Select word on the caption to look it up in the dictionary!

  2. 2. Repeat single sentence

    Repeat the same sentence to enhance listening ability

  3. 3. Shortcut

    Shortcut!

  4. 4. Close caption

    Close the English caption

  5. 5. Embed

    Embed the video to your blog

  6. 6. Unfold

    Hide right panel

  1. Listening Quiz

    Listening Quiz!

  1. Click to open your notebook

  1. UrbanDictionary 俚語字典整合查詢。一般字典查詢不到你滿意的解譯,不妨使用「俚語字典」,或許會讓你有滿意的答案喔