
  • There's been some noise over the past week about a paper that's come out describing an exploit. The paper is called "Port Contention for Fun and Profit", and people have been referring to it as PortSmash.

  • What it does is this: you've got OpenSSL running, using a private key, and you've got another program, which they call the spy program, running alongside it, and that spy program is able to extract the private key from the OpenSSL program, even though it shouldn't be able to do that.

  • So I thought it would be interesting to have a little chat about the way it's exploiting the CPU. Like Spectre, Meltdown and quite a few of the exploits that have turned up over the past year, it's exploiting the fact that people have tried to make CPUs run faster and faster, squeezing every last ounce of speed out of the CPU technology that's there.

  • What this is specifically targeting is something put into most Intel CPUs, and AMD's, which is hyper-threading. So what is hyper-threading? Well, normally when we think about a computer system, we have a CPU in there, and originally that CPU would execute one single stream of instructions and process data with them.

  • You could have two CPUs in there, making a multiprocessor system or a multi-core system depending on how you wire them up, and then you could have two separate streams of instructions being executed.

  • The way that those CPUs are designed, there are three stages that each instruction has to go through in the CPU (in reality each of them is made up of smaller stages, but we can think about it as three broad stages): we fetch the instruction from memory, then we decode it to work out what we actually want to do, and then we execute it.

  • To make the CPU run as fast as possible, you end up with various execution units in your CPU which do various things. There might be an arithmetic and logic unit (ALU) which will do addition, subtraction and various logical operations. There might be bits that can load and store values from memory. There might be bits that can do various other sorts of calculations: multiplications, address calculations, floating-point operations, vector processing and so on.

  • So you have lots of these execution units in your machine, and one of the things you got was a superscalar architecture, where you'd fetch two instructions and execute them at the same time, providing they were using different parts: you could fetch a value from memory while adding a value onto another register, as long as they're using separate registers, and so on.

  • So the idea is, if we draw a picture: you've got some sort of logic here which we'll call decode, and going into that you've got a stream of instructions coming from memory. You feed them in, and this actually breaks them up into a series of what we call micro-operations that do different things. One x86 instruction may get broken up into multiple micro-operations: for example, to load a value from memory, add that value onto a value in a register, and store the result back into the same memory location is three operations, so it gets split into micro-ops which use different execution units. Some have to happen sequentially, some can be done in parallel, depending on what you're doing.

  • So we end up with a series of execution units. Let's say we've got one with an ALU, and we might have a division unit in there too. We might have another one with an ALU, and it might have some things to do vector-type stuff. And we've got another one which has got another ALU and a multiplication unit on there.

  • And there are various ports that these are connected to: so you've got a sort of port one here, which connects to this set of units, and port two, we'll say, here (this is a generalized version), which is connected to these units.

  • Q: Are these physical ports, like physical wires?

  • Erm, they'll be ports inside the CPU; it's the way that things are connected up. And this block is a sort of scheduler, which is getting the decoded micro-ops from the decode section and sending them to the right ports as they become available, to cause the right operations to happen in the best order and make the most use of the system. You'd have a few more over here: say this one has got a load port, and so on.

  • So what you can do is start pulling in multiple instructions here, and as long as they're not depending on values that previous instructions have created but haven't completed yet, you can schedule them onto different parts of the unit. So if you had one instruction which adds one onto EAX, you could put it onto this port; the next instruction is adding something onto EBX (they're registers within the CPU), so you could put it onto that port, and they could execute at the same time.

  • But the problem you've got is that sometimes you get a sequence of instructions which are sequential: you add one to a value in a register, then you multiply that register by two, and so on. You've got to execute them in order, and so you can't always make full use of your available execution units down here in the CPU.

  • So the idea, which appeared many, many years ago, sort of fell out of favor, and then was brought back with the Pentium 4 in the mid-2000s, and has existed since on various CPUs from both AMD and Intel, is hyper-threading. You say: OK, this is only a single core, but let's make it present itself as if it were two cores.

  • Two logical cores: we've got one physical core with one set of execution units, but we have it appear to the operating system as two logical cores, so the operating system can have two (as far as it's concerned) independent programs or threads executing on there. So there'll be two streams of instructions executing, and we'd have another stream of instructions coming into the decode logic.

  • The CPU's then got a better chance of keeping things running at the same time, because it can either run an instruction from here, or, if it can't schedule that, it might schedule an instruction from the other stream of instructions.

  • You may get some interesting things, though. For example, on this one that we've drawn, we've only got one multiplier, and we've only got one load-and-store unit. If we have both of these trying to do a multiply, then one will have to wait for the other to complete, and the sort of way the CPU might do that is a round-robin: on the first clock cycle this one gets the multiplier, on the second clock cycle that one gets the multiplier, and so on.

  • So that's the basic idea behind hyper-threading: you've got two logical processors that are used by the operating system to schedule the jobs on your computer, but they're executed by one physical core on the CPU.

  • Q: So hyper-threading is different to multi-threading?

  • So multi-threading is the idea that you split your program, or your programs, into multiple threads of operation, and then they get scheduled by the operating system either onto different CPU cores, if you've got multiple ones, or onto one single core by executing a bit of thread one, then a bit of thread two, then a bit of thread three.

  • It's effectively like watching multiple programs on YouTube at once by chopping between the different programs and watching bits one after the other; it would be quite garbled watching multiple Computerphile videos in that sort of way. So that's a way of doing things in software, in programming; hyper-threading is a bit more hardware.

  • So the idea there is: OK, you've got these different threads of execution, and if you've got multiple cores, multiple processing units, then you can schedule each of those threads onto each of the cores and have them executing at the same time, with a few limitations on access to memory and so on.

  • With hyper-threading you say: OK, we'll have two threads of execution happening at the same time, but we've actually only got one physical set of units to do it. So it's the hardware that's doing the scheduling, because it can do it at a finer grain than the operating system can.

  • The operating system is still scheduling across those two logical cores, but the hardware can then say: well, actually, this one is trying to multiply and this one is trying to add, so I can run them at the same time; whereas this one is trying to multiply and this one is also trying to multiply, so I need to sequence them. So it can actually do a finer-grained sort of threading operation and knit them together.

  • Q: So where does the problem come in, then?

  • So the problem comes in when, let's say, we've got a program where we want to find out some information about what it's doing; let's say, for this program here, we want to know what sort of instructions it's executing.

  • Well, what we could do, for example, if we wanted to find out whether it was executing multiply instructions: on the example we've got here, we've only got one multiply unit. So if this one is trying to execute multiply instructions and this one is trying to execute multiply instructions, then they're going to have to take turns to execute them, one after the other. And if the one we're trying to find out about isn't executing multiply instructions, then this one will be able to execute its multiply instructions one after the other, unimpeded.

  • So what the PortSmash paper's authors have done is write a program that will execute certain types of instructions in a loop. They have a repetition of about 64 of these various instructions; say it's 64 add instructions, to make use of all the ALUs on an Intel CPU (there are four of them that it can make use of), so that four consecutive adds should all execute at the same time if nothing else is running on that core, and it times how long they take to execute.

  • It does that and gets an idea of how long they take to execute, and then you run the same thing at the same time as the other program is running. If it takes more time to execute than it did on its own, then you know that program must also be executing some add instructions.

  • So what you can do, by looking at which of these bits are being used by running instructions, is find out what type of instructions are being executed on the other side.

  • Now, the reason why it's called PortSmash is because, although we've drawn this with one multiply unit, that's also on the same port as an ALU, for example, and what they actually exploit is that these units are all connected to one port of the scheduler within the CPU. So if we wanted to, say, use the multiply bit of this CPU, then we have to go through port two, which means the ALU on port two can't be used as well; we can only use one of the things in this column. Same here, for example: if we want to use a divide, we can't do any ALU processing or vector processing on that port.

  • So we could run instructions that we know will tie up one of these specific ports, or will tie up a group of them, and then we can see whether the other program (providing we can get it scheduled onto the same physical core, which isn't impossible to do) is also trying to use parts of the system on that port.

  • What the PortSmash example program does is cleverly use certain instructions which tie up a particular port on the CPU core, to see whether that port is being used by the other program; by measuring the time, we can see whether it has been.

  • So we've got this side channel where we can get insight into what the other process is doing as a black box: we say, OK, it must be trying to execute this type of instruction, because it's interfering with our use of this port, or it isn't interfering with our use of this port. So what they do is run this alongside OpenSSL doing its encryption of the task it's been set to do, and they can measure what type of instructions it's trying to execute.

  • What it ends up with is a series of timing sequences that shows how long things are taking at particular points: sometimes it'll be running at full speed, at some points it'll be running slower. That gives it what they call a noisy signal which, with some signal processing applied to it, they can use to actually extract the private key that was being used by OpenSSL, purely from watching the timings.

  • So what they've demonstrated is that by running a program, they can monitor enough information, because they can see what the other logical CPU is doing by what their own program is doing; i.e. if the other program is trying to multiply at the same time as they're trying to multiply, and there's only one multiply unit, then it will slow both programs down, and you can detect that.

  • They can start to work out what operations the other program must be doing, then work out what that would mean in terms of what that program is doing, and backtrack from that to actually extract information that ideally they shouldn't be able to access.

  • So the upshot of this is that one of the recommendations is that, perhaps in certain circumstances, you might want to turn off hyper-threading: either completely, and just go back to having four physical cores that only execute four separate threads rather than four physical cores executing eight logical threads; or, at the very least, modify things so that the operating system has the ability to turn hyper-threading on and off on each processor core, depending on what process is running on it. For some processes it doesn't matter, and extracting information from them wouldn't be that important; but for others, such as encryption programs, you really don't want this sort of side channel there.

  • Q: Is this operating-system specific, or... what's the deal there, then?

  • It's not operating-system specific; it will be CPU specific. So the example they've got is for the Intel Skylake and Kaby Lake CPU families. You could probably do something similar with other CPUs that implement hyper-threading; you would have to calibrate your system depending on that, but that's not a problem. It's not implementation specific, you just have to tailor it to the machine you're looking at.

  • Q: Is it a practical thing for hackers to do? Is it easy for them to do?

  • The example code is there: you can download it off GitHub and run the demo on a Linux machine. I don't have one with the right sort of CPU here to demo it, unfortunately. There is potential to do this, though there are limitations on what you can do with it: you need to have your spy program running on the same physical core as the other program, otherwise you won't have full access to the information. But I'm sure in the right circumstances you could use it to get information out, if it hasn't already been done.

  • So if we hit this... boom, it goes off and sets a few things up; the screen goes black, but if I switch back to my other one and type "su" again... it's logged me in as root, and of course...

