
  • This is Lee Sedol.

  • Lee Sedol is one of the world's greatest Go players,

  • and he's having what my friends in Silicon Valley call

  • a "Holy Cow" moment --

  • (Laughter)

  • a moment where we realize

  • that AI is actually progressing a lot faster than we expected.

  • So humans have lost on the Go board. What about the real world?

  • Well, the real world is much bigger,

  • much more complicated than the Go board.

  • It's a lot less visible,

  • but it's still a decision problem.

  • And if we think about some of the technologies

  • that are coming down the pike ...

  • Noriko Arai mentioned that reading is not yet happening in machines,

  • at least with understanding.

  • But that will happen,

  • and when that happens,

  • very soon afterwards,

  • machines will have read everything that the human race has ever written.

  • And that will mean that machines,

  • with the ability to look further ahead than humans can,

  • as we've already seen in Go,

  • and with access to more information,

  • will be able to make better decisions in the real world than we can.

  • So is that a good thing?

  • Well, I hope so.

  • Our entire civilization, everything that we value,

  • is based on our intelligence.

  • And if we had access to a lot more intelligence,

  • then there's really no limit to what the human race can do.

  • And I think this could be, as some people have described it,

  • the biggest event in human history.

  • So why are people saying things like this,

  • that AI might spell the end of the human race?

  • Is this a new thing?

  • Is it just Elon Musk and Bill Gates and Stephen Hawking?

  • Actually, no. This idea has been around for a while.

  • Here's a quotation:

  • "Even if we could keep the machines in a subservient position,

  • for instance, by turning off the power at strategic moments" --

  • and I'll come back to that "turning off the power" idea later on --

  • "we should, as a species, feel greatly humbled."

  • So who said this? This is Alan Turing in 1951.

  • Alan Turing, as you know, is the father of computer science

  • and in many ways, the father of AI as well.

  • So if we think about this problem,

  • the problem of creating something more intelligent than your own species,

  • we might call this "the gorilla problem,"

  • because gorillas' ancestors did this a few million years ago,

  • and now we can ask the gorillas:

  • Was this a good idea?

  • So here they are having a meeting to discuss whether it was a good idea,

  • and after a little while, they conclude, no,

  • this was a terrible idea.

  • Our species is in dire straits.

  • In fact, you can see the existential sadness in their eyes.

  • (Laughter)

  • So this queasy feeling that making something smarter than your own species

  • is maybe not a good idea --

  • what can we do about that?

  • Well, really nothing, except stop doing AI,

  • and because of all the benefits that I mentioned

  • and because I'm an AI researcher,

  • I'm not having that.

  • I actually want to be able to keep doing AI.

  • So we actually need to nail down the problem a bit more.

  • What exactly is the problem?

  • Why is better AI possibly a catastrophe?

  • So here's another quotation:

  • "We had better be quite sure that the purpose put into the machine

  • is the purpose which we really desire."

  • This was said by Norbert Wiener in 1960,

  • shortly after he watched one of the very early learning systems

  • learn to play checkers better than its creator.

  • But this could equally have been said

  • by King Midas.

  • King Midas said, "I want everything I touch to turn to gold,"

  • and he got exactly what he asked for.

  • That was the purpose that he put into the machine,

  • so to speak,

  • and then his food and his drink and his relatives turned to gold

  • and he died in misery and starvation.

  • So we'll call this "the King Midas problem"

  • of stating an objective which is not, in fact,

  • truly aligned with what we want.

  • In modern terms, we call this "the value alignment problem."

  • Putting in the wrong objective is not the only part of the problem.

  • There's another part.

  • If you put an objective into a machine,

  • even something as simple as, "Fetch the coffee,"

  • the machine says to itself,

  • "Well, how might I fail to fetch the coffee?

  • Someone might switch me off.

  • OK, I have to take steps to prevent that.

  • I will disable my 'off' switch.

  • I will do anything to defend myself against interference

  • with this objective that I have been given."

  • So this single-minded pursuit

  • in a very defensive mode of an objective that is, in fact,

  • not aligned with the true objectives of the human race --

  • that's the problem that we face.

  • And in fact, that's the high-value takeaway from this talk.

  • If you want to remember one thing,

  • it's that you can't fetch the coffee if you're dead.

  • (Laughter)

  • It's very simple. Just remember that. Repeat it to yourself three times a day.

  • (Laughter)

  • And in fact, this is exactly the plot

  • of "2001: [A Space Odyssey]"

  • HAL has an objective, a mission,

  • which is not aligned with the objectives of the humans,

  • and that leads to this conflict.

  • Now fortunately, HAL is not superintelligent.

  • He's pretty smart, but eventually Dave outwits him

  • and manages to switch him off.

  • But we might not be so lucky.

  • So what are we going to do?

  • I'm trying to redefine AI

  • to get away from this classical notion

  • of machines that intelligently pursue objectives.

  • There are three principles involved.

  • The first one is a principle of altruism, if you like,

  • that the robot's only objective

  • is to maximize the realization of human objectives,

  • of human values.

  • And by values here I don't mean touchy-feely, goody-goody values.

  • I just mean whatever it is that the human would prefer

  • their life to be like.

  • And so this actually violates Asimov's law

  • that the robot has to protect its own existence.

  • It has no interest in preserving its existence whatsoever.

  • The second principle is a principle of humility, if you like.

  • And this turns out to be really important to make robots safe.

  • It says that the robot does not know

  • what those human values are,

  • so it has to maximize them, but it doesn't know what they are.

  • And that avoids this problem of single-minded pursuit

  • of an objective.

  • This uncertainty turns out to be crucial.

  • And the third principle: in order to be useful to us,

  • it has to have some idea of what we want.

  • It obtains that information primarily by observation of human choices,

  • so our own choices reveal information

  • about what it is that we prefer our lives to be like.

  • So those are the three principles.
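
The third principle, learning what humans want by watching what they choose, can be illustrated with a small sketch. The Python below is a hypothetical toy of my own, not anything from the talk: the candidate value functions, the feature names, and the logistic choice model are all assumptions. The robot starts out uncertain between two candidate values and does a Bayesian update each time it observes the human pick one option over another.

```python
import numpy as np

# Toy sketch of the third principle (hypothetical; not from the talk).
# Two candidate value functions: weights over the features (coffee, tidiness).
candidates = {"coffee_lover": np.array([1.0, 0.2]),
              "neat_freak":   np.array([0.2, 1.0])}
belief = {name: 0.5 for name in candidates}   # start maximally uncertain

def choice_likelihood(weights, chosen, rejected):
    """Probability that a noisily rational human picks `chosen` over `rejected`
    under the given value weights (a simple logistic choice model)."""
    return 1.0 / (1.0 + np.exp(weights @ rejected - weights @ chosen))

def observe(chosen, rejected):
    """Bayesian update of the robot's belief from one observed human choice."""
    for name, w in candidates.items():
        belief[name] *= choice_likelihood(w, chosen, rejected)
    total = sum(belief.values())
    for name in belief:
        belief[name] /= total

# The human keeps choosing "fetch coffee" (features [1, 0]) over "tidy up"
# (features [0, 1]); the belief shifts toward "coffee_lover" without the robot
# ever being handed an explicit objective.
for _ in range(5):
    observe(chosen=np.array([1.0, 0.0]), rejected=np.array([0.0, 1.0]))
print(belief)   # now heavily favors "coffee_lover"
```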

  • Let's see how that applies to this question of:

  • "Can you switch the machine off?" as Turing suggested.

  • So here's a PR2 robot.

  • This is one that we have in our lab,

  • and it has a big red "off" switch right on the back.

  • The question is: Is it going to let you switch it off?

  • If we do it the classical way,

  • we give it the objective of, "Fetch the coffee, I must fetch the coffee,

  • I can't fetch the coffee if I'm dead,"

  • so obviously the PR2 has been listening to my talk,

  • and so it says, therefore, "I must disable my 'off' switch,

  • and probably taser all the other people in Starbucks

  • who might interfere with me."

  • (Laughter)

  • So this seems to be inevitable, right?

  • This kind of failure mode seems to be inevitable,

  • and it follows from having a concrete, definite objective.

  • So what happens if the machine is uncertain about the objective?

  • Well, it reasons in a different way.

  • It says, "OK, the human might switch me off,

  • but only if I'm doing something wrong.

  • Well, I don't really know what wrong is,

  • but I know that I don't want to do it."

  • So that's the first and second principles right there.

  • "So I should let the human switch me off."

  • And in fact you can calculate the incentive that the robot has

  • to allow the human to switch it off,

  • and it's directly tied to the degree

  • of uncertainty about the underlying objective.

  • And then when the machine is switched off,

  • that third principle comes into play.

  • It learns something about the objectives it should be pursuing,

  • because it learns that what it did wasn't right.

  • In fact, with suitable use of Greek symbols,

  • as mathematicians usually do,

  • we can actually prove a theorem

  • that says that such a robot is provably beneficial to the human.

  • You are provably better off with a machine that's designed in this way

  • than without it.
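
The relationship just described, that the robot's incentive to allow itself to be switched off grows with its uncertainty about the objective, can be checked numerically in a toy model. The sketch below is my own simplification, not the theorem from the talk: the robot's belief about the human's utility U for its planned action is a normal distribution; acting unilaterally is worth max(E[U], 0), while deferring to a human who switches the robot off whenever U < 0 is worth E[max(U, 0)]. The gap between the two is the incentive to leave the off switch alone, and it shrinks to zero as the uncertainty does.

```python
import numpy as np

rng = np.random.default_rng(0)

def deference_incentive(mean, std, n=200_000):
    """Monte Carlo estimate of the robot's incentive to defer (toy model).

    Belief about the human's utility U for the planned action: Normal(mean, std).
      acting alone -> max(E[U], 0)      (do the action, or do nothing)
      deferring    -> E[max(U, 0)]      (human switches it off whenever U < 0)
    """
    u = rng.normal(mean, std, n)          # samples from the robot's belief
    act_alone = max(u.mean(), 0.0)        # best unilateral choice
    defer = np.maximum(u, 0.0).mean()     # human filters out the bad cases
    return defer - act_alone

# The incentive is zero with no uncertainty and grows as the uncertainty grows.
for std in (0.0, 0.5, 1.0, 2.0):
    print(f"std = {std:.1f}   incentive to defer ~ {deference_incentive(0.5, std):.3f}")
```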

  • So this is a very simple example, but this is the first step

  • in what we're trying to do with human-compatible AI.

  • Now, this third principle,

  • I think, is the one that you're probably scratching your head over.

  • You're probably thinking, "Well, you know, I behave badly.

  • I don't want my robot to behave like me.

  • I sneak down in the middle of the night and take stuff from the fridge.

  • I do this and that."

  • There's all kinds of things you don't want the robot doing.

  • But in fact, it doesn't quite work that way.

  • Just because you behave badly

  • doesn't mean the robot is going to copy your behavior.

  • It's going to understand your motivations and maybe help you resist them,

  • if appropriate.

  • But it's still difficult.

  • What we're trying to do, in fact,

  • is to allow machines to predict for any person and for any possible life

  • that they could live,

  • and the lives of everybody else:

  • Which would they prefer?

  • And there are many, many difficulties involved in doing this;

  • I don't expect that this is going to get solved very quickly.

  • The real difficulties, in fact, are us.

  • As I have already mentioned, we behave badly.

  • In fact, some of us are downright nasty.

  • Now the robot, as I said, doesn't have to copy the behavior.

  • The robot does not have any objective of its own.

  • It's purely altruistic.

  • And it's not designed just to satisfy the desires of one person, the user,

  • but in fact it has to respect the preferences of everybody.

  • So it can deal with a certain amount of nastiness,

  • and it can even understand the reasons for your nastiness: for example,

  • you may take bribes as a passport official

  • because you need to feed your family and send your kids to school.

  • It can understand that; it doesn't mean it's going to steal.

  • In fact, it'll just help you send your kids to school.

  • We are also computationally limited.

  • Lee Sedol is a brilliant Go player,

  • but he still lost.

  • So if we look at his actions, he took an action that lost the game.

  • That doesn't mean he wanted to lose.

  • So to understand his behavior,

  • we actually have to invert through a model of human cognition

  • that includes our computational limitations -- a very complicated model.

  • But it's still something that we can work on understanding.

  • Probably the most difficult part, from my point of view as an AI researcher,

  • is the fact that there are lots of us,

  • and so the machine has to somehow trade off, weigh up the preferences

  • of many different people,

  • and there are different ways to do that.

  • Economists, sociologists, moral philosophers have understood that,

  • and we are actively looking for collaboration.

  • Let's have a look and see what happens when you get that wrong.

  • So you can have a conversation, for example,

  • with your intelligent personal assistant

  • that might be available in a few years' time.

  • Think of a Siri on steroids.

  • So Siri says, "Your wife called to remind you about dinner tonight."

  • And of course, you've forgotten. "What? What dinner?

  • What are you talking about?"

  • "Uh, your 20th anniversary at 7pm."

  • "I can't do that. I'm meeting with the secretary-general at 7:30.

  • How could this have happened?"

  • "Well, I did warn you, but you overrode my recommendation."

  • "Well, what am I going to do? I can't just tell him I'm too busy."

  • "Don't worry. I arranged for his plane to be delayed."

  • (Laughter)