  • MALE SPEAKER: Welcome, everybody,

  • to one more Authors at Google Talk.

  • Today, our guest speaker is Pedro Domingos,

  • whose new book is called "The Master Algorithm."

  • We have it here and you can buy copies outside.

  • So one definition of machine learning

  • is "the automation of discovery."

  • Our guest, Pedro Domingos, is at the very forefront

  • of the search for the master algorithm, a universal learner

  • capable of deriving all knowledge, past, present

  • and future, from data.

  • Pedro Domingos is a professor of Computer Science

  • and Engineering at the University of Washington.

  • He's the co-founder of the International Machine Learning

  • Society.

  • Pedro received his MS in Electrical Engineering

  • and Computer Science from IST in Lisbon,

  • his Master's of Science and PhD in Information

  • and Computer Science from the University of California

  • at Irvine.

  • He spent two years as an assistant professor at IST

  • before joining the faculty of the University of Washington

  • in 1999.

  • Pedro is the author or co-author of over 200

  • technical publications in machine learning, data mining,

  • and other areas.

  • He is the winner of the SIGKDD Innovation Award, the highest

  • honor in data science.

  • He's an AAAI Fellow and has received the Sloan Fellowship

  • and NSF Career Award, a Fulbright scholarship, an IBM

  • Faculty Award, several best paper awards,

  • and other distinctions.

  • He's a member of the editorial board of "The Machine Learning

  • Journal."

  • Please join me in welcoming Pedro, today, to Google.

  • [APPLAUSE]

  • PEDRO DOMINGOS: Thank you.

  • Let me start with a very simple question--

  • where does knowledge come from?

  • Until very recently, it came from just three sources, number

  • one, evolution-- that's the knowledge that's

  • encoded in your DNA-- number two,

  • experience-- that's the knowledge that's

  • encoded in your neurons-- and number three, culture,

  • which is the knowledge you acquire

  • by talking with other people, reading books, and so on.

  • And everything that we do, right,

  • everything that we are basically comes from these three sources

  • of knowledge.

  • Now what's quite extraordinary is just, only recently,

  • there's a fourth source of knowledge on the planet.

  • And that's computers.

  • There's more and more knowledge now that comes from computers,

  • is discovered by computers.

  • And this is as big of a change as the emergence

  • of each of these three was.

  • Like evolution, right, well, that's life on earth.

  • It's the product of evolution.

  • Experience is what distinguishes us mammals from insects.

  • And culture is what makes humans what we are

  • and as successful as we are.

  • Notice, also, that each of these forms of knowledge discovery

  • is orders of magnitude faster than the previous one

  • and discovers orders of magnitude more knowledge.

  • And indeed, the same thing is true of computers.

  • Computers can discover knowledge orders of magnitude

  • faster than any of these things that went before

  • and that co-exist with them and orders of magnitude more

  • knowledge in the same amount of time.

  • In fact, Yann LeCun says that "most

  • of the knowledge in the world in the future

  • is going to be extracted by machines

  • and will reside in machines."

  • So this is a major change that, I think, is not just for us

  • computer scientists to know about and deal

  • with, it's actually something that everybody

  • needs to understand.

  • So how do computers discover new knowledge?

  • This is, of course, the province of machine learning.

  • And in a way, what I'm going to try to do in this talk

  • is try to give you a sense of what machine learning is

  • and what it does.

  • If you're already familiar with machine learning,

  • this will hopefully give you a different perspective on it.

  • If you're not familiar with machine learning already,

  • this should be quite fascinating and interesting.

  • So there are five main paradigms in machine learning.

  • And I will talk about each one of them in turn

  • and then try to step back and see, what is the big picture

  • and what is this idea of the master algorithm.

  • The first way computers discover knowledge

  • is by filling gaps in existing knowledge.

  • Pretty much the same way that scientists work, right?

  • You make observations, you hypothesize

  • theories to explain them, and then

  • you see where they fall short.

  • And then you adapt them, or throw them away

  • and try new ones, and so on.

  • So this is one.

  • Another one is to emulate the brain.

  • Right?

  • The greatest learning machine on earth

  • is the one inside your skull, so let's reverse engineer it.

  • Third one is to simulate evolution.

  • Evolution, by some standards, is actually an even greater

  • learning algorithm than your brain

  • is, because, first of all, it made your brain.

  • It also made your body.

  • And it also made every other life form on Earth.

  • So maybe that's something worth figuring out how it works

  • and doing it with computers.

  • Here's another one.

  • And this is to realize that all the knowledge that you learn

  • is necessarily uncertain.

  • Right?

  • When something is induced from data,

  • you're never quite sure about it.

  • So the way to learn is to quantify that uncertainty using

  • probability.

  • And then as you see more evidence,

  • the probability of different hypotheses evolves.

  • Right?

  • And there's an optimal way to do this using Bayes' theorem.

  • And that's what this approach is.

  • Finally, the last approach, in some ways,

  • is actually the simplest and maybe even the most intuitive.

  • It's actually to just reason by analogy.

  • There's a lot of evidence in psychology

  • that humans do this all the time.

  • You're faced with a new situation,

  • you try to find a matching situation in your experience,

  • and then you transfer the solution

  • from the situation that you already

  • know to the new situation that you're faced with.

  • And connected with each of these approaches to learning,

  • there is a school of thought in machine learning.

  • So the five main ones are the Symbolists, Connectionists,

  • Evolutionaries, Bayesians, and Analogizers.

  • The Symbolists are the people who

  • believe in discovering new knowledge

  • by filling in the gaps in the knowledge

  • that you already have.

  • One of the things that's fascinating about machine

  • learning is that the ideas in the algorithms

  • come from all of these different fields.

  • So for example, the Symbolists, they have their origins

  • in logic, philosophy.

  • And they're, in some sense, the most "computer-sciency"

  • of the five tribes.

  • The Connectionists, their origins

  • are, of course, in neuroscience, because they're

  • trying to take inspiration from how the brain works.

  • The Evolutionaries, well, their origins

  • are, of course, in evolutionary biology,

  • in the algorithm of evolution.

  • The Bayesians come from statistics.

  • The Analogizers actually have influences

  • from a lot of different fields, but probably the single most

  • important one is psychology.

  • So in addition to being very important for our lives,

  • machine learning is also a fascinating thing,

  • I think, to study, because in the process of studying machine

  • learning, you can actually study all of these different things.

  • Now each of these "tribes" of machine learning, if you will,

  • has its own master algorithm, meaning its own general purpose

  • learner that, in principle, can be used to learn anything.

  • In fact, each of these master algorithms

  • has a mathematical proof that says,

  • if you give it enough data, it can learn anything.

  • OK?

  • For the Symbolists, the master algorithm is inverse deduction.

  • And we'll see, in a second, what that is.

  • For the Connectionists, it's backpropagation.

  • For the Evolutionaries, it's genetic programming.

  • For the Bayesians, it's probabilistic inference

  • using Bayes' theorem.

  • And for the Analogizers, it's kernel machines, also known

  • as support vector machines.

  • OK?

  • So let's see what just the key ideas in each one of these are.

  • So the Symbolists-- here are some

  • of the most prominent Symbolists in the world.

  • There's Tom Mitchell at Carnegie Mellon,

  • Steve Muggleton in the UK, and Russ Quinlan in Australia.

  • And their idea is actually a very interesting one.

  • It's to think of deduction-- sorry,

  • it's to think of learning as being the inverse of deduction.

  • Learning is induction of knowledge.

  • Right?

  • Deduction is going from general rules to specific facts.

  • Induction is the opposite.

  • It's going from specific facts to general rules.

  • So in some sense, one is the inverse of the other.

  • And so maybe we can figure out how

  • to do induction in the same way that people in mathematics

  • figure out how to do other inverse operations.

  • Like, for example, subtraction is the inverse of addition,

  • or integration is the inverse of differentiation, and so forth.

  • So as a very, very simple example,

  • addition gives you the answer to the question,

  • if I add 2 and 2, what do I get.

  • Subtraction-- and the answer, of course, is 4.

  • And this is the deepest thing I'll say in this whole talk.

  • And subtraction, of course, gives you the answer

  • to the inverse question, which is, what do I

  • need to add to 2 in order to get 4, the answer, of course,

  • being 2.

  • Now inverse deduction works in a very similar way.

  • So here's a simple example of deduction.

  • You know that Socrates is human and you

  • know that humans are mortal.

  • And the question is, what can you infer from that.

  • Well, of course, the answer is that, from that,

  • you can infer that Socrates, too, is mortal.

  • Now the inverse of this-- and that's

  • when it becomes induction-- is, if I know that Socrates

  • is human, what else do I need to know in order to be

  • able to infer that he's mortal.

  • And of course, what I need to know is that humans are mortal.

  • And so in this way, I have just induced a new general rule

  • that humans are mortal.

  • Of course, in general, I wouldn't just

  • do it from Socrates, I would do it

  • from Socrates and a bunch of other people.

  • But that's the general way that this works.

  • OK?
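
As a minimal sketch of the Socrates example, assume facts are stored as (predicate, individual) pairs and a rule has the simple form "premise implies conclusion". The function name `induce_rules` and the toy facts are illustrative assumptions, not something shown in the talk, and real inverse deduction systems work over much richer logic.

```python
from collections import defaultdict

def induce_rules(facts, target):
    """Propose rules 'premise(X) -> target(X)' that are consistent with the facts.

    facts: set of (predicate, individual) pairs, e.g. ("human", "socrates").
    A premise is kept only if every individual known to satisfy it
    is also known to satisfy the target predicate.
    """
    by_predicate = defaultdict(set)
    for predicate, individual in facts:
        by_predicate[predicate].add(individual)

    target_individuals = by_predicate[target]
    rules = []
    for predicate, individuals in sorted(by_predicate.items()):
        if predicate == target:
            continue
        # The inverse question: what would I need to know so that target follows?
        if individuals and individuals <= target_individuals:
            rules.append((predicate, target))
    return rules

# Hypothetical observations about a few individuals.
facts = {
    ("human", "socrates"), ("mortal", "socrates"),
    ("human", "plato"), ("mortal", "plato"),
    ("god", "zeus"),
}
print(induce_rules(facts, "mortal"))  # [('human', 'mortal')]
```

The rule is only proposed because it holds for every human in the data, which mirrors the point about generalizing from Socrates and a bunch of other people rather than from Socrates alone.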

  • And then once I've induced rules like this,

  • I can now combine them in all sorts of different ways

  • to answer questions that I may never even have thought of.

  • And this kind of flexibility and composition

  • is actually something that, of all the five tribes,

  • only the Symbolists have.

  • Now of course, these examples are in English

  • and computers don't understand natural language yet.

  • So what they use is something like first order logic.

  • So these things, both the facts and the rules

  • that are discovered, are represented

  • in first order logic.

  • And then questions are answered by chaining those rules,

  • by reasoning with them.

  • OK?
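
Chaining rules to answer a question can be sketched as simple forward chaining. This assumes a heavily restricted fragment of first-order logic (one-argument predicates, one premise per rule); `forward_chain` and the example rules are hypothetical names chosen for illustration.

```python
def forward_chain(facts, rules):
    """Apply rules 'premise(X) -> conclusion(X)' until nothing new follows."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(known):
                if predicate == premise and (conclusion, individual) not in known:
                    known.add((conclusion, individual))
                    changed = True
    return known

# Two induced rules combined to answer a question neither answers alone.
rules = [("human", "mortal"), ("philosopher", "human")]
facts = {("philosopher", "socrates")}
derived = forward_chain(facts, rules)
print(("mortal", "socrates") in derived)  # True
```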

  • But whether it's in logic or natural language,

  • the principle is the same.

  • OK?

  • And as I said, of all the five paradigms,

  • this is the one that is most like scientists at work.

  • Right?

  • They figure out, where are the gaps in my knowledge.

  • Let me enunciate a general principle

  • that will fill that gap.

  • And then let me see what follows.

  • Let me see if it's correct given the data.

  • Let me see what gaps I identified and so on.

  • And in fact, one of the most amazing applications

  • of inverse deduction to date is actually a robot scientist.

  • So if you look at this picture, the biologist

  • is not the guy in the lab coat.

  • The guy in the lab coat is actually

  • a computer scientist and machine learning researcher

  • by the name of Ross King.

  • The biologist in this picture is this machine here.

  • This machine is a complete, automated biologist.

  • It starts out with basic knowledge

  • of molecular biology, DNA, proteins, RNA,

  • and all of that stuff.

  • And then what it actually does is

  • it formulates hypotheses using inverse deduction.

  • It designs experiments to test these hypotheses using things

  • like DNA sequencers and whatnot.

  • That's what's on there.

  • It physically carries them out.

  • So it does the whole process with no human help.

  • And then, given the results, it refines the hypotheses,

  • or comes up with new ones, and so on.

  • OK?

  • Now there's only one of these robots in the world today.

  • Its name is Eve.

  • There was actually a previous one called Adam.

  • And Eve, last year, discovered a new malaria drug.

  • And the thing that's amazing about this

  • is that, well, once you've made one robot scientist like this,

  • there's nothing stopping you from making a million.

  • And I have a million scientists working on a problem

  • where, before, maybe all we could afford was a few.

  • So this can really speed up the progress of science.

  • And I would say that in areas like molecular biology,

  • there is no hope of really understanding, for example,

  • very well, how cells work without this kind of thing.

  • Right?

  • There's just too much information for human beings

  • to discover on their own.

  • OK?

  • Now this is one very, I think, attractive way

  • to do machine learning, but of course, it's not the only one.

  • In particular, the Connectionists

  • are very skeptical of it.

  • They say, well, this is too abstract.

  • It's too clean.

  • It's too rigid.

  • Logic is not how the real world works.

  • You have to do things more like human beings do.

  • And of course, the way human beings

  • do things is with their brains, so let's figure out

  • how brains work, at least enough that we can take inspiration

  • from them and build algorithms based on that.

  • So this is what the Connectionists do.

  • The most prominent Connectionist is Geoff Hinton.

  • He actually started out as a psychologist 40 years ago

  • and now he's really more of a computer scientist.

  • And he's been incredibly determined

  • in his goal of understanding how the brain learns.

  • In fact, he tells the story that one day he

  • came home from work very excited,

  • saying, yea, I've done it.

  • I've figured out how the brain works.

  • And his daughter replied, oh dad, not again.

  • And he's the first one to say that he's had

  • some successes and failures.

  • But I think the bottom line is that, these days-- for example,

  • he's one of the co-inventors, of course,

  • of backpropagation-- the successes far

  • outweigh the failures.

  • So definitely, the long-term research agenda

  • has already given a lot of fruits.

  • And it's going to give more.

  • And two other prominent Connectionists

  • are Yann LeCun and [INAUDIBLE].

  • Of course, Connectionism is also known

  • as neural networks and, these days, as deep learning.

  • But it's really all the same paradigm.

  • OK?

  • So how does it all work?

  • Well, what we're going to do is we're

  • going to build a model, a mathematical model of how

  • a single neuron works.

  • We're going to make it as simple as we can provided

  • it's enough to learn and to do the inferences that we

  • need it to do.

  • OK?

  • And then we're going to put these models of neurons

  • together into big networks.

  • And then we're going to train those networks.

  • OK?

  • And at the end of the day, what we have

  • is, in some sense, a miniature brain,

  • if you will-- of course, much simpler than the real one,

  • but hopefully with some of the same properties.

  • OK?

  • So now a neuron is a very interesting kind of cell.

  • All right?

  • It's a cell that actually looks like a tree.

  • There's the cell body, and then there's

  • the trunk of the tree is what is called the axon,

  • and then the branches are called dendrites.

  • But where neurons get very different from trees

  • is that the branches of one neuron

  • actually connect back with the roots of another-- or in fact,

  • with the roots of many others.

  • And that's how you get a big network of neurons.

  • And where the axon of one neuron

  • join the dendrites of another, that's called the synapse.

  • And to the best of our knowledge,

  • the way humans learn, everything you

  • know is encoded in the strengths of the synapses

  • between your neurons.

  • OK?

  • So if the synapse is strong, then--

  • let me backtrack for just a second.

  • The way this works is that the neurons

  • communicate via electric discharges down their axons.

  • There's literally an electric discharge

  • called an action potential.

  • And what happens is that if you look at all the charge coming

  • into a neuron through its various synapses,

  • if it exceeds a certain threshold, then that neuron

  • itself fires.

  • And of course, then it sends currents

  • to the neurons downstream.

  • OK?

  • And the synaptic process itself involves chemistry and whatnot,

  • but those details are not important for us.

  • OK?

  • So what's going to happen is that the learning basically

  • happens when a neuron helps to make another neuron fire.

  • And then the strength of the connection goes up.

  • This is called Hebb's Rule.

  • And as far as we know, this is how

  • all our knowledge is encoded is in how strong the synapses are.

  • If the neurons fire together a lot,

  • then the synapses become stronger

  • and it becomes easier for the first neuron

  • to fire the second one.

  • So this is the basic idea.
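
One common textbook way to write Hebb's rule is to increase a weight in proportion to the product of the two neurons' activity. The sketch below assumes binary firing (0 or 1) and a made-up learning rate; `hebbian_update` is a hypothetical helper, not a formula given in the talk.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen a synapse when the pre- and post-synaptic neurons fire together."""
    return weight + learning_rate * pre * post

w = 0.5
# Each pair is (first neuron fired?, second neuron fired?).
for pre, post in [(1, 1), (1, 0), (0, 1), (1, 1)]:
    w = hebbian_update(w, pre, post)
print(round(w, 2))  # 0.7
```

Only the two steps where both neurons fire change the weight, which is the "fire together" part of the rule.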

  • Now what we have to do is, first of all, turn it into a model.

  • That's not that hard.

  • So the model is just this.

  • What my neuron's going to do is it's

  • going to do a weighted combination of the inputs.

  • OK?

  • Let's suppose, for example, that this was actually the retina.

  • All right?

  • So this is the input.

  • Each one of these inputs is a pixel.

  • And then each one of them gets multiplied

  • by a weight that corresponds to the strength of the synapse.

  • And if that weighted sum exceeds a threshold,

  • then I get 1 as the output.

  • Otherwise, I get 0.

  • So for example, if this neuron is trying to recognize a cat,

  • if this is a cat, then hopefully what happens

  • is that this weighted sum will be high enough

  • that this will fire and the neuron will say,

  • yes, this is a cat.
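
The neuron model just described fits in a few lines. The pixel values, weights, and threshold below are invented for illustration only.

```python
def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

pixels = [0.9, 0.1, 0.8]     # hypothetical pixel intensities
weights = [0.6, -0.4, 0.5]   # hypothetical synapse strengths
print(neuron_fires(pixels, weights, threshold=0.5))  # 1
```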

  • Now this is all easy.

  • This goes back to the 1950s, and Frank Rosenblatt, and whatnot.

  • The really hard, and interesting, and important

  • question is, how do you train a whole network of these neurons.

  • That's when things actually become very difficult.

  • Because if you have a big network of neurons--

  • so here are my inputs.

  • They go to one set of neurons.

  • Right?

  • These are the functions that they compute.

  • Right?

  • They're in purple here.

  • And then, those go to another layer, and many, many layers,

  • until finally, you get the output.

  • And the question is, well, if the output is correct-- right?

  • So let's say this was a cat and the network said that, yes,

  • it is a cat, then life is good.

  • Nothing needs to change.

  • Right?

  • Why go fixing what ain't broke?

  • But what happens if there was an error?

  • Right?

  • This should have been firing, but wasn't.

  • The key question is, what do I change

  • in that whole big, messy network to try to make it

  • give the right answer tomorrow.

  • There is no obvious answer to this question.

  • Because think of one neuron somewhere

  • in the middle of the network.

  • How is it responsible for the error at the output?

  • All right?

  • The error at the output could have

  • come from an infinitude of different places.

  • This is called the credit assignment problem.

  • Or maybe it should be called the blame assignment

  • problem, because it's deciding who's to blame for an error,

  • and therefore, needs to change.

  • And this is the problem that backpropagation solves.

  • All right?

  • And when people first came up with neural networks

  • in the '60s, they didn't have this.

  • It was when, finally, it was invented

  • in the '80s by David Rumelhart and others

  • that things really took off.

  • And the basic idea in backprop is actually quite intuitive,

  • I think.

  • It's the following.

  • Well, let's think of the difference

  • between my actual output and my desired output.

  • Let me call that delta.

  • Right?

  • This is the error.

  • The output should have been 1, but it was 0.2.

  • So it needs to go up.

  • And the question is, what can I tweak in these weights

  • to make it go up.

  • All right?

  • Well, thinking about this last layer,

  • well, at that last layer, the neurons with the highest

  • weights are the ones most responsible for the result.

  • And if this neuron is saying no but the answer

  • is yes, well, then its weight needs

  • to go down, because it's preventing it from firing.

  • And if this one was saying yes but this is not firing,

  • then its weight needs to go up.

  • So what I do is I compute the derivative

  • of this error with respect to the weights in the last layer.

  • And then with those, I now have an error signal

  • coming from these neurons.

  • It's like, oh, you should have been higher.

  • All right?

  • You should have been higher in order to make this guy fire,

  • because you have a positive weight.

  • You should have been lower, because you

  • have a negative weight and you're

  • preventing it from firing.

  • So now I know an error signal at the neurons at this layer.

  • And I can keep doing the same thing

  • all the way back to the input.

  • And this is why it's called backpropagation.

  • Because what I'm doing is I'm propagating back the errors

  • and then updating the weights, changing the weights in order

  • to make that error as small as possible.

  • OK?
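
A minimal sketch of this procedure for a tiny network with two inputs, two hidden neurons, and one output. It assumes sigmoid neurons (a smooth stand-in for the hard threshold, so that derivatives exist), a squared-error measure, no bias terms, and arbitrary starting weights; all of these choices are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> hidden layer -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One gradient step on the squared error 0.5 * (output - target)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (derivative w.r.t. its weighted sum).
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to each hidden neuron.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Move every weight a little against its gradient.
    new_w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - learning_rate * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

x, target = [1.0, 0.5], 1.0            # the output should be 1
w_hidden = [[0.1, -0.2], [0.4, 0.3]]   # arbitrary starting weights
w_out = [0.2, -0.5]
_, before = forward(x, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print(before < after)  # True: the output has moved toward the target
```

A real network would be trained on many examples rather than one, but the flow is the same: compute the error at the output, pass error signals backward layer by layer, then adjust the weights.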

  • Well, this is the backpropagation algorithm.

  • It's what's at the heart of deep learning.

  • And these days, this is used for just about everything on Earth.

  • Right?

  • Very early on, people used it to do things

  • like predict the stock market.

  • These days, you use it for search,

  • use it for ad placement, use it for video recognition,

  • use it for speech recognition, use it

  • for things like simultaneous translation and whatnot.

  • But I think, at least for the public,

  • the best known instance of this is still

  • Google's own famous cat network.

  • Unfortunately, people think it's-- it got called "the cat

  • network" by the journalists, right,

  • because it happened to recognize cats very well.

  • Actually, it wasn't an accident, right?

  • This was all learned from YouTube videos,

  • right, as you probably know.

  • And people really like to upload videos of their cats.

  • So there was more data on cats than on anything else.

  • But maybe it should be called "the couch potato

  • network," because it's based on watching a lot of video.

  • And at the time, this was the biggest neural network

  • ever built. It had, I think, on the order

  • of a billion parameters.

  • But of course, these days, there's much bigger ones

  • and they're continuing to grow.

  • So we'll see how far we can take this paradigm.

  • Now the Evolutionaries, they say, well sure,

  • but all you're doing is adjusting the weights

  • on this model.

  • You're just tweaking the strengths of the synapses.

  • Where did that brain come from?

  • That, in some sense, is the bigger question.

  • That brain was produced by evolution.

  • So maybe what we really should do if we want to learn really

  • powerful things is figure out how evolution works.

  • And we actually already have a good sense

  • of how evolution works.

  • There's a lot of details, of course, that

  • are still to be understood.

  • But at a high level, we understand how evolution works.

  • So let us do the same thing on the computer.

  • And the first guy who really ran with this idea

  • was John Holland.

  • He started in the '50s, '60s.

  • And for a long time, this whole area of evolutionary computing,

  • the joke was that it was John, his students,

  • and their students.

  • But then in the '80s, things took off and a lot of people

  • started doing it.

  • John Holland called what he did "genetic algorithms."

  • And then John Koza developed a more powerful version

  • called genetic programming, which we'll see in a second.

  • And Hod Lipson is one of the people

  • who are doing very interesting things

  • with evolutionary learning these days, as we'll see in a bit.

  • So what's the basic idea here?

  • Well, how does evolution work?

  • You have a population of individuals, each of which

  • is described by its genome.

  • OK?

  • But in our case, in the computer,

  • the genome, instead of being base pairs,

  • instead of being DNA, it's just going to be bits, right,

  • because 0 and 1, in some sense, are the DNA of computers.

  • And then each of these individuals

  • gets to go out in the world and be evaluated.

  • It gets evaluated at the task that it's supposed to be doing.

  • And the individuals that do better

  • will have a higher fitness and will therefore

  • have a higher chance of being the parents

  • of the next generation.

  • You get two very fit parents and you cross over their genomes

  • just like people do.

  • And so now you have a child genome

  • that is partly the genome of one parent

  • and partly the genome of the other parent, the mother

  • and father genomes, if you will.

  • And then you also have random mutation, right?

  • Some bits just get randomly mutated because

  • of copying errors or whatever.

  • And then you have a new population.

  • And what's remarkable is that you could actually

  • start out with a population that is essentially random.

  • And after you do some number of generations of this,

  • you actually have things that are doing

  • a lot of non-trivial functions.

  • Like, for example, you can evolve circuits

  • that-- you can develop radio receivers, and amplifiers,

  • and things like that in just this way.

  • And they often work better than the ones

  • that are designed by humans.

  • In fact, people like John Koza, they

  • have a whole bunch of patents that were actually--

  • the patent designs were actually invented

  • by the genetic algorithms, not by the humans.

  • OK?
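
The loop described above (evaluate, select, cross over, mutate) can be sketched on bit strings as follows. The task here, counting 1 bits, is a toy assumption, as are the population size and mutation rate. Two simplifications are also assumptions of this sketch rather than part of the talk: parents are drawn only from the fitter half of the population, and the single best individual is copied unchanged into the next generation.

```python
import random

def fitness(genome):
    """Toy task (an assumption): count the 1 bits."""
    return sum(genome)

def crossover(mother, father, rng):
    """Single-point crossover: a prefix of one parent, a suffix of the other."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]

def mutate(genome, rng, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def evolve(pop_size=30, genome_length=20, generations=40, seed=0):
    rng = random.Random(seed)
    # Start from an essentially random population.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # the fitter half gets to reproduce
        children = [population[0]]              # keep the best individual unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history

best, history = evolve()
print(history[0], "->", history[-1])  # best fitness at the start and at the end
```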

  • Now Koza's idea was actually to take this one step beyond.

  • And that's the following.

  • Well you know, these strings are a very low level

  • representation.

  • And just cutting a string in the middle to cross over

  • is probably going to muck things up.

  • Right?

  • We are trying to evolve programs, right?

  • At the end of the day, that's all

  • we're trying to do is evolve programs.

  • So why don't we actually work directly

  • with the programs themselves?

  • And a program is really a tree of subroutine calls all the way

  • down to simple operations like additions, multiplications,

  • and ands, and ors.

  • So let's represent the programs as trees.

  • And then in order to cross over two programs, what we're

  • going to do is we're going to randomly pick

  • a node in each of the trees and then we're

  • going to swap the sub-trees at those nodes.

  • And now we have two child programs.

  • OK?

  • So for example, here, if I do the crossover at this point,

  • then one of the sub-trees that I will get

  • is the one that's all in white.

  • And this tree actually represents

  • one of Kepler's laws.

  • This is the law that gives the duration of a planet's year

  • as a function of its average distance from the sun.

  • It's actually a constant times the square root

  • of the cube of the distance.

  • And that's what this is representing.
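
A sketch of programs as trees, with the Kepler example and subtree crossover. Trees are written as nested tuples; the operator names, the second parent program, and the constant 1.0 (which corresponds to measuring distance in astronomical units and time in Earth years) are assumptions chosen for illustration.

```python
import math
import random

# A program is a nested tuple (operator, child, ...) or a leaf: "d" or a number.
# KEPLER encodes: constant * sqrt(d * d * d).
KEPLER = ("mul", 1.0, ("sqrt", ("mul", "d", ("mul", "d", "d"))))

def evaluate(tree, d):
    """Evaluate a program tree for a given distance d."""
    if tree == "d":
        return d
    if isinstance(tree, (int, float)):
        return tree
    op, *args = tree
    values = [evaluate(arg, d) for arg in args]
    if op == "mul":
        return values[0] * values[1]
    if op == "add":
        return values[0] + values[1]
    if op == "sqrt":
        return math.sqrt(values[0])
    raise ValueError(f"unknown operator: {op}")

def paths(tree, prefix=()):
    """List the path (sequence of child indices) to every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            found.extend(paths(child, prefix + (i,)))
    return found

def get(tree, path):
    """Return the subtree reached by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(a, b, rng):
    """Pick a random node in each tree and swap the subtrees rooted there."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

print(evaluate(KEPLER, 4.0))  # 8.0: a planet 4 times farther out has a year 8 times longer

other = ("add", "d", ("mul", 2.0, "d"))   # an arbitrary second parent program
child1, child2 = crossover(KEPLER, other, random.Random(0))
```

Because whole subtrees are swapped, both children are still well-formed programs, which is the advantage over cutting a bit string at an arbitrary point.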

  • So genetic algorithms can actually

  • discover a thing like this and much more complex ones as well.

  • In fact, these days, what the genetic folks

  • are having a lot of fun with is something

  • that's exciting and scary at the same time.

  • They're not just doing this with software anymore.

  • They're not doing it as a simulation inside the computer,

  • but actually doing it out in the physical, real world

  • with robots.

  • You literally start with robots that are random piles of parts.

  • And then once those robots are good enough,

  • they actually get printed by a 3-D printer.

  • And then they start crawling, and walking,

  • and doing things in the real world-- seeing how fast they

  • can crawl, trying to recover from injury, and so on and so

  • forth.

  • This is actually a real robot from Hod Lipson's lab.

  • OK?

  • And then what happens is that in each generation,

  • the fittest robots get to program

  • the 3-D printer to produce the next generation of robots.

  • So if "Terminator" comes to pass,

  • this might be how we get there.

  • Of course, these robots are not yet

  • ready to take over the world, but they've already

  • come remarkably far from the soup of parts

  • that they began with.

  • Right?

  • And these days, if you see a little spider running around,

  • take a good look.

  • Because it could be one of these instead of one

  • of the biological ones.

  • All right.

  • So the Bayesians-- so here are some famous Bayesians.

  • Judea Pearl, within computer science,

  • is probably the most famous one.

  • He actually won the Turing Award, the Nobel Prize

  • of computer science, a few years ago

  • for inventing something called Bayesian networks, which

  • is one very powerful type of Bayesian model

  • where it's a big graph where each node is a variable.

  • And then the edges between the nodes

  • represent dependencies between the variables.

  • Two other famous Bayesians are David Heckerman

  • and Mike Jordan.

  • Bayesians are known, in machine learning,

  • as the most fanatical of the five tribes.

  • All right?

  • They really have a near-religious devotion

  • to their paradigm, and you know, they're

  • the first ones to say so.

  • And I think the reason for this is that Bayesians had

  • their origins in statistics.

  • And for 200 years, Bayesians were a persecuted minority

  • in statistics.

  • So they had to become very hard-core in order to survive.

  • And it's a good thing that they did survive,

  • because they have a lot to contribute.

  • And these days, with computers, and algorithms

  • like Markov chain Monte Carlo, and large data, and whatnot,

  • they're actually on the ascendant in statistics.

  • So what is the basic idea behind Bayesian learning?

  • Well, the basic idea is that, as I said,

  • everything that you learn is uncertain.

  • So what you have to do is compute the probability

  • of each one of your hypotheses and then update it

  • as new evidence comes in.

  • And the way you do that is with Bayes' theorem.

  • OK?

  • And Bayesians love Bayes' theorem so much

  • that there was this Bayesian machine learning

  • startup that actually had a neon sign of Bayes' theorem made.

  • And then they stuck it outside their office.

  • So there's Bayes' theorem in big, neon letters.

  • OK?

  • So how does Bayes' theorem work?

  • Bayes' theorem is actually incredibly simple.

  • It's so simple it's barely worth being--

  • it would be barely worth calling a theorem

  • if it wasn't so important.

  • So the idea is this.

  • So let's suppose that you have all your hypotheses.

  • You define your space of hypotheses in some way.

  • And it could be, for example, the set of all Bayesian

  • networks, or the set of all neural networks,

  • or all decision trees, or whatever.

  • And now the first thing that you're going to have

  • is the prior probability for each hypothesis.

  • This is how much you believe in that hypothesis

  • before you've even seen any data.

  • And this is actually what makes Bayesian learning

  • very controversial is that many statisticians say, well, you

  • have no grounds on which to just make up a prior.

  • And the Bayesians' answer to that is, you have to make that up,

  • whether explicitly or implicitly.

  • So let's just be explicit about it.

  • So the prior is how much you believe in each hypothesis

  • before you see the evidence.

  • But then what happens is that as the evidence comes in,

  • you update the probability of each hypothesis.

  • A hypothesis that is consistent with the data

  • will see its probability go up.

  • A hypothesis that is inconsistent with the data

  • will see its probability go down.

  • OK?

  • And the consistency of the hypothesis with the data

  • is measured by what's called the likelihood function, which

  • is the probability of seeing the data if the hypothesis is true.

  • OK?

  • And this theory is actually no different from frequentist

  • statistics and the maximum likelihood

  • principle, which is basically saying

  • that if your hypothesis makes what you're seeing likely,

  • then conversely, what you're seeing

  • makes your hypothesis likely.

  • OK?

  • And the Bayesians incorporate that in the likelihood.

  • And the product of the likelihood and the prior

  • is just the posterior, which is how much you

  • believe the hypothesis after you've seen the evidence.

  • So as you see more evidence, the probabilities evolve.

  • And hopefully, at the end of the day,

  • one hypothesis will come out as clearly better than the others.

  • But that won't necessarily be the case.

  • You might still be entertaining a lot of hypotheses

  • even after you've seen a lot of data.

  • There's also the marginal, which is

  • just something that you have to divide by to make sure

  • that the probabilities add up to 1, so let's

  • not worry about it for now.
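
The update just described can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: the two coin hypotheses and their numbers are made up, and `bayes_update` is a hypothetical helper name. It multiplies each prior by the likelihood and then divides by the marginal so the posteriors add up to 1.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses after one observation.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    # Posterior is proportional to likelihood times prior.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # The marginal P(observation) makes the posteriors sum to 1.
    marginal = sum(unnormalized.values())
    return {h: p / marginal for h, p in unnormalized.items()}


# Two hypothetical hypotheses about a coin (assumed numbers).
prior = {"fair": 0.5, "biased": 0.5}
# Probability of seeing heads under each hypothesis.
heads = {"fair": 0.5, "biased": 0.9}

posterior = prior
for _ in range(3):  # observe three heads in a row
    posterior = bayes_update(posterior, heads)
```

After three heads, the "biased" hypothesis has gained probability at the expense of "fair", but "fair" has not been ruled out, which matches the point that several hypotheses can stay in play.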

  • And a lot of great things have been

  • done with Bayesian learning.

  • Like, for example, self-driving cars

  • have Bayesian learning in their brains.

  • So in some sense, Bayes' theorem is helping to drive that car

  • or helping the car to learn how to drive.

  • And without it, it would be much harder.

  • But one application of Bayesian learning

  • that everybody is probably familiar with

  • is spam filtering.

  • The first spam filter was actually

  • designed by David Heckerman and his coworkers.

  • And they just used this very simple Bayesian learner called

  • the naive Bayes classifier.

  • And the way it works there is the following.

  • The hypothesis, before I've seen evidence,

  • is that the email is a spam or that the email is not a spam.

  • OK?

  • And the prior probability is like your prior probability

  • of an email being spam-- 90%, 99%, 99.999%-- take your pick.

  • And then the evidence is the actual content of the email.

  • So for example, if the email contains the word "Viagra,"

  • that probably makes it more likely to be spam.

  • If it contains the word "free" in capitals,

  • that makes it even more likely to be spam.

  • And if that "FREE" is followed by four exclamation marks,

  • that makes it even more likely to be spam.

  • OK?

  • On the other hand, if it contains the name of your best

  • friend on the signature line, that actually

  • makes it less likely to be spam.

  • OK?

  • And so what the naive Bayes classifier does

  • is it incorporates that evidence.

  • And at the end of the day, it computes a probability

  • that the email is spam or not spam taking all of that

  • into account.

  • OK?

  • And then based on that probability,

  • you can decide whether to filter it out or actually

  • show it to the user.
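
A rough sketch of that computation in Python follows. The word probabilities and the 0.9 prior are invented for illustration, and `spam_probability` is a hypothetical function, not the code of any real spam filter. The "naive" part is the assumption that words are independent of each other given the class.

```python
import math


def spam_probability(words, prior_spam, p_word_given_spam, p_word_given_ham):
    """Naive Bayes: combine per-word evidence, treating words as independent."""
    # Work in log space so that many small probabilities do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for w in words:
        # Words missing from either table are ignored in this sketch.
        if w in p_word_given_spam and w in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[w])
            log_ham += math.log(p_word_given_ham[w])
    # Normalize so that P(spam) + P(not spam) = 1.
    m = max(log_spam, log_ham)
    s = math.exp(log_spam - m)
    h = math.exp(log_ham - m)
    return s / (s + h)


# Assumed probabilities of each word appearing in spam and in non-spam email.
P_SPAM = {"viagra": 0.2, "FREE": 0.3, "alice": 0.001}
P_HAM = {"viagra": 0.001, "FREE": 0.01, "alice": 0.2}

suspicious = spam_probability(["viagra", "FREE"], 0.9, P_SPAM, P_HAM)
friendly = spam_probability(["alice"], 0.9, P_SPAM, P_HAM)
```

With these made-up numbers, `suspicious` ends up above the 0.9 prior and `friendly` ends up well below it. The filter would then compare the result against whatever threshold the user is comfortable with.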

  • And we're all grateful that spam filters exist.

  • Otherwise our mailboxes would be unmanageable.

  • These days, all sorts of different learning algorithms

  • get used for spam filtering.

  • But Bayesian learning was the first one

  • and it's still used in many spam filters.

  • OK.

  • Finally, the Analogizers-- so as I mentioned,

  • the basic idea of the Analogizers

  • is that everything that we do, everything that we learn

  • is reasoning by analogy.

  • It's looking at similarities between the new situation

  • that we need to make a decision in

  • and the situations that we're already familiar with.

  • OK?

  • And one of the early pioneers in this area was Peter Hart.

  • He proved some things related to the nearest neighbor algorithm,

  • which we'll see shortly, which is

  • kind of like the first similarity-based algorithm.

  • Vladimir Vapnik is the inventor of support vector machines,

  • a.k.a.

  • kernel machines, which is the most widely used and most

  • successful type of similarity-based learner.

  • But these are both actually still

  • fairly primitive forms of analogical reasoning.

  • There are some much more sophisticated ones

  • that people, like for example, Douglas Hofstadter, work on.

  • And Douglas Hofstadter, of course, is also famous.

  • He's not just a cognitive scientist and computer

  • scientist, he's also famous as the author of "Gödel, Escher,

  • Bach," which, ironically, is actually,

  • itself, an extended analogy between Gödel's

  • theorem and the music of Bach and the art of Escher.

  • And in fact, his most recent book

  • is 500 pages arguing that all of intelligence is just analogy.

  • So he really does think that analogy

  • is the master algorithm.

  • And in fact, the term "analogizer"

  • was coined by him.

  • So let's see how this works.

  • And I'm going to do it by proposing

  • to you a very small puzzle.

  • And the puzzle is this.

  • Let's suppose I give you a map of two countries, which

  • I fancifully called "Posistan" and "Negaland" because

  • of positive and negative examples.

  • And I don't tell you where the frontier between the two

  • countries is.

  • I just tell you where the major cities are.

  • So the major cities in Posistan are the plus signs,

  • Positiveville is the capital, and similarly

  • for the negative cities.

  • OK?

  • And now my question to you is the following.

  • If I only tell you where the major cities are,

  • can you tell me where the frontier,

  • where the border between the two countries is?

  • Of course, you can't know for sure, right,

  • because the cities don't determine the border.

  • But that's what the machine learning problem is, right?

  • You have to generalize.

  • Now the nearest neighbor algorithm

  • has a very simple answer to this question.

  • It just says, I'm going to assume that a point on the map

  • is in Posistan if it's closer to some positive city

  • than to any negative city.

  • OK?

  • And the effect that this has is to divide the map

  • into the neighborhood of each city.

  • And then Posistan is just the union of the neighborhoods

  • of the positive cities.

  • The neighborhood of a city is the points

  • that are closer to it than to any other city.

  • OK?

  • And then as a result, you get this jagged, straight-line

  • frontier.
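
The nearest neighbor rule described here fits in a few lines of Python. The city coordinates below are invented, and `nearest_neighbor` is a hypothetical helper. Note that there is no training step at all: the "model" is just the stored list of cities.

```python
import math


def nearest_neighbor(point, cities):
    """Return the label of the city closest to `point`.

    cities: list of ((x, y), label) pairs.
    """
    _, label = min(cities, key=lambda city: math.dist(point, city[0]))
    return label


# Assumed map: "+" cities are in Posistan, "-" cities are in Negaland.
cities = [
    ((1.0, 1.0), "+"),
    ((2.0, 3.0), "+"),
    ((5.0, 1.0), "-"),
    ((6.0, 4.0), "-"),
]

west = nearest_neighbor((1.5, 2.0), cities)
east = nearest_neighbor((5.5, 2.0), cities)
```

All the work happens at query time, when the point is compared against every stored city.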

  • And what's remarkable about this is that even though the nearest

  • neighbor algorithm is extremely simple-- in fact,

  • at learning time, it doesn't do anything, right?

  • It's an O of 0 algorithm.

  • It just sits there, doesn't have to do

  • anything-- it can actually form very, very

  • complicated frontiers.

  • In fact, if you give it enough data,

  • it will converge very closely.

  • In fact, it will converge to the best possible hypothesis

  • that you could ever have if you use more than one neighbor.

  • But let's not go into those details.

  • Now there's a couple of things here that are not ideal.

  • One is that this line is probably

  • not quite the right one, right?

  • Because the real frontier is probably smoother.

  • It's not jagged like that.

  • And another one is that if you look at this map closely,

  • you could actually throw away some

  • of the cities, like this one, for example,

  • and it wouldn't change anything.

  • If you threw away this city, its area

  • gets absorbed by the areas of these two other cities.

  • And the frontier doesn't change at all.

  • The only cities that you need to keep

  • are the ones that actually take part

  • in defining the frontier, the so-called "support vectors."

  • In general, these are vectors in hyperspace.

  • And they're called support vectors

  • because they're the ones that keep the frontier where it is.

  • So often, you can throw away the great majority of your examples

  • and that doesn't change anything.

  • And of course, in this example, this doesn't matter.

  • But when you have a data set with millions or billions

  • of examples, that does matter.

  • And support vector machines, or kernel machines, for short,

  • solve both of these problems.

  • So they have a learning procedure

  • that lets them throw away all the examples that are not

  • necessary to define the frontier,

  • so they leave only these.

  • And they also make the frontier smoother.

  • And the way they draw the frontier is by saying, well,

  • let me try to walk from south to north while

  • keeping the positive cities on my left

  • and the negative cities on my right.

  • But I want to always stay as far as possible from them as I can.

  • OK?

  • Think of the cities as mines and think of this as a minefield.

  • Right?

  • If I told you to walk all along a minefield,

  • you would try to give each mine the widest possible berth.

  • You would try to maximize your margin of safety.

  • And that's exactly what support vector machines

  • do is that they maximize the margin between the frontier

  • and the nearest examples.
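
To make "maximize the margin" concrete, here is a simplified Python sketch. It is not how support vector machines are actually trained; they solve an optimization problem over all possible frontiers. The sketch only scores three hypothetical candidate frontiers on made-up cities and keeps the one that stays farthest from the nearest city.

```python
import math


def margin(w, b, examples):
    """Smallest signed distance from any example to the line w.x + b = 0.

    The result is negative if some example lies on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in examples)


# Assumed cities: positives (+1) on the left, negatives (-1) on the right.
examples = [
    ((1.0, 1.0), +1),
    ((1.0, 3.0), +1),
    ((5.0, 1.0), -1),
    ((5.0, 3.0), -1),
]

# Candidate vertical frontiers x = c, written as -x + c = 0
# so that points to the left of the frontier score positive.
candidates = [((-1.0, 0.0), c) for c in (2.0, 3.0, 4.0)]

best = max(candidates, key=lambda wb: margin(wb[0], wb[1], examples))
```

All three candidates separate the two countries, but the frontier at x = 3 keeps a distance of 2 from the nearest city on either side, while the other two come within 1 of a city.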

  • And in the days before deep learning took off,

  • support vector machines were probably

  • the most powerful type of learning

  • that was commonly used.

  • OK?

  • All right, let's look at one application of this.

  • Again, this type of analogy-based learning, people

  • have been doing it since the '50s,

  • so it's used for just about everything on Earth.

  • There is, however, one application

  • that I think everybody has experienced

  • even if they don't know it's an application

  • of analogy-based learning.

  • And that is recommender systems.

  • OK?

  • So for example, let's say that I want

  • to figure out what movies to recommend to you.

  • The idea that folks had almost 20 years

  • ago now was a very simple one.

  • It is, let me not try to do that based

  • on the properties of the movie, because that's hard.

  • Right?

  • You know, people's tastes are very complicated things.

  • What I should do is what is called

  • "collaborative filtering," is find people

  • who have similar tastes to you, meaning they gave five

  • stars to a movie that you gave five stars to,

  • they gave one star to movies that you gave one star to.

  • And now if they give five stars to a movie

  • that you haven't seen, then I'm going

  • to hypothesize that, by analogy, you will also like that movie.

  • And so I will recommend it to you.
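
A minimal Python sketch of this idea, assuming a tiny made-up ratings table. The similarity measure (negative mean absolute difference on shared movies) and the four-star cutoff are illustrative choices, not what any production recommender uses.

```python
def recommend(target, ratings):
    """Recommend unseen movies liked by the user most similar to `target`."""

    def similarity(a, b):
        shared = set(ratings[a]) & set(ratings[b])
        if not shared:
            return float("-inf")
        # Negative mean absolute difference: 0 means perfect agreement.
        diff = sum(abs(ratings[a][m] - ratings[b][m]) for m in shared)
        return -diff / len(shared)

    others = [user for user in ratings if user != target]
    neighbor = max(others, key=lambda user: similarity(target, user))
    return sorted(
        movie
        for movie, stars in ratings[neighbor].items()
        if stars >= 4 and movie not in ratings[target]
    )


# Hypothetical star ratings (1 to 5).
ratings = {
    "you": {"movie_a": 5, "movie_b": 1},
    "alice": {"movie_a": 5, "movie_b": 1, "movie_c": 5},
    "bob": {"movie_a": 1, "movie_b": 5, "movie_d": 5},
}

picks = recommend("you", ratings)
```

Here "alice" agrees with "you" on every shared movie, so her five-star "movie_c" is recommended and "bob"'s favorite is not.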

  • OK?

  • And this turns out to work spectacularly well.

  • It works so well, in fact, that 3/4 of Netflix's business

  • comes from its recommender system.

  • And Amazon also has a recommender system.

  • About 1/3 of its business comes from its recommender system.

  • And every e-commerce site worth its salt

  • has something like this.

  • And of course, these days, people

  • use all sorts of different learning algorithms for it.

  • But the first one was this nearest neighbor type method.

  • And it's still one of the best.

  • OK?

  • So stepping back, we've met the five tribes

  • of machine learning.

  • We've seen that each one of them has a problem that it can solve

  • better than all the others.

  • And it has a particular master algorithm

  • that solves that problem.

  • So for example, the problem that the Symbolists

  • solve that none of the others know how to solve

  • is the problem of learning knowledge

  • that you can compose in many different ways.

  • And they learn that knowledge with inverse deduction,

  • as we saw.

  • Connectionists solve the credit assignment problem

  • using backprop.

  • Evolutionaries solve the problem of learning structure, right?

  • The Connectionists just start with a fixed structure

  • and just adjust the weights.

  • The Evolutionaries know how to come up

  • with that structure in the first place using

  • genetic programming.

  • The Bayesians are all about uncertainty.

  • They know how to deal with the fact

  • that all the knowledge that you learn is uncertain.

  • They know how to update the probabilities of hypotheses

  • as they see more data.

  • They do that using probabilistic inference, which

  • is essentially computationally efficient ways

  • to apply Bayes' theorem to very large sets of hypotheses.

  • And finally, the Analogizers, they can reason by similarity.

  • They can actually generalize from just one or two examples.

  • All right?

  • Think of Niels Bohr.

  • He came up with the first theory of quantum mechanics

  • by doing an analogy between the atom and the solar system,

  • generalizing from one example.

  • None of the others can do this.

  • And probably the best known analogizer algorithm

  • these days is kernel machines.

  • But the thing that I want to point out

  • is that, precisely because each of these problems

  • is real and important, none of the individual algorithms

  • are enough.

  • What we really need is a single algorithm

  • that solves all five problems at the same time.

  • We need a "grand unified theory" of machine learning

  • in the same sense that the standard model is

  • a grand unified theory of physics or the central dogma

  • is a grand unified theory of biology.

  • And in fact, a bunch of us have been working

  • on this problem for a while.

  • And we've actually made a lot of progress.

  • We're still far from the end of this,

  • but let me just give you a sense of where we are at this point.

  • So if you think for a moment about this problem of,

  • so we have these five algorithms or these five

  • types of learning, how can we unify them all into one,

  • at first, this seems like a very hard problem.

  • And in fact, some people have claimed that it's

  • an impossible problem to solve.

  • It seems very hard, because the algorithms all

  • look very different.

  • But if you look at them closely, actually, they're

  • not that different.

  • They all have the same three parts, representation,

  • evaluation, and optimization.

  • So let's look at what those parts are

  • and then how we can do the unification.

  • So representation is how the learner represents

  • what it's learning, the model or the program that it's learning.

  • Right?

  • It's, in some sense, the programming language

  • in which the learner is going to write the algorithm that it

  • discovered.

  • Typically, it's not going to be Java, or C++,

  • or anything like that, it's going to be something like

  • first-order logic.

  • All right?

  • But it could be differential equations.

  • It could be a linear regression.

  • It could be all sorts of things.

  • So the first thing that we need to do

  • is to unify the representations.

  • And a natural thing to do here is

  • to start with the representations

  • that the Symbolists use, which are variations

  • on first-order logic, and the representation

  • that the Bayesians use, which are generally

  • known as graphical models.

  • Bayesian networks are one type of graphical model.

  • Another type is Markov networks and so on.

  • Each of these is already extremely general.

  • If you can combine the two, you can pretty much

  • represent anything you might want to represent.

  • Any computer program can, for example, already

  • be represented in first-order logic.

  • Any way to deal with uncertainty and weighing evidence

  • can be represented in graphical models.

  • There's bazillions of different models

  • that people have in statistics that all fit

  • into that framework.

  • So if we can combine the two, then we

  • have a very good representation to start with.

  • And indeed, we have done that.

  • In essence, what we've developed is various forms

  • of probabilistic logic.

  • So this is a logic that also incorporates

  • probability and uncertainty.

  • And the most widely used is called Markov logic networks.

  • It's essentially a combination of logic and Markov networks.

  • And it's very simple.

  • It just starts with formulas in first-order logic.

  • Think of a rule in logic, like if this, then that,

  • for example.

  • And then what it does is it gives each rule a weight.

  • So if you really believe the rule,

  • you give it a high weight.

  • If you're not sure, you give it a lower weight.

  • And then the probability of a state of the world

  • goes up with the number and the weight of the rules

  • that are true in that world.
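
The weighting scheme can be sketched in Python for a toy case. This is a propositional simplification, assuming a single made-up rule ("smokes implies cancer") with an assumed weight of 1.5. A full Markov logic network would count the true groundings of first-order formulas rather than just checking each rule once. The probability of a world is taken to be proportional to the exponential of the summed weights of the rules true in it.

```python
import itertools
import math


def world_probabilities(atoms, weighted_rules):
    """P(world) is proportional to exp(sum of weights of rules true in it)."""
    worlds = [
        dict(zip(atoms, values))
        for values in itertools.product([False, True], repeat=len(atoms))
    ]
    scores = [
        math.exp(sum(weight for rule, weight in weighted_rules if rule(world)))
        for world in worlds
    ]
    total = sum(scores)  # normalizer so that probabilities sum to 1
    return [(world, score / total) for world, score in zip(worlds, scores)]


atoms = ["smokes", "cancer"]
# One hypothetical rule: "if smokes, then cancer", with an assumed weight.
rules = [(lambda w: (not w["smokes"]) or w["cancer"], 1.5)]

probs = world_probabilities(atoms, rules)
```

The one world that violates the rule (smokes but no cancer) is not impossible, as it would be in pure logic. It is just less probable than the other three, and raising the weight makes it less probable still.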

  • OK?

  • So with this, we can represent pretty much anything

  • that we'd like.

  • Now the next part of every learning algorithm

  • is the evaluation.

  • The evaluation is the scoring function that tells me

  • how good a candidate model is.

  • How well does it fit the data?

  • How well does it fit my purposes?

  • In essence, what the learning problem

  • is is to find-- within the space defined by a representation--

  • find the program that maximizes my evaluation function.
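
As a toy illustration of that framing in Python: the representation below is a small set of threshold rules, the evaluation function counts correct predictions, and the search is plain exhaustive enumeration. All of these are assumptions chosen to keep the sketch short. Real learners search far larger spaces with smarter methods.

```python
def learn(hypotheses, evaluate):
    """Return the hypothesis with the highest evaluation score."""
    return max(hypotheses, key=evaluate)


# Representation: rules of the form "predict positive if x >= t".
thresholds = [0, 1, 2, 3, 4, 5]

# Made-up training data: (x, label) pairs.
data = [(1, False), (2, False), (3, True), (4, True)]


def accuracy(t):
    """Evaluation: number of training examples the rule gets right."""
    return sum((x >= t) == label for x, label in data)


best = learn(thresholds, accuracy)
```

Swapping in a different `evaluate` function changes what gets learned without touching `learn` itself.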

  • OK?

  • So what should our evaluation function be?

  • Well, one obvious candidate is just the posterior probability

  • that Bayesians use.

  • And that, again, has a lot of other things already

  • as special cases.

  • But more generally, the evaluation

  • shouldn't really be part of the algorithm.

  • It should be provided by the user.

  • It's for the user to decide what the learner should

  • be optimizing.

  • So if you're a company and your purpose is to maximize profits,

  • then that's what the evaluation function should be.

  • If you're a consumer and your purpose

  • is to maximize your happiness, then

  • that's what should be being maximized

  • is some measure of your happiness.

  • OK?

  • So what the master algorithm should be able to do

  • is take anybody's objective function

  • and then learn to optimize that.

  • OK?

  • Finally-- right?

  • I just said the word "optimize"-- the third part

  • of this is optimization, is, how do we actually

  • find the model that maximizes that function.

  • OK?

  • And here, there's a natural combination

  • of ideas from genetic programming

  • and backpropagation, namely, to discover formulas,

  • we can use genetic programming.

  • Each formula in first-order logic is a tree.

  • Right?

  • And now I can cross these trees over

  • and apply the genetic process to come up

  • with better formulas that better describe my domain.
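
Crossing formula trees over can be sketched in Python by writing each formula as a nested tuple and swapping one subtree between two parents. The formulas here are made up, and the crossover points are fixed by hand so the result is predictable. Genetic programming would normally pick them at random and then keep the fitter offspring.

```python
def get_subtree(tree, path):
    """Follow a sequence of child indexes down from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b, path_a, path_b):
    """Swap the subtree of `a` at path_a with the subtree of `b` at path_b."""
    sub_a = get_subtree(a, path_a)
    sub_b = get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)


# Formulas as (operator, child, ...) tuples; index 0 is the operator.
f1 = ("implies", "smokes", "cancer")
f2 = ("and", "exercises", ("not", "cancer"))

child1, child2 = crossover(f1, f2, (2,), (2,))
```

In this example `child1` becomes "smokes implies not cancer" and `child2` becomes "exercises and cancer".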

  • And then once I have those formulas, of course,

  • if I'm using Markov logic, I need

  • to come up with weights for those formulas.

  • But of course, this is where backprop comes in.

  • All right?

  • I have my big chain of reasoning involving

  • many different formulas, and facts, and different steps.

  • And all of those have weights.

  • And in order to learn those weights,

  • I can naturally use backpropagation.

  • OK?

  • So we're pretty far along in this.

  • We haven't succeeded yet.

  • But some people think it's only a matter of time before we do.

  • I'm actually a little less gung-ho.

  • I think that even after we've successfully unified these five

  • paradigms, there are still major ideas missing, ideas

  • that we haven't had and without which we will not

  • have a truly universal learner.

  • And in fact, part of my goal in writing the book

  • was to open it up to people who are not already machine

  • learning researchers so they can think about the problem

  • and maybe have ideas that people who are already

  • thinking along the tracks of one of the tribes wouldn't have.

  • So if you figure out how to solve this problem,

  • let me know so I can publish your solution.

  • OK.

  • Let me conclude by talking a little bit about what I think

  • that the master algorithm will enable that we cannot do today.

  • I have four items here.

  • There are, of course, more.

  • But I think these four give a good sense

  • of just how momentous a development this would be.

  • The first one is home robots.

  • We would all love to have robots that cook dinner for us,

  • and do the dishes, and make the beds, and whatnot.

  • But why don't we have them today?

  • Well, first of all, it can't be done without machine learning,

  • right?

  • There's just no way to program a robot

  • to do all the things that it might have to do.

  • But second of all, the learning algorithms that we have today

  • are not good enough.

  • Because a robot, a home robot, in the course of a normal day,

  • will run into all five of those problems multiple times.

  • So it needs to be able to solve all of them.

  • So with a master algorithm, we're on our way.

  • But without it, I think it'll be much harder and much slower

  • if we ever get there.

  • Here's another one.

  • Everybody, including, of course, Google,

  • has a project to try to turn the web into a knowledge base.

  • Right?

  • Instead of issuing keyword queries and getting back

  • pages, what I would like to do is

  • ask questions and get answers.

  • But for that, all the knowledge that's on the web

  • has to be represented in a way that the computer can

  • reason with, something like, for example, first-order logic.

  • On the other hand, the web is full of contradiction,

  • and noise, and gaps, and whatnot,

  • so it's going to be very noisy.

  • So I need probability.

  • OK?

  • So again, I need to be able to combine those five

  • types of learning in order to really

  • be able to extract this knowledge base from the web.

  • OK?

  • Here's perhaps the most important one, cancer.

  • Why haven't we cured cancer yet?

  • The problem is that cancer is not one disease.

  • Right?

  • Everybody's cancer is different.

  • And in fact, the same cancer mutates as it goes along.

  • So it's very unlikely that there will ever

  • be one drug that cures all cancers.

  • The real cure for cancer-- or at least,

  • that's what an increasing number of cancer researchers believe--

  • is something like a learning program

  • that takes in the genome of the patient, the medical history,

  • the mutations in the tumor cells,

  • and predicts, for that tumor, what

  • is the drug that's going to kill it without harming

  • the patient's normal cells-- or maybe the sequence of drugs,

  • or a combination of drugs, or perhaps even

  • a drug that will be designed from scratch

  • for that particular patient.

  • OK?

  • In some ways, this is not a very different problem

  • from recommender systems that recommend a book or a movie

  • to you.

  • However, what they do here is they recommend a drug.

  • The problem, however, is that this problem

  • is orders of magnitude harder than the problem

  • of recommending a drug-- or the problem of recommending

  • a movie or a book.

  • You have to understand how the cell works.

  • You have to understand the interactions

  • between the genes and the proteins that they make.

  • And then they go back and regulate the genes.

  • And it's when this machinery gets out

  • of whack that you get cancer.

  • OK?

  • The good news is we actually have

  • a lot of data to do this, things like microarrays,

  • and sequences, and whatnot.

  • But again, with the learning algorithms that we have today,

  • we're not going to be able to do it.

  • With something like the master algorithm,

  • we will be able to do it.

  • OK.

  • Finally, apropos of recommender systems,

  • let me mention this one.

  • What I would really like to have as a consumer

  • is not 500 different recommender systems

  • recommending 500 different things

  • to me-- Netflix recommending movies

  • and Amazon recommending books and Facebook selecting updates

  • and Twitter selecting tweets.

  • What I want is a complete 360-degree model of me.

  • Learn from all the data that I generate.

  • And then that model knows me much better than all

  • these tiny, little models and, as a result,

  • can make much better recommendations-- and not

  • just recommendations of small things,

  • but recommendations of jobs, recommendations

  • of houses, recommendations of what to major in, or-- oops,

  • I guess I skipped this slide here.

  • As the foremost city of the federal government said,

  • if we use these things, we can actually

  • have a recommender system that is, in essence, your best

  • friend throughout your life, recommending the things

  • that you need at every step.

  • OK?

  • And again, in order to do that, we need not just the data,

  • which, increasingly, we have, we need the algorithms

  • that are powerful enough to learn that rich model of you.

  • OK?

  • More about all these things in the book,

  • "The Master Algorithm."

  • And thank you for listening and I'll take questions.

  • [APPLAUSE]

  • AUDIENCE: The representation that you proposed,

  • the Markov logic network, does it cover natural language?

  • Like, is there a one-to-one mapping between

  • the representation and the--

  • PEDRO DOMINGOS: Yeah.

  • In fact, one of the biggest areas

  • where Markov logic has been applied is natural language.

  • It's a very good match for natural language,

  • because natural language is both very compositional-- right,

  • so you need the logic for that-- and also very ambiguous,

  • very uncertain, very noisy.

  • So you need probability for that.

  • So people, at this point, have applied Markov logic networks

  • to pretty much every major problem in natural language,

  • and won competitions using it, and so forth.

  • So in some ways, natural language

  • is one of the killer apps for Markov logic networks.

  • AUDIENCE: Is there a way for us to learn

  • to use it in the genetic programming algorithms?

  • PEDRO DOMINGOS: So the question is, are we

  • able to learn Markov logic networks using

  • genetic programming.

  • People haven't done that yet.

  • So we've learned-- there's a wide range of algorithms

  • for learning the structure of Markov logic networks.

  • They are similar to genetic programming,

  • but without the crossover.

  • All right?

  • The Evolutionaries really believe in crossover.

  • Everybody else in machine learning

  • thinks something like hill climbing, greedy search,

  • or beam search is probably enough.

  • And there are many different search

  • methods that have been used and that work pretty well.

  • It's actually a good question.

  • That part has not been done yet, whether actually doing

  • the crossover between the formulas will help.

  • AUDIENCE: Related question to genetic programming--

  • how is the solution space limited?

  • Because it seems that combining various sub-trees

  • would lead to a rapidly fast, exponentially growing solution

  • space.

  • And has there been any success with NP-hard problems?

  • PEDRO DOMINGOS: So something that people have observed

  • in genetic programming is that the trees tend to get

  • bigger and bigger over time.

  • People jokingly call it the "survival of the fattest."

  • What you can do to combat that is

  • to actually have a bias, a penalty

  • on the size of the trees.

  • So if the trees are big, they're less fit just because of that.

  • You put that into your objective function.
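
One simple way to write such a penalty in Python is shown below. The trees, the raw fitness values, and the penalty of 0.1 per node are all assumptions for illustration.

```python
def size(tree):
    """Number of nodes in a formula tree written as nested tuples."""
    if not isinstance(tree, tuple):
        return 1  # a leaf
    # One node for the operator plus the sizes of its children.
    return 1 + sum(size(child) for child in tree[1:])


def penalized_fitness(raw_fitness, tree, penalty=0.1):
    """Subtract a cost per node, so bigger trees are less fit."""
    return raw_fitness - penalty * size(tree)


small = ("and", "a", "b")
big = ("and", ("or", "a", "b"), ("not", "c"))

# Both trees are assumed to fit the data equally well.
small_score = penalized_fitness(1.0, small)
big_score = penalized_fitness(1.0, big)
```

With equal raw fitness, the smaller tree wins, which pushes the population away from ever-growing formulas.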

  • AUDIENCE: NP-hard problems-- any success

  • in that area with genetic algorithms or programming?

  • PEDRO DOMINGOS: Well, the short answer to that is yes

  • and the longer answer is yes, but.

  • A lot of the problems that people

  • are approaching with genetic algorithms

  • are NP-hard problems.

  • The question that the people who aren't Evolutionaries ask

  • is, did you really solve-- so first of all,

  • they're NP-hard problems, right?

  • So you can't solve the worst, the hardest instances

  • efficiently.

  • But there are many instances that aren't the hardest.

  • And so, can you solve them well enough?

  • And there are examples of genetic algorithms solving

  • these problems.

  • But there are also counterexamples

  • of people saying, look, I could have

  • done that with hill climbing and it

  • would have been just as fast or just as good.

  • So the jury's still out on this.

  • AUDIENCE: So with your 360-degree recommender systems,

  • what's to keep that from being self-fulfilling or guiding

  • a person into some particular path that's not, perhaps,

  • [INAUDIBLE]?

  • PEDRO DOMINGOS: What's going to keep that

  • from being self-fulfilling is the way machine learning works,

  • which is the system recommends something to you.

  • Or for example, it recommends 10 alternatives to you.

  • All right?

  • Or let's say it recommends one, but then you say, no,

  • this was the wrong one.

  • Then it learns from that.

  • So the recommender system is not something

  • that you learn one day offline and then you use it.

  • It's something that is continuously

  • learning from what you do.

  • So if it starts doing the wrong thing, you start being unhappy

  • and you start displaying that unhappiness

  • in one form or another.

  • And then it learns, from that, to try something else.

  • And also remember, these systems, they

  • can talk with each other to the extent

  • that you decide that they can.

  • So it won't just be learning from you,

  • it will be learning from a lot of people.

  • All right?

  • So it's always the question of, how far do

  • you have to generalize, right?

  • The more data that you have, the easier the problem becomes.

  • And the more you have a continuous loop of feedback,

  • the more robust the learning is.

  • AUDIENCE: The 360 recommender, how do you

  • square it with the need for privacy for people?

  • PEDRO DOMINGOS: Yeah, so that's a very important question

  • which I didn't allude to.

  • I want to have a 360-degree model of me,

  • but I want it to be under my control.

  • Because if somebody else has a model of me

  • that knows me better than my best friend,

  • they have too much power.

  • So I think what needs to happen is that-- right now, what

  • happens is that the data that you generate

  • is spread all over the place.

  • It would be good to bring it all to one place,

  • but that place has to be under your control.

  • And I think one way that this could work

  • is that you could have a company that is to your data

  • what a bank is to your money.

  • Right?

  • You put your money in the bank, and then the bank

  • invests it for you, and so on and so forth.

  • But ultimately, the money is under your control.

  • So I think the same thing should happen with your data.

  • So this company aggregates the data.

  • It learns the model.

  • It uses the model on your behalf.

  • But at the end of the day, it's for you

  • to decide whether, maybe, for example, you

  • want to take the data somewhere else

  • or you want to do something else with it.

  • I think if you don't do that, people won't trust

  • this enough for it to happen.

  • AUDIENCE: What are your thoughts on TensorFlow?

  • PEDRO DOMINGOS: I haven't played with TensorFlow myself.

  • I think, first of all, it's great

  • that TensorFlow has been released.

  • I think releasing open source software

  • like this is a large part of how we make progress.

  • And definitely, deep learning and doing it on a large scale

  • are very important.

  • There are a number of other alternatives out there.

  • We'll see how TensorFlow compares with them.

  • And we'll also see-- like just from my own point of view--

  • I have some of my students working on deep learning--

  • the question with each one of these systems

  • is, what does it support well versus not.

  • I think if the learning that you're trying to do

  • fits within the paradigm of TensorFlow,

  • then it's probably a good thing to use.

  • But if it doesn't, then you may need something else.

  • Or maybe what's going to happen is that there will

  • be an extension of TensorFlow to do these things in the future.

  • AUDIENCE: One of its goals, actually,

  • is to become a universal system for expressing AI solutions.

  • PEDRO DOMINGOS: Yeah, and I very much sympathize with that goal.

  • I think that in order to be a universal system,

  • you have to cover these five paradigms.

  • But one way to cover them is to start from one--

  • let's say deep learning-- and then try

  • to absorb another, and another, and another.

  • And there are people doing this kind of thing.

  • Like for example, we have the Alchemy System,

  • which combines the Symbolist and the Bayesian learning.

  • We've also developed, for example,

  • combinations of symbolic learning

  • and instance-based learning and this with neural networks.

  • So what I hope to see from a platform like TensorFlow

  • is see it absorbing more and more of these capabilities.

  • And absorbing them doesn't necessarily

  • mean going and giving us some primitives

  • to do what that school does.

  • That's OK, but it also increases the complexity.

  • The ideal thing would be you still

  • have something that is simple, but yet with that simple thing,

  • you can do all these combinations of things.

  • And I do think that's possible.

  • And I'm curious to see how this is all

  • going to evolve in the next several years.

  • MALE SPEAKER: And with that, please

  • join me in thanking Pedro for this talk.

  • PEDRO DOMINGOS: Thank you.

  • [APPLAUSE]

