
  • [MUSIC]

  • Stanford University.

  • >> Okay everyone.

  • We're ready.

  • Okay well welcome to CS224N, also known as Linguistics 284.

  • This is kind of amazing.

  • Thank you to everyone who's here and involved, and also to the people who don't fit

  • in here and the people who are watching it online on SCPD.

  • Yeah it's totally amazing the number of people who've signed up to do this class

  • and so in some sense it seems like you don't need any advertisements for

  • why the combination of natural language processing and

  • deep learning is a good thing to learn about.

  • But nonetheless today, this class is really going

  • to give some of that advertisement, so I'm Christopher Manning.

  • So what we're gonna do is I'm gonna start off by saying

  • a bit of stuff about what natural language processing is and what deep learning is,

  • and then after that we'll spend a few minutes on the course logistics.

  • And a word from my co-instructor, Richard.

  • And then, get through some more material on why language

  • understanding is difficult, and then start to do an intro to deep learning for NLP.

  • So we've gotten off to a rocky start today,

  • cause I guess we started about ten minutes late because of that fire alarm going off.

  • Fortunately, there's actually not a lot of hard content in this first lecture.

  • This first lecture is really to explain what an NLP class is and say

  • some motivational content about how and why deep learning is changing the world.

  • That's going to change immediately with the Thursday lecture, because

  • in the Thursday lecture we're gonna start with sort of vectors and

  • derivatives and chain rules and all of that stuff.

  • So you should get mentally prepared for

  • that change of level between the two lectures.

  • Okay, so first of all what is natural language processing?

  • So natural language processing, that's the sort of computer scientist's name for

  • the field.

  • Essentially synonymous with computational linguistics, which is

  • sort of the linguist's name for the field.

  • And so it's in this intersection of computer science and linguistics and

  • artificial intelligence.

  • Where what we're trying to do is get computers to

  • do clever things with human languages to be able to understand and

  • express themselves in human languages the way that human beings do.

  • So natural language processing counts as a part of artificial intelligence.

  • And there are obviously other important parts of artificial intelligence,

  • such as computer vision, and robotics,

  • and knowledge representation, reasoning and so on.

  • But language has had a very special place in artificial intelligence,

  • and that's because language is this very distinctive property of

  • human beings, and we think and go about the world largely in terms of language.

  • So lots of creatures around the planet have pretty good vision systems,

  • but human beings are alone in having language.

  • And when we think about how we express our ideas and go about doing things,

  • language is largely our tool for thinking and our tool for communication.

  • So it's been one of the key technologies that people have thought

  • about in artificial intelligence and it's the one that we're going to look at today.

  • So our goal is how can we get computers to process or

  • understand human languages in order to perform tasks that are useful.

  • So that could be things like making appointments, or buying things, or

  • it could be more highfalutin goals of sort of, understanding the state of the world.

  • And so this is a space in which there's starting to be a huge amount of commercial

  • activity in various directions, some of things like making appointments.

  • A lot of it in the direction of question answering.

  • So, luckily for people who do language, the arrival of mobile has just been super,

  • super friendly, in that the importance of language has gone way, way higher.

  • And so now really all of the huge tech firms, whether it's Apple with Siri,

  • Google with the Assistant, Facebook, or Microsoft with Cortana,

  • what they're furiously doing is

  • putting out products that use natural language to communicate with users.

  • And that's an extremely compelling thing to do.

  • It's extremely compelling on phones because phones have these dinky

  • little keyboards that are really hard to type things on.

  • And a lot of you guys are very fast at texting, I know that, but

  • really a lot of those problems are much worse for a lot of other people.

  • So it's a lot harder to put in Chinese characters than it is to put in

  • English letters.

  • It's a lot harder if you're elderly.

  • It's a lot harder if you've got low levels of literacy.

  • But then there are also new vistas opening up.

  • So Amazon has had this amazing success with Alexa, which has really shown

  • the utility of having devices that are just ambient in the environment, and

  • that again you can communicate with by talking to them.

  • As a quick shout-out for Apple, I mean, really,

  • we do have Apple to thank for launching Siri.

  • It was, essentially, Apple taking the bet on saying we can

  • turn human language into consumer technology that

  • really did set off this arms race that every other company is now engaged in.

  • Okay, I just sort of loosely said meaning.

  • One of the things that we'll talk about more is that meaning is kind of a complex,

  • hard thing, and it's hard to know what it means to fully understand meaning.

  • At any rate that's certainly a very tough goal which people refer to as AI-complete

  • and it involves all forms of our understanding of the world.

  • So a lot of the time when we say understand the meaning,

  • we might be happy if we sort of half understood the meaning.

  • And we'll talk about different ways that we can hope to do that.

  • Okay, so one of the other things that we hope that you'll get in

  • this class is sort of a bit of appreciation for human language and

  • what its levels are and how it's processed.

  • Now obviously we're not gonna do a huge amount of that; if you really wanna

  • learn a lot about it,

  • there are lots of classes that you can take in the linguistics department and

  • learn much more about it.

  • But I really hope you can at least sort of get a bit of a high-level

  • understanding.

  • So this is kind of the picture that people traditionally have given for

  • levels of language.

  • So at the beginning there's input.

  • So input would commonly be speech.

  • And then you're doing phonetic and

  • phonological analysis to understand that speech.

  • Though commonly it is also text.

  • And then there's some processing that's done there which has

  • sort of been a bit marginal from a linguistics point of view: OCR,

  • working out the tokenization of the words.

  • But then what we do is go through a series of processing steps

  • where we work out complex words like 'incomprehensible':

  • it has the 'in-' in front and the '-ible' at the end.

  • And that sort of morphological analysis is about the parts of words.

  • And then we try and

  • understand the structure of sentences; that's syntactic analysis.

  • So if I have a sentence like 'I sat on the bench',

  • that 'I' is the subject of the verb 'sat', and the 'on the bench' is the location.

  • Then after that we attempt to do semantic understanding.

  • And that semantic interpretation is working out the meaning of sentences.

  • But simply knowing the meaning of the words of a sentence isn't

  • sufficient to actually really understand human language.

  • A lot is conveyed by the context in which language is used.

  • And so that then leads into areas like pragmatics and discourse processing.

  • So in this class, where we're gonna spend most of our time is in that middle

  • piece of syntactic analysis and semantic interpretation.

  • And that's sort of the bulk of our natural language processing class.

  • We will say a little bit about what's right at the top left of this picture,

  • speech signal analysis.

  • And interestingly, that was actually the first place where deep learning

  • really proved itself as super, super useful for tasks involving human language.

  • Okay, so applications of Natural Language Processing are now

  • really spreading out thick and fast.

  • And every day you're variously using applications of

  • Natural Language Processing.

  • And they vary on a spectrum.

  • So they vary from very simple ones to much more complex ones.

  • So at the low level, there are things like spell checking, or

  • doing the kind of autocomplete on your phone.

  • So that's a sort of a primitive language understanding task.

  • Variously, when you're doing web searches,

  • your search engine is considering synonyms, and things like that for you.

  • And, well, that's also a language understanding task.

  • But what we are gonna be more interested in is trying to

  • push our language understanding computers up to more complex tasks.

  • So for some of the next-level-up kind of tasks, we're actually gonna want to have

  • computers look at text information, be it websites, newspapers or whatever,

  • and get the information out of it, to actually understand the text well enough

  • that they know what it's talking about to at least some extent.

  • And so that could be things like expecting particular kinds of information, like

  • products and their prices or people and what jobs they have and things like that.

  • Or it could be doing other related tasks to understanding the document,

  • such as working out the reading level or intended audience of the document.

  • Or whether this tweet is saying something positive or

  • negative about this person, company, band or whatever.

  • And then going even a higher level than that, what we'd like our computers

  • to be able to do is to complete whole language understanding tasks.

  • And here are some of the prominent tasks of that kind that we're going to talk about:

  • Machine translation, going from one human language to another human language.

  • Building spoken dialogue systems, so you can chat to a computer and

  • have a natural conversation, just as you do with human beings.

  • Or having computers that can actually exploit the knowledge of the world

  • that's available on things like Wikipedia and other sources,

  • so they could actually just intelligently answer questions for

  • you, like a know-everything human being could.

  • Okay, and we're starting to see a lot of those things actually being used

  • regularly in industry.

  • So every time you're doing a search, in little places, there are bits of

  • natural language processing and natural language understanding happening.

  • So if you're putting in forms of words with endings,

  • your search engine's considering taking them off.

  • If there are spelling errors, they're being corrected.

  • Synonyms are being considered, and things like that.

  • Similarly, when you're being matched for advertisements.

  • But what's really exciting is that we're now starting to see much

  • bigger applications of natural language processing being commercially successful.

  • So in the last few years, there's just been amazing,

  • amazing advances in machine translation that I'll come back to later.

  • There have been amazing advances in speech recognition so that we just now

  • get hugely good performance in speech recognition even on our cell phones.

  • Products like sentiment analysis have become hugely commercially

  • important, right?

  • It depends on your favorite industries, but there are lots of Wall Street

  • firms that every hour of the day are scanning news articles looking for

  • sentiment about companies to make buy and sell decisions.

  • And just recently, really over the last 12 months,

  • there's been this huge growth of interest in how to build chatbots and

  • dialog agents for all sorts of interface tasks.

  • And that sort of seems like it's growing to become a huge new industry.

  • Okay, see I'm getting behind already.

  • So in just a couple of minutes,

  • I want to say the corresponding things about deep learning.

  • But before getting into that,

  • let me just say a minute about what's special about human language.

  • Maybe we'll come back to this, but

  • I think it's interesting to have a sense of it right at the beginning.

  • So there's an important difference between language and

  • most other kinds of things that people think of when they do signal processing

  • and data mining and all of those kinds of things.

  • So for most things, there's just sort of data that's out there in the world.

  • Either you pick it up with some kind of visual system,

  • or someone's sort of buying products at the local Safeway.

  • And then someone else is picking up the sales log and saying,

  • let me analyze this and see what I can find, right?

  • So it's just sort of all this random data and

  • then someone's trying to make sense of it.

  • So fundamentally, human language isn't like that.

  • Human language isn't just sort of a massive data exhaust that you're trying to

  • process into something useful.

  • Human language, almost all of it is that there's some

  • human being who actually had some information they wanted to communicate.

  • And they constructed a message to communicate that

  • information to other human beings.

  • So it's actually a deliberate form of sending a particular

  • message to other people.

  • Okay, and an amazing fact about human language is that it's this very complex

  • system that somehow two-,

  • three-, four-year-old kids amazingly can start to pick up and use.

  • So there's something good going on there.

  • Another interesting property of language is that language is actually

  • what you could variously call a discrete, symbolic, or categorical signaling system.

  • So we have words for concepts like rocket or violin.

  • And basically, we're communicating with other people via symbols.

  • There are some tiny exceptions for expressive signaling, so

  • you can distinguish saying, I love it versus I LOVE it.

  • And that sounds stronger.

  • But 99% of the time it's using these symbols to communicate meaning.

  • And presumably, that came about in a sort of EE information theory sense.

  • Because by having symbols,

  • they're very reliable units that can be signaled reliably over a distance.

  • And so that's an important thing to be aware of, right?

  • Language is symbols.

  • So symbols aren't just some invention of logic or classical AI.

  • But then, when we move beyond that,

  • there's actually something interesting going on.

  • So when human beings communicate with language,

  • although what they're wanting to communicate involves symbols,

  • the way they communicate those symbols is using a continuous substrate.

  • And a really interesting thing about language is you

  • can convey exactly the same message by using different continuous substrates.

  • So commonly, we use voice and so there are audio waves.

  • You can put stuff on a piece of paper and then you have a vision problem.

  • You can also use sign language to communicate.

  • And that's a different kind of continuous substrate.

  • So all of those can be used.

  • But there's sort of a symbol underlying all of those different encodings.

  • Okay, so the picture we have is that the communication medium is continuous.

  • Human languages are a symbol system.

  • And then the interesting part is what happens after that.

  • So the dominant idea in most of the history of philosophy and

  • science and artificial intelligence was to sort of project

  • the symbol system of language into our brains.

  • And think of brains as symbolic processors.

  • But that doesn't actually seem to have any basis in what brains are like.

  • Everything that we know about brains is that they're completely

  • continuous systems as well.

  • And so the interesting idea that's been emerging out of this work in deep

  • learning is to say, no, what we should be doing is also thinking of our

  • brains as having continuous patterns of activation.

  • And so then the picture we have is that we're going from continuous to symbolic,

  • back to continuous every time that we use language.

  • So that's interesting.

  • It also points out one of the problems of doing language understanding that we'll

  • come back to a lot of times.

  • So in languages we have huge vocabularies.

  • So languages have tens of thousands of words minimum.

  • And really, languages like English with a huge scientific vocabulary,

  • have hundreds of thousands of words in them.

  • It depends how you count.

  • If you start counting up all of the morphological forms, you can argue some

  • languages have an infinite number of words cuz they have productive morphology.

  • But however you count, it means we've got this huge problem of sparsity and

  • that's one of the big problems that we're gonna have to deal with.

  • Okay, now I'll change gears and say a little bit of an intro to deep learning.

  • So deep learning has been this area that has erupted over the course of this decade.

  • And I mean, it's just been enormously,

  • enormously exciting how deep learning has succeeded and how it has expanded.

  • So really, at the moment it seems like every month you see in the tech news

  • that there's just amazing new improvements that are coming out from deep learning.

  • So one month it's superhuman computer vision systems,

  • the next month it's machine translation that's vastly improved.

  • The month after that people are working out how to get computers to

  • produce their own artistry that's incredibly realistic.

  • Then the month after that,

  • people are producing new text-to-speech systems that sound amazingly lifelike.

  • I mean, there's just been this sort of huge dynamic of progress.

  • So what is underlying all of that?

  • So, well, as a starting point, deep learning, it's part of machine learning.

  • So in general, it's this idea of how can we get computers to learn stuff

  • automatically, rather than just us having to tell them things and coding by hand

  • in the kind of traditional write computer program to tell it what you want it to do.

  • But deep learning is also profoundly different to the vast majority of

  • what happened in machine learning in the 80s, 90s, and 00s.

  • And the central difference is that for most of traditional machine learning,

  • if I can call it that,

  • so this is all of the stuff like decision trees, logistic regressions,

  • naive Bayes, support vector machines, and any of those sorts of things,

  • essentially the way that we did things was

  • to have a human being who looked carefully at a particular

  • problem and worked out what was important in that problem.

  • And then designed features that would be useful features for

  • handling the problem that they would then encode by hand.

  • Normally by writing little bits of Python code or

  • something like that to recognize those features.

  • They're probably a little bit small to read, but over on the right-hand side

  • are some features for an entity recognition system.

  • Finding person names, company names, and so on in text.

  • And this is just the kind of system I've written myself.

  • So, well, if you want to know whether a word is a company, you'd wanna look at

  • whether it was capitalized, so you have a feature like that.

  • It turns out that looking at the words to the left and

  • right would be useful to have features for that.

  • It turns out that looking at substrings of words is

  • useful 'cause there are kind of common patterns of

  • letter sequences that indicate names of people versus names of companies.

  • So you put in features for substrings.

  • If you see hyphens and things, that's an indicator of some things.

  • You put in a feature for that.

  • So you keep on putting in features and commonly these kind of systems would end

  • up with millions of hand-designed features.
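As a minimal sketch (my own illustration, not code from the lecture or from any real system), hand-written features of the kind just described might look like this in Python; the specific feature names are hypothetical:

    # Hypothetical hand-designed features for recognizing entity names,
    # in the spirit of the feature templates described above.
    def word_features(words, i):
        """Return a dict of indicator features for the word at position i."""
        w = words[i]
        return {
            "is_capitalized": w[:1].isupper(),          # capitalization feature
            "has_hyphen": "-" in w,                     # hyphens can be indicative
            "prefix_3": "prefix=" + w[:3].lower(),      # substring (prefix) feature
            "suffix_3": "suffix=" + w[-3:].lower(),     # substring (suffix) feature
            "prev_word": words[i - 1].lower() if i > 0 else "<s>",               # left context
            "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>", # right context
        }

    print(word_features("shares of Cisco Systems rose".split(), 2))

A real system of this kind would have hundreds of such templates, instantiated into millions of concrete features.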

  • And that was essentially how Google search was done until about 2015 as well, right?

  • They liked the word 'signal' rather than 'feature'.

  • But the way you improved Google search was every month some

  • bunch of engineers came up with some new signal.

  • That they could show with an experiment that if you added in these extra features,

  • Google search got a bit better.

  • And [INAUDIBLE] a degree and that would get thrown in, and

  • things would get a bit better.

  • But the thing to think about is, well, this was advertised as machine learning,

  • but what was the machine actually learning?

  • It turns out that the machine was learning almost nothing.

  • So the human being was learning a lot about the problem, right?

  • They were looking at the problem hard, doing lots of data analysis, developing

  • theories, and learning a lot about what was important for this problem.

  • What was the machine doing?

  • It turns out that the only thing the machine was doing

  • was numeric optimization.

  • So once you had all these signals,

  • what you're then going to be doing was building a linear classifier.

  • Which meant that you were putting a parameter weight in front of each feature.

  • And the machine learning system's job was to adjust those numbers so

  • as to optimize performance.

  • And that's actually something that computers are really good at.

  • Computers are really good at doing numeric optimization and

  • it's something that human beings are actually less good at.

  • Cuz humans, if you say, here are 100 features,

  • put a real number in front of each one to maximize performance.

  • Well, they've got sort of a vague idea but

  • they certainly can't do that as well as a computer can.

  • So that was useful, but is doing numeric optimization

  • what machine learning means?

  • It doesn't seem like it should be.

  • Okay, so what we found was that in practice machine learning was sort of 90%

  • human beings working out how to describe the data and work out important features.

  • And only sort of 10% the computer running this

  • numerical optimization algorithm.
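As a rough illustration (my own sketch, not from the lecture) of what that 10% looks like: given fixed, hand-built feature values, the machine only adjusts one weight per feature, for instance by gradient descent on a logistic-regression-style loss.

    import numpy as np

    # Toy data: each row is a vector of hand-designed feature values for one example.
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    y = np.array([1.0, 0.0, 1.0, 0.0])   # gold labels

    w = np.zeros(X.shape[1])             # one weight per feature -- all the machine "learns"
    lr = 0.5
    for _ in range(200):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid of the weighted feature sum
        grad = X.T @ (p - y) / len(y)        # gradient of the logistic loss w.r.t. w
        w -= lr * grad                       # the numeric optimization step
    print(w)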

  • Okay, so how does that differ for deep learning?

  • So deep learning

  • is part of this field that's called representation learning.

  • And the idea of representation learning is to say, we can just feed to our computers

  • raw signals from the world, whether that's visual signals or language signals.

  • And then the computer can automatically, by itself, come up with

  • good intermediate representations that will allow it to do tasks well.

  • So in some sense, it's gonna be inventing its own features

  • in the same way that in the past the human being was inventing the features.

  • So more precisely,

  • the real meaning of the term deep learning is the argument that you could

  • actually have multiple layers of learned representations.

  • And that you'd be able to outperform other methods of learning

  • by having multiple layers of learned representations.

  • That was where the term deep learning came from.

  • Nowadays, half the time, deep learning just means you're using neural networks.

  • And the other half of the time it means there's some tech reporter writing a story

  • and it's vaguely got to do with intelligent computers and

  • all other bets are off.

  • Okay, [LAUGH] yeah.

  • So with the kind of coincidence where sort of deep learning

  • really means neural networks a lot of the time, we're gonna be part of that.

  • So what we're gonna focus on in this class is different kinds of neural networks.

  • So at the moment, they're clearly the dominant family

  • of ways in which people have reached success in doing deep learning.

  • But it's not the only possible way that you could do it; people have

  • certainly looked at trying to use various other kinds of probabilistic models and

  • other things in deep architectures.

  • And I think there may well be more of that work in the future.

  • What are these neural networks that we are talking about?

  • That's something we'll come back to and talk a lot about both on Thursday and

  • next week.

  • I mean, you'll notice a lot of this neural terminology.

  • I mean in some sense, if you're kind of coming from a background of statistics or

  • something like that, you could sort of say neural networks,

  • they're kind of nothing really more than stacked logistic regressions, or

  • perhaps more generally kinda stacked generalized linear models.

  • And in some sense that's true.

  • There are some connections to neuroscience in some cases,

  • but that's not a big focus of this class at all.

  • But on the other hand, there's something very qualitatively different,

  • that by the kind of architectures that people are building now for

  • these complex stacking of neural unit architectures,

  • you end up with a behavior and a way of thinking and a way of doing things that's

  • just hugely different, than anything that was coming before in earlier statistics.

  • We're not really gonna take a historical approach,

  • we're gonna concentrate on methods that work well right now.

  • If you'd like to read a long history of deep learning,

  • though I'll warn you it's a pretty dry and boring history,

  • there's this very long arXiv paper by Jürgen Schmidhuber that you could look at.

  • Okay, so why is deep learning exciting?

  • So in general our manually designed features tend to be overspecified,

  • incomplete, take a long time to design and validate, and

  • only get you to a certain level of performance at the end of the day.

  • Whereas learned features are easy to adapt, fast to train, and

  • they can keep on learning so that they get to a better level of

  • performance than we've been able to achieve previously.

  • So, deep learning ends up providing this sort of very flexible, almost universal

  • learning framework which is just great for representing all kinds of information.

  • Linguistic information but also world information or visual information.

  • It can be used in both supervised fashions and unsupervised fashions.

  • The real reason why deep learning is exciting to most people

  • is it has been working.

  • So starting from approximately 2010, there were initial successes where

  • deep learning was shown to work far better than any of the traditional machine

  • learning methods that had been used for the last 30 years.

  • But going even beyond that,

  • what has just been totally stunning is over the last six or seven years,

  • there's just been this amazing ramp in which deep learning methods have been

  • keeping on being improved and getting better at just an amazing speed.

  • Which is actually, maybe I'm biased, but

  • in the length of my lifetime, I'd actually just say it's unprecedented,

  • in terms of seeing a field that has been progressing quite so quickly in its

  • ability to be sort of rolling out better methods of doing things, month on month.

  • And that's why you're sort of seeing all of this huge industry excitement,

  • new products, and you're all here today.

  • So why has deep learning succeeded so brilliantly?

  • And I mean this is actually a slightly more subtle and

  • in some sense not quite so uplifting a tale.

  • Because when you look at it, a lot of the key techniques that we use for

  • deep learning were actually invented in the 80s or 90s.

  • They're not new.

  • We're using a lot of stuff that was done in the 80s and 90s.

  • And somehow, they didn't really take off then.

  • So what is the difference?

  • Well it turns out that actually some of the difference,

  • actually maybe quite a lot of the difference, is just that

  • technological advances have happened that make this all possible.

  • So we now have vastly greater amounts of data available because of our

  • online society where just about everything is available as data.

  • And having vast amounts of data really favors deep learning models.

  • In the 80s and 90s,

  • there sort of wasn't really enough compute power to do deep learning well.

  • So having sort of several more decades of growth in compute power

  • has just made it so that we can now build systems that work.

  • I mean in particular there's been this amazing confluence

  • that deep learning has proven to be just super well suited to the kind of parallel

  • vector processing that's available now for very little money in GPUs.

  • So there's been this sort of marriage between deep learning and

  • GPUs, which has enabled a lot of stuff to have happened.

  • So that's actually quite a lot of what's going on.

  • But it's not the only thing that's going on, and it's not the thing that's leading

  • to things keeping on getting better and better month by month.

  • I mean, people have also come up with

  • better ways of learning intermediate representations.

  • They've come up with much better ways of doing end-to-end joint system learning.

  • They've come up with much better ways of

  • transferring information between domains and between contexts and things.

  • So there are also a lot of new algorithms and algorithmic advances and they're sort

  • of in some sense the more exciting stuff that we're gonna focus on for

  • more of the time.

  • Okay, so

  • really the first big breakthrough in deep learning was in speech recognition.

  • It wasn't as widely heralded as the second big breakthrough in deep learning.

  • But this was really the big one that started.

  • At the University of Toronto, George Dahl working with Geoff Hinton

  • started showing on tiny datasets, that

  • they could do exciting things with deep neural networks for speech recognition.

  • So George Dahl then went off to Microsoft and then fairly shortly after that,

  • another student from Toronto went to Google and they started

  • building big speech recognition systems that use deep learning networks.

  • And speech recognition's a problem that's been worked on for

  • decades by hundreds of people.

  • And there are big companies.

  • And there was this sort of fairly standardized technology of

  • using Gaussian mixture models for the acoustic analysis and

  • hidden Markov models and blah blah blah.

  • Which people have been honing for decades trying to improve a few percent a year.

  • And what they were able to show was that by changing from that

  • to using deep learning models for doing speech recognition, they

  • were immediately able to get just these enormous decreases in word error rate.

  • About a 30% decrease in word error rate.

  • Then the second huge example of the success of deep learning,

  • which ended up being a much bigger thing in terms of everybody noticing it,

  • was in the ImageNet computer vision competition.

  • So in 2012, again students of Geoff Hinton at Toronto set about building a computer

  • vision system for the ImageNet task of classifying objects into categories.

  • And that was again a task that had been run for several years.

  • And performance seemed fairly stalled with traditional computer vision methods, and

  • by running deep neural networks on GPUs they were able to get an over

  • one-third error reduction in one fell swoop.

  • And that progress has continued through the years, but

  • we won't say a lot on that here.

  • Okay, that's taken me a fair way.

  • So let's stop for a moment and do the logistics, and

  • I'll say more about deep learning and NLP.

  • Okay, so this class is gonna have two instructors.

  • I'm Chris Manning and I'm Stanford faculty; the other one is Richard,

  • who's the chief scientist of Salesforce, and so

  • I'll let him say hello for a minute or two.

  • >> Hi there, great to be here.

  • I guess, just a brief little bit about myself.

  • In 2014, I graduated, I got my PhD here with Chris and

  • Andrew Ng in deep learning for NLP.

  • And then almost became a professor, but then started a little company,

  • built an ad platform, did some research.

  • And then earlier last year,

  • we got acquired by Salesforce, which is how I ended up there.

  • I've been teaching CS224D the last two years and

  • super excited to merge the two classes.

  • >> Okay.

  • >> I think next week, I'll do the two lectures, so you'll see a lot of me.

  • >> [LAUGH] >> I'll do all the boring equations.

  • >> [LAUGH] Okay, and then TAs, we've got many really wonderful,

  • competent, great TAs for this class.

  • Yeah, so normally I go through all the TAs, but there are sort of so

  • many, both of them and you, that maybe I won't go through them all, but

  • maybe they could all just sort of stand up for a minute if you're a TA in the class.

  • They're all in that corner, okay, [LAUGH] and they're clustered.

  • [LAUGH] Okay, right, yeah, so at this point,

  • I mean, apologies about the room capacity.

  • So the fact of the matter is that since this class is being videoed and broadcast,

  • this is sort of the largest SCPD classroom that they record in.

  • So, there's no real choice for this,

  • this is the same reason that this is where 221 is, and this is where 229 is.

  • But it's a shame that there aren't enough seats for everybody, sorry about that.

  • It will be available shortly after each class, also as a video.

  • In general for the other information, look at the website, but there's a couple

  • things that I do just wanna say a little bit about, prerequisites and work to do.

  • So, when it comes down to it,

  • these are the things that you sort of really need to know.

  • And we'll expect you to know, and if you don't know, you should start working

  • out what you don't know and what to do about it very quickly.

  • So the first one is we're gonna do the assignments in Python, so

  • proficiency in Python; there's a tutorial on the website, and it's

  • not hard to learn if you already program in something else.

  • Essentially, Python has just become the lingua franca of nearly all the deep

  • learning toolkits, so that seems the thing to use.

  • We're gonna do a lot of stuff with calculus and vectors and

  • matrices, so multivariate calculus, linear algebra.

  • It'll start turning up on Thursday and even more next week.

  • Sort of basic probability and statistics, you don't need to know anything

  • fancy about martingales or something, I don't either.

  • But you should know the elements of that stuff.

  • And then we're gonna assume you know some fundamentals of machine learning.

  • So if you've done 221 or 229, that's fine.

  • Again, you don't need to know all of that content, but

  • we sort of assume that you've seen loss functions, and you have some idea about

  • how you do optimization with gradient descent and things like that.

  • Okay, so in terms of what we hope to teach, the first thing is an understanding

  • of and ability to use effective modern methods for deep learning.

  • So we'll be covering all the basics, but

  • especially an emphasis on the main methods that are being used in NLP,

  • which is things like recurrent networks, attention, and things like that.

  • Some big picture understanding of human languages and

  • the difficulties in understanding and producing them.

  • And then the third one is essentially the intersection of those two things.

  • So the ability to build systems for important NLP problems.

  • And you guys will be building some of those for the various assignments.

  • So in terms of the work to be done, this is it.

  • So there's gonna be three assignments.

  • There's gonna be a midterm exam.

  • And then at the end, there's this bigger thing where you sort of have

  • a choice: either you can come up with your own exciting,

  • world-shattering final project and propose it to us.

  • And we gotta make sure every final project has a mentor, which can either be Richard

  • or me, one of the TAs, or someone else who knows stuff about deep learning.

  • Or else, we can give you an exciting project, and so

  • there'll be sort of a default final project,

  • otherwise known as Assignment 4.

  • There's gonna be a final poster session.

  • So every team for the final project, and you're gonna have teams of up to three for

  • the final project, has to be at the final poster session.

  • Now we thought about having it in our official exam slot, but

  • that was on Friday afternoon, and so we decided people might not like that.

  • So we're gonna have it in the Tuesday early afternoon session,

  • which is when the language class exams are done.

  • So no offense to languages, but

  • we're assuming that none of you are doing first year intensive language classes.

  • Or at least, you better find a teammate who isn't.

  • >> [LAUGH] >> Okay, yeah, so

  • we've got some late days.

  • Note that each assignment has to be handed in within three days of its deadline so we can grade it.

  • Yeah, okay, yeah, so Assignment 1, we're gonna hand out on Thursday,

  • so for that assignment, it's gonna be pure Python, except for

  • using the NumPy library, which is kinda the basic vector and matrix library.

  • And people are gonna do things from scratch, because I think

  • it's a really important educational experience that you've actually done things and

  • gotten them to work from scratch.

  • And you really know for

  • yourself what the derivatives are because you've calculated them.

  • And because you've implemented them, and you've found that you can calculate

  • derivatives and implement them, and the thing does actually learn and work.

  • If you've never done this,

  • the whole thing's gonna seem like black magic ever after.

  • So it's really important to actually work through it by yourself.
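One concrete habit that helps with this (a sketch of my own, not part of the assignment handout): check the derivatives you calculate by hand against a numerical estimate in NumPy.

    import numpy as np

    def numerical_grad(f, x, eps=1e-5):
        """Estimate df/dx with central differences, one coordinate at a time."""
        grad = np.zeros_like(x)
        for i in range(x.size):
            old = x.flat[i]
            x.flat[i] = old + eps
            fp = f(x)
            x.flat[i] = old - eps
            fm = f(x)
            x.flat[i] = old
            grad.flat[i] = (fp - fm) / (2 * eps)
        return grad

    # Example: f(x) = sum(x**2), whose analytic gradient is 2*x.
    x = np.random.randn(5)
    analytic = 2 * x
    numeric = numerical_grad(lambda v: np.sum(v ** 2), x)
    print(np.max(np.abs(analytic - numeric)))   # should be tiny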

  • But nevertheless, one of the things that's been transforming deep learning is

  • that there are now these very good software packages,

  • which actually make it crazily easy to build deep learning models.

  • That you can literally take one of these libraries and sort of write 60 lines

  • of Python, and you can be training a state-of-the-art deep learning system

  • that will work super well, providing you've got the data to train it on.

  • And that's sort of actually been an amazing development over

  • the last year or two.

  • And so for Assignments 2 and 3, we're gonna be doing that.

  • In particular, we're gonna be using TensorFlow, which is the Google

  • deep learning library, which is sort of, well, Google's very close to us.

  • But it's also very well engineered and

  • has sort of taken off as the most used library now.

  • But there really are a whole bunch of other good libraries for deep learning.

  • And I mentioned some of them below.
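To give a flavor of that, here is a minimal sketch, assuming the TensorFlow 1.x-style API that was current at the time (not code from the course), of defining and training a tiny sigmoid model in a few lines:

    import numpy as np
    import tensorflow as tf   # assumes the TensorFlow 1.x API

    x = tf.placeholder(tf.float32, [None, 2])
    y = tf.placeholder(tf.float32, [None, 1])
    W = tf.Variable(tf.zeros([2, 1]))
    b = tf.Variable(tf.zeros([1]))
    pred = tf.sigmoid(tf.matmul(x, W) + b)            # model: a single sigmoid unit
    loss = tf.reduce_mean(tf.square(pred - y))        # squared-error loss
    train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    data_x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
    data_y = np.array([[0], [1], [1], [1]], dtype=np.float32)   # learn logical OR

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(500):
            sess.run(train, feed_dict={x: data_x, y: data_y})
        print(sess.run(pred, feed_dict={x: data_x}))

A state-of-the-art model obviously needs more than this, but the library handles the gradients and the optimization loop for you in the same way.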

  • Okay, do people have any questions on class organization?

  • Or anything else up until now, or do I just power on?

  • >> [INAUDIBLE] >> Yeah, okay. So, something

  • I'm gonna do is repeat all questions, so they'll actually work on the video.

  • So, the question is, how are our assignments gonna be submitted?

  • They're gonna be submitted electronically online,

  • instructions will be on the first assignment.

  • But yeah, everything has to be electronic; we use Gradescope for the grading.

  • For written stuff, if you wanna hand write it, you have to scan it for

  • yourself, and submit it online.

  • Any other questions?

  • >> [INAUDIBLE] >> Yeah.

  • So, the question was, are the slides on the website?

  • Yes, they are.

  • The slides were on the website before the class began, and we're gonna try and

  • keep that up all quarter.

  • So, you should just be able to find them, cs224n.stanford.edu.

  • Any other questions, yeah?

  • Yeah, so that question was on the logistics if you're doing Assignment 4.

  • It's partly different and partly the same if you're doing the default

  • Assignment 4, and we'll talk all about final projects in a couple of weeks.

  • You don't have to write a final project proposal, or talk to a mentor,

  • because we've designed the project for you as a starting off point of the project.

  • But on the other hand, otherwise, it's the same.

  • So, it's gonna be an open ended project,

  • in which there are lots of things that you can try to make the system better, and

  • we want you to try, and we want you to be able to report on the different

  • exciting things you've tried, whether they did or didn't make your system better.

  • And so, we will be expecting people doing assignment four to also write up and

  • present a poster on what they've done.

  • Any other questions?

  • Yes, so their question was on whether we're using Piazza.

  • Yes, we're using Piazza for communication.

  • So, we've already set up the Piazza, and we attempted to enroll all the enrolled

  • students, so hopefully if you're an enrolled student, there's somewhere in

  • your junk mailbox, or in one of those places, a copy of a Piazza announcement.

  • Any other questions?

  • Okay, 20 some minutes to go.

  • I'll power ahead.

  • Very quickly, why is NLP hard?

  • I think most people, maybe especially computer scientists,

  • going into this just don't understand why NLP is hard.

  • It's just a sequence of words, and they've been dealing with programming languages.

  • And you're just gonna read the sequence of words.

  • Why is this hard?

  • It turns out it's hard for a bunch of reasons,

  • because human languages aren't like programming languages.

  • So, human languages are just all ambiguous.

  • Programming languages are constructed to be unambiguous;

  • that's why they have rules like an 'else' goes with the nearest 'if', and

  • you have to get the indentation right in Python.

  • Human languages aren't like that; in human languages, when there's

  • an 'else', you just interpret it with whatever 'if' makes most sense to the hearer.

  • And when we do reference in a programming language,

  • we use variable names like x and y that pick out a particular variable.

  • Whereas, in human languages, we say things like this and that and she, and

  • you're just meant to be able to figure out from context who's being talked about.

  • That's a big problem, but it's perhaps not even the biggest problem.

  • The biggest problem is that humans

  • use language as an efficient communication system.

  • And the way they do that is by not saying most things, right?

  • When you write a program, you say everything that's needed to get it to run.

  • Whereas in a human language, you leave out most of the program, because you think

  • that your listener will be able to work out which code should be there, right?

  • So, it's sorta more like a code snippet on StackOverflow, and

  • the listener is meant to be able to fill in the rest of the program.

  • So, that's how human language gets its efficiency.

  • We kinda actually communicate very fast with human language, right?

  • The rate at which we can speak,

  • it's not 5G communication speeds, right?

  • It's a slow communication channel.

  • But the reason why it works efficiently is we can say minimal messages.

  • And our listener fills in all the rest with their world knowledge,

  • common sense knowledge, and contextual knowledge of the situation.

  • And that's the biggest reason why natural language is hard.

  • So, as sort of a profound version of why natural language is hard: I

  • really like this XKCD cartoon, which you definitely can't read, and

  • which I can barely read on the computer in front of me.

  • >> [LAUGH] >> But I think if you think about it,

  • it says actually a lot about why natural language understanding is hard.

  • So, there are two women speaking to each other.

  • One says, 'anyway, I could care less,' and the other one says,

  • 'I think you mean you couldn't care less, saying you could care less

  • implies you care to some extent,' and the other one says,

  • 'I don't know,' and then continues.

  • We're these unbelievably complicated beings drifting through a void,

  • trying in vain to connect with one another by

  • blindly flinging words out in to the darkness.

  • Every trace of phrasing, and spelling and tone and

  • timing carries countless signals and contexts and subtexts and more.

  • And every listener interprets these signals in their own way.

  • Language isn't a formal system, language is glorious chaos.

  • You can never know for sure what any words will mean to anyone.

  • All you can do is try to get better at guessing how your words affect people.

  • So, you have a chance of finding the ones that will make them

  • feel something like you want them to feel.

  • Everything else is pointless.

  • I assume you're giving me tips on how you interpret words,

  • because you want me to feel less alone.

  • If so, then thank you, that means a lot.

  • But if you're just running my sentences past some mental checklist, so

  • you can show off how well you know it, then I could care less.

  • >> [LAUGH] >> And I think if you reflect on this XKCD

  • comic, there's actually a lot of profound content there as to what human

  • language understanding is like, and what the difficulties of it are.

  • But that's probably a bit hard to do in detail, so

  • I'm just gonna show you some simple examples for a minute.

  • You get lots of ambiguities, including funny ambiguities, in natural language.

  • So, here's one of my favorites,

  • one that came out recently from TIME magazine.

  • 'The Pope's baby steps on gays'; no, it's not [the Pope's baby] [steps on gays], that's not how you're meant to interpret this.

  • You're meant to interpret this as [the Pope's] [baby steps] [on gays].

  • >> [LAUGH] >> Okay.

  • So a question, I mean, why do you get those two interpretations?

  • What is it about human language, and English here,

  • about English that allows you to have these two interpretations?

  • What are the different things going on?

  • Is anyone game to give an explanation of how we get these two readings?

  • Okay, yeah, right.

  • I'll repeat the explanation as I go.

  • You started off with saying it was idiomatic, and in some sense,

  • 'baby steps' is sort of a metaphor,

  • an idiom where 'baby steps' means little steps like a baby would take.

  • But I mean, before you even get to that, you can kind of just think a large part of

  • this is just a structural ambiguity, which then governs the rest of it.

  • So, one choice is that you have this noun phrase 'the Pope's baby', and

  • then you start interpreting it as a real baby.

  • And then 'steps' is being interpreted as a verb.

  • So, something we find in a lot of languages, including English,

  • is the same word can have fundamentally different roles.

  • Here, in the verbal interpretation, 'steps' would be being used as a verb.

  • But the other reading is as you said it's a noun compound, so

  • you can put nouns together, and make noun compounds very freely in English.

  • Computer people do it all the time, right?

  • As soon as you've got something like disk drive enclosure, or network interface hub,

  • or something like that, you're just nailing nouns together to make big nouns.

  • So, you can put together 'baby' and 'steps' as two nouns, and

  • make 'baby steps' as a noun phrase.

  • And then you can make 'the Pope's baby steps' as a larger noun phrase.

  • And then you're getting this very different interpretation.

  • But simultaneously, at the same time, you're also changing the meaning of baby.

  • So in one case, the baby was this metaphorical baby, and then in the other

  • one it's, perhaps counterfactually, a literal baby.

  • Let's do at least one more like that.

  • Here's another good fun one.

  • Boy paralyzed after tumor fights back to gain black belt.

  • >> [LAUGH] >> Which is, again,

  • not how you're meant to read it.

  • You're meant to read it as boy,

  • paralyzed after tumor, fights back to gain black belt.

  • So, how could we characterize the ambiguity in that one?

  • [LAUGH] So, someone suggested missing punctuation,

  • and, to some extent, that's true.

  • And to some extent, you can use commas to try and

  • make readings clearer in some cases.

  • But there are lots of places where there are ambiguities in language,

  • where it's just not usual standard to put in punctuation, to disambiguate.

  • And indeed, if you're the kind of computer scientist who feels like you want to start

  • putting matching parentheses around pieces of human language to make the unclear

  • interpretation much clearer, you're not then a typical language user anymore.

  • [LAUGH] >> Okay, anyone else gonna have a go,

  • yeah?

  • Yeah, so, the ambiguities here are in the syntax of the sentence.

  • So, when you have this 'paralyzed', that could either be the main

  • verb of the sentence.

  • So the boy is paralyzed, and then all of 'after tumor fights back to gain black

  • belt' is then this sort of subordinate clause saying when it happened.

  • And so then the 'tumor' is the subject of 'fights back',

  • or you can have this alternative where 'paralyzed'

  • can also be what's called a passive participle.

  • So, it's introducing a participial phrase of 'paralyzed after tumor'.

  • And so that can then be a modifier of the boy in the same way an adjective can,

  • young boy fights back to gain black belt.

  • It could be boy paralyzed after tumor fights back to gain black belt.

  • And then it's the boy that's the subject of fights.

  • Okay, I have on this slide a couple more examples, but I think I won't go through

  • them in detail, since I'm sort of behind as things are going.

  • Okay, so what I wanted to get into a little bit of for

  • the last bit of class until my time runs out

  • is to introduce this idea of deep learning and NLP.

  • And so, I mean essentially, this is combining

  • the two things that we've been talking about so far, deep learning and NLP.

  • So, we're going to use the ideas of deep learning, neural networks,

  • representation learning, and we're going to apply them to

  • problems in language understanding, natural language processing.

  • And so, in the last couple of years

  • especially, this is just an area that's sorta really starting to take off,

  • and for the rest of today's class we'll say a little bit,

  • at a very high level, about some of the stuff that's happening, and

  • that'll sort of prepare for Thursday, when we start to dive right into the specifics.

  • And so,

  • there are different classifications you can look at.

  • So on the one hand, deep learning is being applied to lots of different levels of

  • language: things like speech, words, syntax, semantics.

  • It's been applied to lots of different sort of tools, algorithms that we use for

  • natural language processing.

  • So, that's things like labeling words for part-of-speech, finding person and

  • organization names, or coming up with syntactic structures of sentences.

  • And then it's been applied to lots of

  • language applications that put a lot of this together.

  • So things that I've mentioned before, like machine translation, sentiment analysis,

  • dialogue agents.

  • And one of the really, really interesting things is that deep learning models have

  • been giving a very unifying method of using the same tools and

  • technologies to understand a lot of these problems.

  • So yes, there are some specifics of different problems.

  • But something that's been quite stunning in the development of deep learning is

  • that there's actually been a very small toolbox of key techniques,

  • which have turned out to be just vastly applicable

  • with enormous accuracy to just many, many problems.

  • Which actually includes not only many, many language problems, but also

  • most of the rest of what happens in deep learning:

  • whether it's looking at vision problems, or applying deep learning to

  • any other kind of signal analysis, knowledge representation, or

  • anything else, you see these few key tools being used to solve all the problems.

  • And the part that is somewhat embarrassing for human beings is that typically,

  • they're sort of working super well,

  • much better than the techniques that human beings had previously slaved over for

  • decades developing, without very much customization for different tasks.

  • Okay, so for deep learning and language, it all starts off with word meaning, and

  • this is a very central idea we're gonna develop starting with the second class.

  • So, what we're gonna do with words is say we're going to represent a word,

  • in particular we're going to represent the meaning of the word,

  • as a vector of real numbers.

  • So here's my vector for the word expect.

  • And so I made that, whatever it is, an 8-dimensional vector,

  • I think, since that was good for my slide.

  • But really, we don't use vectors that small.

  • So minimally, we might use something like 25-dimensional vectors.

  • Commonly, we might be using something like 300-dimensional vectors.

  • And if we're really going to town

  • because we wanna have the best ever system doing something,

  • we might be using a 1000-dimensional vector or something like that.
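Just to make that concrete, here's a minimal sketch (my own toy numbers, not the vectors from the slide) of words represented as NumPy vectors, with the usual cosine similarity used to compare two of them:

    import numpy as np

    # Toy 8-dimensional "meaning" vectors; real systems use roughly 25 to 1000 dimensions.
    expect = np.array([0.28, 0.79, -0.18, 0.54, -0.32, 0.11, 0.62, -0.07])
    think  = np.array([0.25, 0.71, -0.22, 0.49, -0.30, 0.19, 0.58, -0.01])

    def cosine(u, v):
        """Cosine similarity: near 1.0 means same direction, near 0.0 means unrelated."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(expect, think))   # similar words should get a high score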

  • So when we have vectors for words,

  • that means we're placing words in a high-dimensional vector space.

  • And what we find out is, when we have these methods for

  • learning word vectors from deep learning and place words into these

  • high-dimensional vector spaces, these act as wonderful semantic spaces.

  • So, words with similar meanings will cluster together in the vector space, but

  • actually more than that.

  • We'll find out that there are directions in the vector space

  • that actually tell you about components of meaning.

  • So, one of the problems of human beings is that they're not

  • very good at looking at high-dimensional spaces.

  • So, for the human beings, we always have to project down onto two or

  • three dimensions.

  • And so, in the background, you can see a little bit of a word cloud

  • of a 2D projection of a word vector space, which you can't read at all.

  • But we could sort of start to zoom in on it.

  • And then you get something that's just about readable.

  • So in one part of the space, this is where country words are clustering.

  • And in another part of the space, this is where you're seeing verbs clustering.

  • And you're seeing it's kind of grouping together verbs whose meanings are most similar.

  • So 'come' and 'go' are very similar, 'say' and 'think' are similar, 'think' and

  • 'expect' are similar.

  • 'Expecting' and 'thinking' are actually similar to 'seeing things' a lot of

  • the time, because people often use see as an analogy for think.

  • Yes?

  • Okay, so the question is, what do the axes in these vector spaces mean?

  • And, in some sense, the glib answer is nothing.

  • So when we learn these vector spaces, well, actually we have these 300-dimensional vectors.

  • And they have these axes corresponding to those dimensions.

  • And often in practice, we do sort of look at some of those elements

  • along the axes and see if we can interpret them, because it's easy to do.

  • But really, there's no particular reason to think that elements of

  • meaning should follow those axis directions.

  • They could be any other angle in the vector space, and so

  • they don't necessarily mean anything.

  • When we wanna do a 2D projection like this, what we're then using

  • is some method to try and most faithfully get out some of

  • the main meaning from the high dimensional vector space so we can show it to you.

  • So the simplest method that many of you might have seen before in other places,

  • is doing PCA, doing a principal components analysis.

  • There's another method that we'll get to called t-SNE, which is kind of

  • a non-linear dimensionality reduction which is commonly used.

  • But these are just to try and give human beings some sense of what's going on.

  • And it's important to realize that any of these low dimensional projections

  • can be extremely, extremely misleading, right?

  • Because they are just leaving out a huge amount of the information

  • that's actually in the vector space.
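
As a rough illustration of what those projections look like in code, here is a small sketch using scikit-learn; it assumes `embeddings` is a (num_words, dim) array of word vectors, and the method choice is only an assumption about how one might set it up.

    from sklearn.decomposition import PCA  # linear projection
    from sklearn.manifold import TSNE      # non-linear projection

    def project_to_2d(embeddings, method="pca"):
        """Reduce (num_words, dim) word vectors to (num_words, 2) for plotting."""
        if method == "pca":
            return PCA(n_components=2).fit_transform(embeddings)
        # t-SNE preserves local neighborhoods but distorts global distances,
        # which is one reason these pictures can be misleading.
        return TSNE(n_components=2, init="pca").fit_transform(embeddings)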

  • Here, I'm just looking at the closest words to the word frog.

  • I'm using the GloVe embeddings that we did at Stanford and we'll talk about more,

  • in the next couple of lectures.

  • So frogs and toad are the nearest words, which looks good.

  • But if we then look at these other words that we don't understand,

  • it turns out that they're also names for other, pretty rare kinds of frogs.
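
Here is a hedged sketch of how such a nearest-neighbour lookup can be done with cosine similarity over pre-trained GloVe vectors; the file name is an assumption about where the plain-text vectors (one word and its numbers per line) have been saved.

    import numpy as np

    def load_glove(path):
        """Read GloVe vectors from a plain-text file into a word list and a matrix."""
        words, vecs = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                words.append(parts[0])
                vecs.append(np.array(parts[1:], dtype=np.float32))
        return words, np.vstack(vecs)

    def nearest(words, vecs, query, k=5):
        """Return the k words whose vectors have the highest cosine similarity to the query."""
        normed = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        sims = normed @ normed[words.index(query)]
        best = np.argsort(-sims)[1:k + 1]  # skip the query word itself
        return [(words[i], float(sims[i])) for i in best]

    words, vecs = load_glove("glove.6B.100d.txt")  # assumed download location
    print(nearest(words, vecs, "frog"))            # expect toad-like neighbours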

  • So these word meaning vectors are a great basis for starting to do things.

  • But I just wanna give you a sense, for

  • the last few minutes, that we can do a lot beyond that.

  • And the surprising thing is we're gonna keep using some of these vectors.

  • So traditionally, if we're looking at complex words like uninterested, we might

  • just think of them as being made up of morphemes, sort of smaller symbols.

  • But what we're gonna do is say, well no.

  • We can also think of parts of words

  • as vectors that represent the meaning of those parts of words.

  • And then what we'll wanna do is build a neural network which can compose

  • the meaning of larger units out of these smaller pieces.

  • That was work that Minh-Thang Luong and Richard did a few years ago at Stanford.
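
This is not the actual model from that work, just a minimal sketch of the underlying idea: two child vectors (say, for the pieces "un" and "interested") are combined through a learned weight matrix and a nonlinearity into a vector of the same size for the larger unit.

    import numpy as np

    dim = 50
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(dim, 2 * dim))  # learned by training in a real system
    b = np.zeros(dim)

    def compose(left, right):
        """Parent vector = tanh(W [left; right] + b), same dimensionality as the children."""
        return np.tanh(W @ np.concatenate([left, right]) + b)

    un = rng.normal(size=dim)          # stand-ins for learned vectors of word parts
    interested = rng.normal(size=dim)
    uninterested = compose(un, interested)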

  • Going beyond that, we want to understand the structure of sentences.

  • And so another tool we'll use deep learning for is to make

  • syntactic parsers that find out the structure of sentences.

  • So Danqi Chen who's over there, is one of the TAs for the class.

  • So something that she worked on a couple of years ago was doing neural

  • network methods for dependency parsing.

  • And that was hugely successful.

  • And essentially, if you've seen any of the recent Google announcements

  • with their Parsey McParseface and SyntaxNet,

  • essentially what that's using is a more honed and

  • larger version of the technique that Danqi introduced.

  • So once we've got some of the structure of sentences,

  • we then might want to understand the meaning of sentences.

  • And people have worked on the meaning of sentences for decades.

  • And I certainly don't wanna belittle other ways of working

  • out the meaning of sentences.

  • But in the terms of doing deep learning for NLP,

  • in this class I also wanna give a sense of how we'll do things differently.

  • So the traditional way of doing things, which is commonly

  • lambda-calculus-based semantic theories,

  • is that you're giving meaning functions for individual words by hand.

  • And then there's a careful, logical algebra for

  • how you combine together the meanings of words to get kind of semantic expressions.

  • Which have also sometimes been used for programming languages where people worked

  • on denotational semantics for programming languages.

  • But that's not what we're gonna do here.

  • What we're gonna do is say, well, if we start off with the meaning of words

  • being vectors, we'll make meanings for phrases which are also vectors.

  • And then we have bigger phrases and

  • sentences also have their meaning being a vector.

  • And if we wanna know what the relationships between meanings of

  • sentences or between sentences and the world, such as a visual scene,

  • the way we'll do that is we'll try to learn a neural network that can

  • make those decisions for us.

  • Yeah, let's see.

  • So we can use it for all kinds of semantics.

  • This was actually one of the pieces of work that Richard did while he was

  • a PhD student, was doing sentiment analysis.

  • And so this was trying to do a much better,

  • careful, real meaning representation and

  • understanding of the positive and negative sentiments of sentences

  • by actually working out which parts of sentences have different meanings.

  • So for the sentence, 'This movie doesn't care about cleverness, wit,

  • or any other kind of intelligent humor', the system is actually very accurately

  • able to work out, well there's all of this positive stuff down here, right?

  • There's cleverness, wit, intelligent humor.

  • It's all very positive, and that's the kind of thing a traditional sentiment

  • analysis system would fall apart on, and just say this is a positive sentence.

  • But our neural network system is noticing that there's

  • this 'movie doesn't care' at the beginning and

  • is accurately deciding the overall sentiment for the sentence is negative.
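
Again, only a hedged sketch of the general idea rather than the actual model from that work: phrase vectors are composed bottom-up over the parse tree, and a small softmax classifier reads a sentiment label off every node's vector, so a negation high in the tree can override positive material below it once the weights are trained.

    import numpy as np

    dim, num_classes = 50, 5                              # e.g. very negative ... very positive
    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.1, size=(dim, 2 * dim))        # composition weights
    Ws = rng.normal(scale=0.1, size=(num_classes, dim))   # sentiment classifier weights

    def compose(left, right):
        return np.tanh(W @ np.concatenate([left, right]))

    def sentiment(vec):
        scores = Ws @ vec
        probs = np.exp(scores - scores.max())
        return probs / probs.sum()                        # softmax over sentiment classes

    # Toy node vectors; with trained weights, composing the negation with the
    # positive phrase underneath it yields a vector classified as negative.
    doesnt_care = rng.normal(size=dim)
    cleverness_wit = rng.normal(size=dim)
    print(sentiment(compose(doesnt_care, cleverness_wit)))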

  • Okay, I'm gonna run out of time, so I'll skip a couple of things, but

  • let me just mention two other things that've been super exciting.

  • So there's this enormous excitement now about trying to build chatbots,

  • dialogue agents,

  • of having speech and language understanding interfaces

  • through which humans can interact with mobile computers.

  • There's Alexa and other things like that, and

  • I think it's fair to say that the state of the technology at the moment

  • is that speech recognition has made humongous advances, right?

  • So I mean, speech recognition has been going on for decades,

  • and as someone involved with language technology, I'd been claiming to people,

  • from the 1990s, no, speech recognition is really good.

  • We've worked out really good speech recognition systems.

  • But the fact of the matter is they were sorta not very good and real human beings

  • would not use them if they had any choice because the accuracy was just so low.

  • Whereas, in the last few years neural network-based deep

  • learning speech recognition systems have become amazingly good.

  • I think, I mean maybe this isn't true of the young people in this room

  • apart from me.

  • But I think a lot of people don't actually realize how good they've gotten.

  • Because I think that there are a lot of people that try things out in 2012 and

  • decide, they're pretty reasonable, but not fantastic, and

  • haven't really used it since.

  • So I encourage all of you, if you don't regularly use speech recognition to go

  • home and try saying some things to your phone.

  • And, I think it's now just amazing how well the speech recognition works.

  • But there's a problem.

  • The speech recognition works flawlessly.

  • And then your phone has no idea what you're saying, and so it says,

  • would you like me to Google that for you?

  • So the big problem, and

  • the centerpiece of the kind of stuff that we're working on in this class, is well

  • how can we actually make the natural language understanding equally good?

  • And so that's a big concentration of what we're going to work on.

  • One place that's actually,

  • have any of you played with Google's Inbox program on cell phones?

  • Any of you tried that out?

  • A few of you have.

  • So one cool but very simple example of a deployed deep

  • learning dialogue agent is Google Inbox's Suggested Replies.

  • So you have a recurrent neural network that's going through the message and

  • is then suggesting three replies to your message to send back to the other person.

  • And you know although there are lots of concerns in that program of sort of

  • privacy and other things, and they're careful how they're doing it.

  • Actually often the replies it comes up with are really rather good.

  • If you're looking to cut down on your email load, give Google Inbox a try and

  • you might find that actually you can reply to quite a bit of your email using it.

  • Okay, the one other example I wanted to mention before finishing

  • was Machine Translation.

  • So Machine Translation, this is actually when natural language processing started.

  • It didn't actually start with language understanding in general.

  • Where natural language processing started was at the beginning of the Cold War.

  • Americans and Russians were each alarmed that the other side knew too much and that they

  • couldn't understand what the other side was saying.

  • And coming off of the successes of code breaking in World War II,

  • people thought, we can just get our computers to do language translation.

  • And in the early days it worked really terribly, and

  • things started to get a bit better in the 2000s, and I presume you've all seen

  • kind of classic Google Translate, and that sort of half worked.

  • You could sorta get the gist of what it's saying, but it still worked pretty badly.

  • Whereas just in the last couple of years really only starting in 2014,

  • there's then started to be use of end-to-end trained deep learning

  • systems to do machine translation which is then called neural machine translation.

  • And it's certainly not the case that all the problems in MT are solved,

  • there's still lots of work to do to improve machine translation.

  • But again, this is a case in which just

  • replacing the 200 person-years of work on Google Translate

  • with a new deep-learning-based machine translation system has overnight

  • produced a huge improvement in translation quality.

  • And there was a big long article about that

  • in the New York Times magazine a few weeks ago that you might've seen.

  • And so rather than traditional approaches to translation, we're

  • again just running a big, deep recurrent neural network, where

  • it starts off reading through a source sentence, generating internal

  • vector representations that represent the sentence so far.

  • And then once it's gone to the end of the sentence,

  • it then starts to generate out words in the translation.

  • So generating words in sequence in the translation

  • is then what's referred to as kind of neural language models,

  • and that is also a key technology that we use in a lot of things that we do.

  • So that's both what's used in the kind of Google Inbox, recurrent

  • neural network, and in the generation side of a neural machine translation system.
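
Here is a very rough sketch of that encoder-decoder shape with a plain recurrent step and toy random weights; a real system learns the weights and uses far larger LSTM or GRU networks, attention, and proper training, so this only shows the outline of the computation.

    import numpy as np

    rng = np.random.default_rng(2)
    dim, vocab = 64, 1000
    E = rng.normal(scale=0.1, size=(vocab, dim))      # word embeddings
    W_h = rng.normal(scale=0.1, size=(dim, dim))      # recurrent weights
    W_x = rng.normal(scale=0.1, size=(dim, dim))      # input weights
    W_out = rng.normal(scale=0.1, size=(vocab, dim))  # projection to vocabulary scores

    def step(h, x):
        """One recurrent step: new hidden state from previous state and input vector."""
        return np.tanh(W_h @ h + W_x @ x)

    def encode(source_ids):
        """Read through the source sentence, building up a vector representation."""
        h = np.zeros(dim)
        for i in source_ids:
            h = step(h, E[i])
        return h

    def decode(h, max_len=20, eos=0):
        """Generate target words one at a time, neural-language-model style."""
        out, prev = [], eos
        for _ in range(max_len):
            h = step(h, E[prev])
            prev = int(np.argmax(W_out @ h))  # greedily pick the highest-scoring word
            if prev == eos:
                break
            out.append(prev)
        return out

    translation = decode(encode([5, 17, 42]))  # the token ids here are placeholders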

  • Okay, so we've gotten to the end. I just have one more minute to

  • try and get us out of here not too late, even though we started late.

  • I mean, the final thing I want to say is just sort of to emphasize

  • the amazing thing that's happening here: it's all vectors, right?

  • We're using this for all representations of language,

  • whether it's sounds, parts of words, words, sentences,

  • conversations, they're all getting turned into these real-valued vectors.

  • And that's something that we'll talk about a lot more.

  • I'll talk about it for word vectors on Thursday and

  • Richard will talk a lot more about the vectors next time.

  • I mean, that's something that appalls many people, but I think it's important to

  • realize it's actually something a lot more subtle than many people realize.

  • You could think that there's no structure in this big long vector of numbers.

  • But equally you could say, well I could reshape that vector and

  • I could turn into a matrix or a higher order array which we call a tensor.

  • Or I could say different parts of it or

  • directions of it represent different kinds of information.

  • It's actually a very flexible data structure with

  • huge representational capacity and

  • that's what deep learning systems really take advantage of in all that they do.
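
A tiny numpy illustration of that flexibility, with made-up numbers: the same block of values can be viewed as a flat vector, a matrix, or a higher-order array, or split into parts carrying different information.

    import numpy as np

    v = np.arange(12.0)                      # a flat vector of 12 numbers
    m = v.reshape(3, 4)                      # the same data viewed as a matrix
    t = v.reshape(2, 2, 3)                   # ... or as a higher-order array (a tensor)
    first_part, second_part = v[:6], v[6:]   # or different parts holding different kinds of information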

  • Okay, thanks a lot.

  • >> [APPLAUSE]

[MUSIC]
