
  • LUKASZ KAISER: Hi, my name is Lukasz Kaiser,

  • and I want to tell you in this final session

  • about Tensor2Tensor, which is a library we've

  • built on top of TensorFlow to organize the world's models

  • and data sets.

  • So I want to tell you about the motivation,

  • and how it came together, and what you can do with it.

  • But also if you have any questions

  • in the meantime anytime, just ask.

  • And if you've already used Tensor2Tensor,

  • you might have even more questions.

  • But the motivation behind this library

  • is-- so I am a researcher in machine learning.

  • I also worked on production [INAUDIBLE] models,

  • and research can be very annoying.

  • It can be very annoying to researchers,

  • and it's even more annoying to people

  • who put it into production, because the research works

  • like this.

  • You have an idea.

  • You want to try it out.

  • It's machine learning, and you think,

  • well, I will change something in the model, and it will be great.

  • It will solve physics problems, or translation, or whatever.

  • So you have this idea, and you're like, it's so simple.

  • I just need to make one tweak, but then, OK, I

  • need to get the data.

  • Where was it?

  • So you search online, you find it,

  • and it's like, well, so I need to preprocess it.

  • You implement some data reading.

  • You download the model that someone else did.

  • And it doesn't give at all the result

  • that someone else reported in the paper.

  • It's worse.

  • It works 10 times slower.

  • It doesn't train at all.

  • So then you start tweaking it.

  • Turns out, someone else had this script

  • that preprocessed the data in a certain way that

  • improved the model 10 times.

  • So you add that.

  • Then it turns out your input pipeline isn't performing well,

  • because it doesn't put data on GPU or CPU or whatever.

  • So you tweak that.

  • Before you start with your research idea,

  • you've spent half a year on just reproducing

  • what's been done before.

  • So then great.

  • Then you do your idea.

  • It works.

  • You write the paper.

  • You submit it.

  • You put it in the repo on GitHub,

  • which has a README file that says,

  • well, I downloaded the data from there,

  • but that link was already gone two days

  • after you made the repo.

  • And then you describe all these 17 tweaks you applied,

  • but maybe you forgot the one option that was crucial.

  • Well, and then there is the next paper and the next research,

  • and the next person comes and does the same.

  • So it's all great, except the production team, at some point,

  • is like, well, we should put it into production.

  • It's a great result. And then they

  • need to track this whole path, redo all of it,

  • and try to get the same results.

  • So it's a very difficult state of the world.

  • And it's even worse because there are different hardware

  • configurations.

  • So maybe something that trained well on a CPU

  • does not train on a GPU, or maybe you need an 8 GPU setup,

  • and so on and so forth.

  • So the idea behind Tensor2Tensor was,

  • let's make a library that has at least a bunch

  • of standard models for standard tasks that includes

  • the data and the preprocessing.

  • So you really can, on a command line, just say,

  • please get me this data set and this model, and train it,

  • and make it so that we can have regression tests and actually

  • know that it will train, and that it will not break with

  • TensorFlow 1.10.

  • And that it will train both on the GPU and on a TPU,

  • and on a CPU--

  • to have it in a more organized fashion.
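
  As a concrete sketch of that command-line workflow: the two entry points Tensor2Tensor ships for this are t2t-datagen (download and preprocess a named data set) and t2t-trainer (train a named model on it). The commands below follow the library's documented usage, but treat the directory paths and the step count as placeholder values, and note that flag names have shifted slightly between T2T versions (older releases used --problems rather than --problem).

      # Fetch and preprocess a standard data set (English-German WMT, 32k subword vocabulary).
      t2t-datagen \
        --data_dir=$HOME/t2t_data \
        --tmp_dir=/tmp/t2t_datagen \
        --problem=translate_ende_wmt32k

      # Train a registered model with a named hyperparameter set on that data.
      t2t-trainer \
        --data_dir=$HOME/t2t_data \
        --problem=translate_ende_wmt32k \
        --model=transformer \
        --hparams_set=transformer_base \
        --output_dir=$HOME/t2t_train/ende \
        --train_steps=100000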

  • And the thing that prompted Tensor2Tensor,

  • the reason I started it, was machine translation.

  • So I worked with the Google Translate team

  • on launching neural networks for translation.

  • And this was two years ago, and this was amazing work.

  • Because before that, machine translation

  • was done in this way like--

  • it was called phrase-based machine translation.

  • So you find some alignments of phrases,

  • then you translate the phrases, and then you

  • try to realign the sentences to make them work.

  • And the results in machine translation

  • are normally measured in terms of something

  • called the BLEU score.

  • I will not go into the details of what it is.

  • It's like, the higher, the better.
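
  For reference, the metric the speaker is alluding to is the standard BLEU definition from Papineni et al. (2002), which is not specific to Tensor2Tensor: a geometric mean of modified n-gram precisions p_n, scaled by a brevity penalty BP, where c is the candidate length, r the reference length, and typically N = 4 with uniform weights w_n = 1/4.

      \mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
      \qquad
      \mathrm{BP} =
      \begin{cases}
        1 & \text{if } c > r \\
        e^{1 - r/c} & \text{if } c \le r
      \end{cases}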

  • So for example, for English-German translation,

  • the BLEU score that human translators get is about 30.

  • And the best phrase-based-- so non-neural network,

  • non-deep-learning-- systems were about 20, 21.

  • And it's been, really, a decade of research at least,

  • maybe more.

  • So when I was doing a PhD, if you got one BLEU score up,

  • you would be a star.

  • It was a good PhD.

  • If you went from 21 to 22, it would be amazing.

  • So then the neural networks came.

  • And the early LSTMs in 2015, they were like 19.5, 20.

  • And we talked to the Translate team,

  • and they were like, you know, guys, it's fun.

  • It's interesting, because it's simpler in a way.

  • You just train the network on the data.

  • You don't have all the--

  • no language-specific stuff.

  • It's a simpler system.

  • But it gets worse results, and who knows

  • if it will ever get better.

  • But then the neural network research moved on,

  • and people started getting 21, 22.

  • So the Translate team, together with Brain, where I work,

  • made a big effort to try to make a really large LSTM

  • model, which is called GNMT, Google Neural Machine

  • Translation.

  • And indeed it was a huge improvement.

  • It got to 25 BLEU.

  • Later, when we added mixtures of experts, it even got to 26.

  • So they were amazed.

  • It launched in production, and well, it

  • was like a two-year effort to take the papers,

  • scale them up, launch it.

  • And to get these really good results,

  • you really needed a large network.

  • So as an example of why this is important,

  • or why this was important for Google--

  • so you have a sentence in German here,

  • which is like, "problems can never

  • be solved with the same way of thinking that caused them."

  • And this neural translator translates the sentence kind

  • of the way it should--

  • I doubt there is a much better translation--

  • while the phrase-based translators, you can see,

  • "no problem can be solved from the same consciousness

  • that they have arisen."

  • It kind of shows how the phrase-based method works.

  • Every word or phrase is translated correctly,

  • but the whole thing does not exactly add up.

  • You can see it's a very machiney way,

  • and it's not so clear what it is supposed to say.

  • So the big advantage of neural networks

  • is they train on whole sentences.

  • They can even train on paragraphs.

  • They can be very fluent.

  • Since they take into account the whole context at once,

  • it's a really big improvement.

  • And if you ask people to score translations,

  • this really starts coming close--

  • or at least 80% of the distance to what human translators do,

  • at least on newspaper language-- not poetry.

  • [CHUCKLING]

  • We're nowhere near that.

  • So it was great.

  • We got the high BLEU scores.

  • We reduced the distance to human translators.

  • It turned out that one system can handle

  • different languages, and sometimes even

  • multilingual translations.

  • But there were problems.

  • So one problem is the training time.

  • It took about a week on a setup of 64 to 128 GPUs.