
  • [APPLAUSE]

  • ZAK STONE: Thank you very much.

  • I'm delighted to be here today to talk to you about some

  • of the fantastic TensorFlow Research Cloud projects

  • that we've seen around the world and to invite

  • you to start your own.

  • Whether you're here in the room, watching on the livestream,

  • or online watching this afterwards, any of you

  • are welcome to get involved with TFRC.

  • Just very briefly, since I'm sure you've

  • heard this all today, the context

  • is this massive improvement in computational capabilities

  • driven by deep learning.

  • So deep learning, and specifically

  • these deep neural networks, are enabling many new applications

  • that are exciting in all sorts of ways,

  • touching all kinds of different data,

  • ranging from images, to speech, to text, even full scenes.

  • And the challenge that many of you are probably grappling with

  • is that these new capabilities come with profound increases

  • in compute requirements.

  • A while back, OpenAI did a study where

  • they measured the total amount of compute required

  • to train some of these famous machine learning models

  • over the past several years.

  • And the important thing to notice about this plot

  • is that it's actually a log scale on the compute axis.

  • So there are tremendous increases in the total amount

  • of compute required to train these state-of-the-art deep

  • learning models over time.

  • And there's this consistent trend up and to the right,

  • that these new capabilities are being unlocked

  • by the additional compute power, as well as lots of hard work

  • by many researchers all around the world

  • in this open community.

  • So unfortunately, these tremendous demands

  • for compute to meet these new opportunities opened up

  • by deep learning are coming to us

  • just as Moore's law is ending.

  • We've benefited for decades upon decades

  • from consistent increases in single-threaded CPU

  • performance.

  • But all of a sudden, now we're down to maybe 3% per year.

  • Who knows?

  • There could always be a breakthrough.

  • But we're not expecting extraordinary year-upon-year

  • gains from single-threaded performance,

  • as we've enjoyed in the past.

  • So in response to that, we believe

  • that specialized hardware for machine learning

  • is the path forward for major performance wins, cost savings,

  • and new breakthroughs across all these research

  • domains that I mentioned earlier.

  • Now, at Google, we've developed a family

  • of special-purpose machine learning

  • accelerators called Cloud TPUs.

  • And we're on our third generation now,

  • and two of these generations are available in the cloud--

  • the second and the third generation.

  • Just to give you a brief overview of the hardware

  • that I'm going to be talking about for the rest

  • of the session, we have these individual devices here--

  • Cloud TPU v2 and v3.

  • And as you can see, we've made tremendous progress generation

  • over generation--

  • 180 teraflops to 420 teraflops.

  • We've also increased the memory from 64 gigabytes

  • of high bandwidth memory to 128, which

  • matters a lot if you care about these cutting-edge natural

  • language processing models, like BERT, or XLNet, or GPT-2.

  • But the most important thing about Cloud TPUs

  • isn't just these individual devices,

  • which are the boards that you see here

  • with the four TPU chips connected to a CPU host that's not shown.

  • It's the fact that these devices are designed

  • to be connected together into multi-rack machine learning

  • supercomputers that let you scale much further,

  • and program the whole supercomputer

  • across these many racks as if it were a single machine.

  • Now, on the top here, you can see the Cloud TPU v2 Pod

  • spanning four racks.

  • The TPUs are in those two central columns,

  • and the CPUs are on the outside.

  • That machine got us to 11 and 1/2 petaflops,

  • which you can also subdivide every which way, as you wish.

  • And the TPU chips, in particular,

  • are connected by this 2-D toroidal mesh

  • network that enables ultra-fast communication.

  • It's much faster than standard data center networking.

  • That's a big factor in performance,

  • especially if you care about things like model parallelism

  • or spatial partitioning.

  • But now, with the Cloud TPU v3 Pod,

  • which is actually liquid-cooled, the picture

  • wasn't big enough to hold all the racks.

  • It spans eight racks out to the side,

  • and it gets you up over 100 petaflops

  • if you're using the entire machine simultaneously.

  • On a raw op by op basis, that's competitive with the largest

  • supercomputers in the world.

  • Although, these TPU supercomputers

  • use lower precision, which is appropriate for deep learning.

  • Now, I've mentioned performance.

  • I just wanted to quantify that briefly.

  • In the most recent MLPerf training version 0.6

  • competition, Cloud TPUs were able to outperform

  • on-premise infrastructure.

  • What you can see here is the TPU results

  • in blue compared with the largest on-premise cluster

  • results that were submitted to the MLPerf competition.

  • And in three of the five categories that we entered,

  • the Cloud TPUs delivered the best top line

  • results, including an 84% improvement over the next-best entry

  • in machine translation, which is based on the Transformer, and

  • in object detection, which used an SSD architecture.

  • Now, obviously, these numbers are evolving all the time.

  • There's tremendous investment and progress in the field.

  • But I just wanted to assure you that these TPUs can really

  • deliver when it comes to high performance at scale.

  • But today we're here to talk about research and expanding

  • access to this tremendous computing power

  • to enable researchers all over the world to benefit from it,

  • and explore the machine learning frontier,

  • make their own contributions to expand it.

  • To increase access to cutting-edge machine learning

  • compute, we're thrilled to have been able to create

  • the TensorFlow Research Cloud to accelerate open

  • machine learning research, and hopefully to drive this

  • feedback cycle, where more people than ever before have

  • access to state-of-the-art tools.

  • They have new breakthroughs, they

  • publish papers, and blog posts, and open-source code,

  • and give talks, and share the results with others.

  • That helps even more people gain access to the frontier

  • and benefit from it.

  • So we're trying to drive this positive feedback loop.

  • And so, as part of that, we've actually made well

  • over 1,000 of these Cloud TPU devices available for free

  • to support this open machine learning research.

  • If you're interested in learning more right now,

  • you can go to g.co/tfrc.

  • I'll also have more information at the end of the talk.

  • This pool of compute--

  • the TFRC cluster-- involves not just the original

  • TPU v2 devices that we included, but we've recently added

  • some of the v3 devices of the latest generation,

  • if you're really pushing the limits.

  • And there's the potential for Cloud TPU pod access.

  • If you've gone as far as you can with these individual devices,

  • please email us, let us know, and we'll

  • do our best to get you some access to TPU pods.

  • The underlying motivation of all this

  • is a simple observation, which is that talent is equally

  • distributed throughout the world, but opportunity is not.

  • And we're trying to change that balance,

  • to make more opportunities available to talented people

  • all around the world, wherever they might be.

  • So we've had tremendous interest in the TFRC program so far.

  • More than 26,000 people have contacted

  • us interested in TFRC, and we're thrilled

  • that we've already been able to onboard

  • more than 1,250 researchers.

  • And we're adding more researchers all the time,

  • so if you haven't heard from us yet, please ping us again.

  • We really want to support you with TFRC.

  • The feedback loop is just starting to turn,

  • but already I'm happy to announce

  • that more than 30 papers in the academic community

  • have been enabled by TFRC, and many of these researchers

  • tell us that without the TFRC compute,

  • they couldn't possibly have afforded

  • to carry out this research.

  • So I feel like, in a small way, we

  • grabbed the lever of progress here,

  • and we have tipped it slightly upward.

  • So the whole field is moving just a little bit faster,

  • and we really thank you all for being part of that.

  • I'm most excited though to share some of the stories

  • directly, of the individual researchers and the projects

  • that they've been carrying out on the TFRC Cloud TPUs.

  • Now these researchers, they come from all over the world.

  • I only have time to highlight four projects today,

  • but the fantastic thing is that three of these researchers

  • have been able to come and travel here

  • to be with us in person.

  • So you'll get to hear about their projects

  • in their own words.

  • We'll start with Victor Dibia, here in the upper left.

  • Welcome, Victor.

  • Come on up.

  • [APPLAUSE]

  • VICTOR DIBIA: Hi.

  • Hello, everyone.

  • Really excited to be here.

  • My name is Victor Dibia.

  • I'm originally from Nigeria.

  • And currently, I'm a research engineer

  • with Cloudera Fast Forward Labs in Brooklyn, New York.

  • And so, about a year ago, I got really fascinated

  • about this whole area of the intersection of art and AI.

  • And given my background and my interest

  • in human-computer interaction and applied artificial

  • intelligence, it was something I really wanted to do.

  • Right about that time, I got the opportunity

  • to have access to TFRC, and today, I'm

  • going to talk to you about the results of those experiments,

  • and some of the research results I had working on this.

  • And so, why did I work on this project?

  • As a little kid growing up in eastern Nigeria,

  • my extended family and I would travel to our village

  • once a year.

  • And one of the interesting and captivating parts of those trips

  • was something called the Eastern Masquerade dances of Africa.

  • And so, what would happen is that there

  • are these dancers with these complex, elaborate masks.

  • And as a kid, I was really fascinated,

  • and so, this project was a way to bridge

  • my interest in technology and the arts.

  • And as a research engineer too, I

  • could also express my identity through a project like this.

  • In addition to this, there's this growing area

  • of AI-inspired art, or AI-generated art.

  • But one thing you'll notice in that space

  • is that most of the data sets that are used

  • for this sort of explorations are mainly

  • classical European art--

  • Rembrandt, Picasso.

  • And so, a project like this is a way

  • to diversify the conversations in that area.

  • And then, finally, for researchers

  • working in the generative modeling domain, one of the goals

  • here is also to contribute more complex data sets,

  • compared to things like faces or retail images.

  • And it can be a really interesting way

  • to benchmark some of the new generative models

  • that are being researched today.

  • So what did I do?

  • So like all of us know, the best results

  • or most of the effort from machine learning projects

  • comes from the data collection phase.

  • So I started out collecting images.

  • I curated about 20,000 images, and then

  • narrowed that down to about 9,300 high-quality images.

  • And at this point, I was ready to train my model.

  • And so, the beautiful thing is that the TensorFlow team

  • had made available a couple of reference models.

  • And so, I started my experiments using

  • a DCGAN reference implementation built

  • with TensorFlow's TPUEstimator.

  • And so, the picture you see on the right

  • is just a visualization of the training process

  • for the deep convolutional GAN.

  • So it starts out as random noise,

  • and as training progresses, it learns

  • to generate images that are really similar to the input

  • data distribution.

  • And so, starting out with the reference implementation,

  • there's two interesting things that I did.

  • The first was to modify the configuration of the network,

  • modify the coder and decoder parameters

  • to let this thing generate larger images--

  • 640px, 1280px-- and then [INAUDIBLE] data input pipeline

  • lets me feed my data set into the model, and get it trained.

  • And so, the thing you should watch out for

  • is to ensure that your data input pipeline matches

  • what the reference model implementation is expecting.
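
To make that input-pipeline point concrete, here is a minimal sketch (TF 1.x style, matching the TPUEstimator reference models of that era) of the kind of input_fn a Cloud TPU expects: data read from Cloud Storage, fully static shapes, and a batch size taken from params. The bucket path, feature key, and 64x64 resolution are illustrative assumptions, not Victor's actual code.

```python
import tensorflow as tf  # TF 1.x APIs

IMAGE_SIZE = 64  # assumed target resolution for the GAN

def parse_example(serialized):
  # Assumes each TFRecord example stores a JPEG under the key "image/encoded".
  features = tf.parse_single_example(
      serialized, {"image/encoded": tf.FixedLenFeature([], tf.string)})
  image = tf.image.decode_jpeg(features["image/encoded"], channels=3)
  image = tf.image.resize_images(image, [IMAGE_SIZE, IMAGE_SIZE])
  image = tf.cast(image, tf.float32) / 127.5 - 1.0  # scale to [-1, 1] for the GAN
  return image

def input_fn(params):
  # TPUEstimator passes the batch size this function must produce in
  # params["batch_size"]; shapes must be static, hence drop_remainder=True.
  batch_size = params["batch_size"]
  files = tf.gfile.Glob("gs://my-bucket/masks-*.tfrecord")  # assumed path
  dataset = tf.data.TFRecordDataset(files)
  dataset = dataset.map(parse_example).shuffle(2048).repeat()
  return dataset.batch(batch_size, drop_remainder=True)
```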

  • It took me about 60 experiments to really track down the error

  • and fix it.

  • So it did take a couple of days to fix that.

  • So in all, I ran about 200 experiments.

  • And at this point, this is where TensorFlow Research Cloud

  • really makes a difference.

  • And so, something like this would take a couple of weeks

  • to get done, but I was able to run most of these experiments,

  • once all bugs were fixed, within about one or two days.

  • And so, at this point, all the images you see here,

  • they look like masks.

  • But the interesting thing is that none of them are real.

  • None of them exist in the real world.

  • And these are all interesting artistic interpretations

  • of what an African mask could look like.

  • And so, what could I do next?

  • And so, I started to think, at this point,

  • I have a model that does pretty well.

  • But the question is, are the images novel?

  • Are they actually new stuff?

  • Or has the model just memorized some of my input data

  • sets and regurgitated that?

  • And so, to answer those questions

  • I took a deep semantic search approach, where I used

  • a pre-trained model, VGG16.

  • So I extracted features from all of my data sets

  • and all of my generated images, and I built this interface

  • that allows some sort of algorithmic art

  • inspection, where for each generated image,

  • I can find the top 20 images in the data set

  • that are actually similar to that image.

  • So this is one way to actually inspect the results

  • from a model like this.
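
As a rough sketch of that memorization check (not Victor's exact code), you can embed both the real masks and the generated ones with a pre-trained VGG16 and rank the real images by cosine similarity for each generated image. The input shape and function names below are illustrative assumptions; only the top-20 value mirrors the talk.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Global average pooling turns each image into a single 512-d feature vector.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(images):
  # images: float32 array of shape [N, 224, 224, 3] with values in 0-255 (assumed).
  return model.predict(preprocess_input(images.copy()))

def top_k_similar(generated_vec, real_vecs, k=20):
  # Cosine similarity between one generated embedding and all real embeddings.
  g = generated_vec / np.linalg.norm(generated_vec)
  r = real_vecs / np.linalg.norm(real_vecs, axis=1, keepdims=True)
  return np.argsort(-(r @ g))[:k]  # indices of the k most similar real masks
```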

  • So going forward, the best and most stable model

  • I was able to train could only generate 640px images.

  • But can we do better?

  • So it turns out that you could use super resolution GANs.

  • And this is just one of my favorite results,

  • where we have a super-resolution GAN from the Topaz Gigapixel AI

  • model.

  • Here's another interesting result.

  • And what you probably can't see very clearly here

  • is that there is detail in this super-resolved

  • image that really just does not exist in the low-resolution

  • image.

  • So it's like a two-step interpretation

  • using neural networks.

  • And so, if you're a researcher, or you're

  • an artist, or a software engineer,

  • interested in this sort of work--

  • ZAK STONE: Yeah, there we go.

  • VICTOR DIBIA: Yeah, please go ahead.

  • All of the code I used for this, it's all available online

  • and there's a blog post that goes along with it.

  • So thank you.

  • ZAK STONE: Thank you very much.

  • [APPLAUSE]

  • Thanks very much, Victor.

  • Next up, we have Wisdom.

  • Come on up, Wisdom.

  • Thank you.

  • There you go.

  • Here you go.

  • WISDOM D'ALMEIDA: Thank you.

  • Hi, everyone.

  • I'm glad to be here.

  • I'm Wisdom.

  • I'm from Togo.

  • I grew up there, and I'm currently a visiting researcher

  • at Mila in Montreal.

  • I'm doing research in grounded language learning,

  • natural language understanding, under the supervision

  • of Yoshua Bengio.

  • So for the past year, I've been interested

  • in medical report generation.

  • And so, basically when you go to see a radiologist,

  • you get your chest X-ray taken.

  • And the radiologist tries, in a fraction of a second,

  • to interpret the X-ray and produce a radiology report that

  • mostly has sections for findings and impressions.

  • And findings are written observations

  • from different regions of the chest.

  • So basically saying if there's an abnormality in that region

  • or not.

  • And the impression section highlights

  • the key clinical findings.

  • So because this happens very fast

  • and radiologists can commit mistakes,

  • the AI community has been thinking of ways

  • to augment radiologists with AI capacity

  • to provide a third eye.

  • And we have [INAUDIBLE] classifiers

  • that work very well for that.

  • The problem with this classification

  • is you are going from the image to the labels.

  • So where's the step where we generate the radiology report?

  • And this is what I've been interested in.

  • And [INAUDIBLE] like to try something

  • like image captioning to generate the reports.

  • Basically, condition a language model on the input image,

  • and maximize the log-likelihood.

  • But this doesn't work very well on medical reports,

  • because there is nothing in this formulation that

  • ensures clinical accuracy of the reports that

  • are being generated, and this is a big problem.

  • And this is what I've been interested to solve,

  • and I found inspiration in grounded language learning.

  • So in this setting, you have a model

  • that receives natural language instructions to achieve a task.

  • And to correctly achieve the task,

  • the model needs good natural language instructions.

  • And I said, whoa, we can do the same thing for medical report

  • generation.

  • So on top of maximizing the log-likelihood, which

  • we would do in image captioning, we can also reward the language

  • model based on how well its output was

  • useful for a medical task, let's say classification,

  • for instance.

  • So here, the classifier takes the radiology report as inputs,

  • and we can also add an image for superior accuracy.

  • But what is interesting here is in a backward pass,

  • we are updating language model parameters

  • based on how well the output was useful for classification.

  • And that is a good starting point

  • for accuracy, because we are forcing the language

  • models to output things that have

  • enough and pertinent medical clues

  • to ensure accurate diagnosis.
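
As a conceptual sketch only (Wisdom's exact formulation may differ), the combined objective can be thought of as the usual captioning cross-entropy plus a REINFORCE-style term that rewards sampled reports the frozen classifier finds diagnostically useful. Every module name below, including language_model, classifier, and their sample and accuracy_reward helpers, is a placeholder.

```python
import tensorflow as tf

def report_generation_loss(language_model, classifier, image, report_tokens,
                           labels, reward_weight=0.5):
  # 1) Standard image-captioning term: maximize log p(report | image).
  logits = language_model(image, report_tokens)            # [B, T, vocab]
  nll = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=report_tokens, logits=logits))

  # 2) Task reward: sample a report, score how well it supports the
  #    ground-truth findings, and reinforce the sample's log-probability.
  sampled_tokens, sample_logp = language_model.sample(image)    # assumed helper
  reward = tf.stop_gradient(
      classifier.accuracy_reward(sampled_tokens, labels))       # assumed helper
  rl_loss = -tf.reduce_mean(reward * sample_logp)

  # Only the language model is updated; the classifier stays frozen.
  return nll + reward_weight * rl_loss
```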

  • And I trained this model on the MIMIC-CXR data set, which

  • is the largest to date, with both chest radiographs

  • and free-text radiology reports.

  • And to train this, I needed extensive amount of compute.

  • And I started this project as a master's student in India.

  • And I was training on my laptop.

  • I know, yeah, that was painful.

  • So I applied to TFRC to have access to TPUs, and I got it.

  • So I suddenly had many TPUs for free, to do a lot of experiments

  • at the same time.

  • And that was useful to iterate fast in my research,

  • because I needed to reproduce the baselines,

  • as well as optimize my proposed approach.

  • So my estimated TFRC usage was 8,000-plus hours.

  • I used v2 devices, virtual devices,

  • and also got to try the v2 Pod devices.

  • So I couldn't leave you without some results of the model.

  • So this is a case of hiatal hernia.

  • And if you talk to radiologists, they

  • will tell you the main evidence for hiatal hernia

  • is the retrocardiac opacity you can

  • observe with that green arrow.

  • And in the red box, you see the ground truth radiology report

  • of this X-ray.

  • So the green one is a baseline I reproduced and optimized

  • the best I could.

  • But you can see that the model completely

  • misses out on the key findings.

  • And that's a consequence of training

  • on the log-likelihood.

  • Because you are maximizing

  • the confidence of the model, it will avoid taking risks,

  • and will tell you 80% of the time that your X-ray's fine.

  • So the blue box is my approach.

  • And you can see that by forcing the language model to output

  • things that are useful for classification,

  • the model will find the right words to use to justify

  • this case of hiatal hernia.

  • So I'm very excited to have presented this work very

  • recently at Stanford, at a Frontier of AI-Assisted Care

  • Scientific Symposium, organized by Fei-Fei Li and the Stanford

  • School of Medicine.

  • And I'm excited for what's next with this project,

  • and I'm thankful to the TFRC team

  • for providing the resources that were used for this work.

  • Thanks for having me.

  • ZAK STONE: Thank you very much, Wisdom.

  • [APPLAUSE]

  • That's great.

  • Next up, we have Jade.

  • JADE ABBOTT: Hi, everyone.

  • I'm Jade Abbott.

  • I'm from South Africa, so I've come a very long way

  • to be here.

  • I work for a company called Retro Rabbit,

  • but I'm not actually here today to talk

  • about what I do at work, which these days is

  • a lot of BERT stuff.

  • I'm here to talk about my side project,

  • and I've picked up research as a hobby.

  • And what we're trying to do is work on--

  • there's a lot of African languages,

  • which I'll speak about a little bit later,

  • and very little research.

  • So what we did here is we developed baseline models

  • for at least four or five

  • of the Southern African languages.

  • This work feeds into a greater project

  • which we call Masakhane.

  • Masakhane means, "we build together," in isiZulu.

  • And in this project we're trying to change the NLP footprint

  • on the continent.

  • So the problem, we have over 2,000 languages,

  • which is quite insane.

  • Many of these languages are exceptionally complex,

  • some of the most complex in the world.

  • And in contrast, we've got almost no data.

  • If we do have data, we have to dig to find it.

  • And what's even worse is that there's absolutely no research.

  • So if you're a beginner NLP practitioner, currently

  • learning about machine translation or NLP

  • on the continent, and you do a search,

  • and you're trying to find something in your language,

  • there's nothing, right?

  • You can look in some obscure journals,

  • and you'll find maybe some old linguistic publications,

  • and that's the extent of it.

  • And this makes it hard if you're trying to build on models,

  • and you're trying to spur this research.

  • If you look at this graph, you can

  • see what is the normalized paper count by country

  • at the 2018 NLP conferences.

  • And the more orange it is, the more papers we've got.

  • And you see there's a glaringly empty continent in the middle

  • there.

  • And even at a widening NLP workshop at ACL,

  • the graph still doesn't look that much different.

  • And that's meant to be inclusive of more people

  • from around the world.

  • So what did we do?

  • I like to say we took some existing data that we scrounged

  • around and found, and we took the state-of-the-art model,

  • and we smashed them together.

  • They've never seen each other, this model and this data.

  • And what we did was we then decided

  • to actually try to optimize the NMT algorithms,

  • just parameter-wise, just to work better

  • on these low-resourced African languages.

  • And our goal for this is to spur that additional research--

  • because right now there's nothing--

  • provide these baselines.

  • And this is where TFRC came in.

  • Like I said, this is my side project.

  • So instead of having lots of money to do this,

  • I'd actually tried, and I was renting GPUs

  • from a cloud provider, and they were costing me

  • an arm and a leg.

  • I reached out to TFRC, and they were super

  • happy to lend us these TPUs.

  • We basically used the Tensor2Tensor framework

  • to train up these models.

  • And we used government data, parallel corpora

  • that we managed to find there.

  • One of the things that we found that was actually

  • simultaneously presented at ACL--

  • on a different language pair, English to German--

  • we found that optimizing the byte-pair encoding tokenization

  • for these very complex low-resource languages

  • allows us to handle their agglutinative nature.

  • Now, agglutination is when, in those languages,

  • you build up new words by just adding on more words,

  • or where switching little bits completely changes the meaning.

  • That's agglutination, and optimizing this parameter

  • can make really significant differences in the BLEU score.
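
Tensor2Tensor builds its subword vocabulary internally, but the knob Jade describes, the size of the byte-pair-encoded vocabulary, is easy to see with any BPE tool. The sketch below uses the sentencepiece library purely as an illustration; the corpus file, the vocabulary sizes, and the sample phrase are assumptions, not the project's actual pipeline.

```python
import sentencepiece as spm

# Train BPE vocabularies of different sizes and compare how they segment
# an agglutinative Setswana phrase; smaller vocabularies yield more,
# shorter subword pieces.
for vocab_size in (4000, 8000, 16000):
    spm.SentencePieceTrainer.train(
        input="tsn-corpus.txt",              # assumed Setswana training text
        model_prefix=f"bpe_{vocab_size}",
        vocab_size=vocab_size,
        model_type="bpe")
    sp = spm.SentencePieceProcessor()
    sp.load(f"bpe_{vocab_size}.model")
    print(vocab_size, sp.encode_as_pieces("re a leboga"))  # sample phrase
```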

  • And what was also great is we needed to submit something,

  • I think in two or three weeks, to a workshop.

  • And instead of taking days to run these experiments,

  • it would take a couple of hours to actually build these models.

  • So yes, thank you to TFRC for that.

  • So just some results overview of the five languages we had.

  • You can see Northern Sotho, Setswana, and Xitsonga

  • we have almost--

  • in particular in the case of Setswana--

  • almost double the BLEU score.

  • Bigger is better with BLEU.

  • Where in Afrikaans-- Afrikaans is actually

  • a European-based language.

  • It's based on Dutch.

  • That preferred the older statistical machine translation

  • architecture.

  • And we're better there.

  • That, unfortunately, runs on CPU, and actually

  • takes a lot longer than we wanted it to.

  • And isiZulu is an anomaly.

  • It's a complex language, but we had very few sentences,

  • and the sentences were also very messy.

  • So in this case, the statistical translation

  • performed slightly better.

  • If you look at this little visualization,

  • it's English to Afrikaans.

  • But what's really cool is you can

  • see the attention actually captured some of the language

  • structures.

  • Here, we've got a particular instance where "cannot"

  • in Afrikaans becomes two words and "nie," and at the end,

  • you have to say "nie" again.

  • It's called a double negative.

  • I'm not sure why.

  • And you can see that it actually does that.

  • Less so on this screen.

  • For some reason there's a yellow there.

  • You can pick out that "cannot" matches to "kan," "nie," and "nie."

  • And those are the words, which is quite cool.

  • Here, we've got one of our sample translations.

  • So we've got a source sentence, we've

  • got the reference translation and the transformer

  • translation.

  • Obviously, very few of you are likely to speak Setswana

  • in the audience, so we've got a native speaker

  • to actually translate it back to English,

  • what the transformer generated.

  • And you can see they're talking about sunflower, fields,

  • and lands, and flowering periods,

  • and they've picked up blossoming period.

  • So you can see that it's actually done really, really,

  • really well, despite having so little data.

  • So yeah, this is my call to action.

  • As I said, I'm a co-leader of a project

  • called masakhane.io.

  • You can go check it out.

  • And our idea is to basically change this map.

  • So this map shows which parts of the African continent

  • currently have researchers

  • working on languages from there.

  • And like I said, the idea is to spur research.

  • So if you know a language from Africa,

  • or even if you don't, and you're willing to contribute

  • time, or resources, or advice--

  • we've got a lot of very junior teams who

  • don't have supervisors or people who

  • work in machine translation.

  • Or even if you'd like to come to a webinar, drop us a message.

  • And yeah.

  • Thank you very much to TFRC for hosting us.

  • And, yeah, I look forward to what else

  • we can actually build.

  • ZAK STONE: Thanks so much, Jade.

  • [APPLAUSE]

  • Thank you.

  • Oh, let me get the clicker too.

  • Jade?

  • Thank you so much.

  • Let's have round of applause to Victor, Wisdom, and Jade

  • for coming to represent their research.

  • [APPLAUSE]

  • Thank you so much.

  • It's really a pleasure to have you here.

  • So there's one more project I want to show.

  • Jonathan wasn't able to be here in person,

  • but this is just fantastic work, and I wanted to showcase it.

  • So Jonathan and his colleagues at MIT

  • won the best paper award at ICLR with a paper

  • called "The Lottery Ticket Hypothesis," where they're

  • looking for these sparse trainable neural networks

  • within larger neural networks.

  • Now, that had nothing to do with TFRC,

  • but it was this really interesting idea.

  • Many of the neural networks that we're

  • used to have this tremendous number of parameters.

  • They're very large, and one thing

  • that the OpenAI graph earlier didn't show

  • is that these neural networks are getting

  • larger and larger over time.

  • There's generally a correlation between larger model sizes

  • and higher accuracy, as long as you have enough training data.

  • But it takes more and more compute power, again,

  • to train these larger and larger networks.

  • So Jonathan and his colleagues asked this question,

  • what if you could find just the right sub-network in this much

  • larger network, that could perform

  • the same task as the larger network,

  • ideally to the same accuracy?

  • And those networks are somewhat whimsically

  • called these lottery tickets.

  • So at ICLR, Jonathan used small networks,

  • because that's what he could afford,

  • to show some initial encouraging evidence for this hypothesis.

  • But the real interesting part of this research, at least

  • from my perspective, since I'm into Big Compute

  • is, does it work at scale, right?

  • And so, to find that out, Jonathan got in touch with us

  • here at TFRC to try to scale up this work.

  • And he was kind enough to say at the bottom

  • here, that for his group, research at this scale

  • would be impossible without TPUs.

  • So let me share a little bit more about his work and then

  • about his findings.

  • So the lottery ticket hypothesis,

  • as I mentioned before, is related

  • to this broader category of techniques called pruning.

  • And to be clear, there are many approaches to pruning neural

  • networks, but most of these approaches

  • take place after the networks have been trained.

  • So you've already spent this compute time and cost

  • to get the network trained, and then

  • you modify the trained model to try and set weights to zero,

  • or reduce the size of the model, or distill it

  • as another approach into a smaller model.

  • But it's interesting to ask, could you just train a smaller

  • network from the start?

  • Could you prune connections early on,

  • maybe at the very beginning, or at least early

  • in the training process, without affecting the learning

  • too much?

  • So like I said, this initial paper

  • showed some very promising results on small networks,

  • on small data sets.

  • So with the TFRC Cloud TPUs, Jonathan

  • took this to models we're all familiar with--

  • ResNet-50 trained on ImageNet--

  • and he found slightly different behavior,

  • but was able to validate the hypothesis.

  • So you can't go all the way back to the beginning,

  • you can't prune all the weights--

  • at least with current understanding,

  • there may be further breakthroughs--

  • but you can go almost back to the first epoch,

  • cut the network down, and then train

  • from there with a much smaller network without any harm

  • to accuracy.

  • In particular, Jonathan found that with ResNet-50, you

  • could remove 80% of the parameters at epoch 4,

  • and not hurt the accuracy at all.

  • And you're training to something like 90 epochs or further.

  • So this is a real compute savings.

  • And there's these plots down below with ResNet-50

  • and Inception showing you this rewind epoch,

  • and showing that the test error stays low once you

  • get past rewind epoch 3 or 4.
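
To make the rewinding idea concrete, here is a schematic sketch of one round of magnitude pruning with rewinding as described in the talk: train briefly, save the early weights, finish training, prune the smallest-magnitude weights, rewind the survivors to their early values, and retrain with the mask held fixed. The train() helper and its masks argument are assumptions, and the 80% / epoch-4 numbers simply echo the ResNet-50 result above; this is not Jonathan's actual code.

```python
import numpy as np

def magnitude_masks(weights, sparsity=0.8):
  # One global threshold: zero out the smallest-magnitude 80% of all weights.
  flat = np.concatenate([np.abs(w).ravel() for w in weights])
  threshold = np.quantile(flat, sparsity)
  return [np.abs(w) > threshold for w in weights]

def lottery_ticket_round(model, train, rewind_epoch=4, total_epochs=90):
  # train(model, epochs, masks=None) is an assumed helper that runs training
  # and, when masks are given, keeps the pruned weights at zero.
  train(model, epochs=rewind_epoch)                  # short initial training
  rewind_weights = [w.copy() for w in model.get_weights()]
  train(model, epochs=total_epochs - rewind_epoch)   # finish full training
  masks = magnitude_masks(model.get_weights())
  # Rewind surviving weights to their epoch-4 values, zero out the rest,
  # then retrain the much sparser network.
  model.set_weights([w * m for w, m in zip(rewind_weights, masks)])
  train(model, epochs=total_epochs - rewind_epoch, masks=masks)
```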

  • One thing I really appreciated about Jonathan's work

  • is, in addition to carrying out these experiments,

  • and publishing them, and sharing them with the community,

  • and inspiring other research, he built some interesting tools

  • to help manage all the compute power.

  • So what you're seeing here is actually a Google Sheet, so

  • a spreadsheet that Jonathan wired up with scripts

  • to orchestrate all of his experiments.

  • So this was a fully declarative system, at the end of the day.

  • He could add a row to the spreadsheet,

  • and behind the scenes his script would

  • kick off a new experiment, monitor the results,

  • bring them back into the spreadsheet,

  • flag error conditions if anything had gone wrong.

  • And at the end of the day, this spreadsheet

  • had thousands upon thousands of rows,

  • showcasing all the different experiments that

  • were then searchable, and sharable, and usable in all

  • the ways that a spreadsheet is.

  • So I thought this was a great mix of old technology and new.
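
Something in the spirit of that spreadsheet-driven launcher can be sketched in a few lines with the gspread library: poll the sheet, start a run for any row without a status, and write the outcome back. The sheet name, column layout, and launch_experiment() callback are all assumptions for illustration, not Jonathan's actual scripts.

```python
import time
import gspread

def poll_and_launch(launch_experiment):
  gc = gspread.service_account()              # credentials from a service account
  worksheet = gc.open("tpu-experiments").sheet1
  while True:
    rows = worksheet.get_all_records()         # list of dicts keyed by header row
    for i, row in enumerate(rows, start=2):    # row 1 is the header
      if row.get("status"):                    # already handled
        continue
      worksheet.update_cell(i, 5, "RUNNING")   # assumed "status" column E
      try:
        result = launch_experiment(row)        # e.g. kick off a Cloud TPU job
        worksheet.update_cell(i, 5, f"DONE: {result}")
      except Exception as err:
        worksheet.update_cell(i, 5, f"ERROR: {err}")
    time.sleep(60)
```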

  • And this was a serious amount of compute.

  • Jonathan estimates that he used at least 40,000 hours

  • of Cloud TPU compute on TFRC.

  • So I hope that underscores that we're

  • really serious about providing a large amount of compute

  • for you to do things that you couldn't do otherwise, and then

  • share them with the research community.

  • So it's not just about the projects

  • that you've heard today.

  • These are just samples of the thousands of researchers

  • who are working on TFRC.

  • And I'd really like to personally encourage

  • all of you to think about your next research project happening

  • on TFRC.

  • And it can be an academic project,

  • or it can be a side project, it can be an art project--

  • as long as it's intended to benefit the community,

  • as long as you're going to share your work with others,

  • make them open, and help accelerate progress

  • in the field, we'd love to hear from you.

  • So if you're interested in getting started right now,

  • you can visit this link below-- g.co/tputalk-- and enter code

  • TFWORLD.

  • That'll go straight to us and the organizers of this event.

  • And we're happy to make available,

  • as a starting point, five regular Cloud

  • TPUs and 20 preemptible Cloud TPUs

  • for several months for free.

  • The rest of Google Cloud Services still cost money.

  • So this is not completely free, but the TPUs

  • are the overwhelming majority of the compute cost

  • for most of these compute-intensive projects.

  • And so, we really hope that this enables things

  • that you couldn't do otherwise.

  • And if you get to the limits of what

  • you can do with this initial quota, please reach out.

  • Let us know.

  • Tell us about what you're doing.

  • Tell us what you'd like to do.

  • We can't promise anything, but we'll

  • do our best to help with more, maybe even

  • a lot more compute capacity, including the access to Pods

  • that I mentioned earlier.

  • So thanks again to all of you for being here today,

  • for joining on the livestream or online.

  • Thanks to our speakers for representing their work

  • in person, and we'll all be happy to hang out here

  • afterwards and answer any of the questions you

  • might have for those of you who are here in the room.

  • Please rate this session in the O'Reilly events app.

  • And thank you all very much.

  • Hope you're enjoying TF World.

  • [APPLAUSE]
