- Hello?
Okay, it's after 12, so I want to get started.
So today, lecture eight, we're going to talk about
deep learning software.
This is a super exciting topic because it changes
a lot every year.
But that also means it's a lot of work to give this lecture
'cause it changes a lot every year.
But as usual, a couple administrative notes
before we dive into the material.
So as a reminder the project proposals for your
course projects were due on Tuesday.
So hopefully you all turned that in,
and hopefully you all have a somewhat good idea
of what kind of projects you want to work on
for the class.
So we're in the process of assigning TAs to projects
based on what the project area is
and the expertise of the TAs.
So we'll have some more information about that
in the next couple of days, I think.
We're also in the process of grading assignment one,
so stay tuned and we'll get those grades back to you
as soon as we can.
Another reminder is that assignment two has been out
for a while.
That's going to be due next week, a week from today, Thursday.
And again, when working on assignment two,
remember to stop your Google Cloud instances
when you're not working to try to preserve your credits.
And another point of confusion I just wanted to
re-emphasize is that for assignment two you really
only need to use GPU instances for the last notebook.
All of the other notebooks are just in Python
and NumPy, so you don't need any GPUs for those questions.
So again, conserve your credits,
only use GPUs when you need them.
And the final reminder is that the midterm is coming up.
It's kind of hard to believe we're there already,
but the midterm will be in class on Tuesday, May 9th.
So the midterm will be more theoretical.
It'll be sort of pen and paper, working through different
kinds of slightly more theoretical questions
to check your understanding of the material that we've
covered so far.
And I think we'll probably post at least a short sort of
sample of the types of questions to expect.
Question?
[student's words obscured due to lack of microphone]
Oh yeah, the question is whether it's open book,
and we're going to say closed note, closed book.
Yeah, that's what we've done in the past,
just closed note, closed book.
We really just want to check that you understand
the intuition behind most of the stuff we've presented.
So, a quick recap as a reminder of what we were talking
about last time.
Last time we talked about fancier optimization algorithms
for deep learning models including SGD Momentum,
Nesterov, RMSProp and Adam.
And we saw that these relatively small tweaks
on top of vanilla SGD are relatively easy to implement
but can make your networks converge a bit faster.
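Just to make that concrete, here's a minimal NumPy sketch of the SGD-plus-momentum update; the variable names and hyperparameter values are illustrative rather than taken from the slides.

```python
import numpy as np

def sgd_momentum_step(w, dw, v, learning_rate=1e-2, rho=0.9):
    """One SGD + momentum update: keep a decaying 'velocity' of past
    gradients and step along it instead of along the raw gradient."""
    v = rho * v + dw
    w = w - learning_rate * v
    return w, v

# Tiny usage example: minimize f(w) = sum(w ** 2) from a random start.
w = np.random.randn(3, 3)
v = np.zeros_like(w)
for _ in range(20):
    dw = 2 * w                        # gradient of f at the current w
    w, v = sgd_momentum_step(w, dw, v)
```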
We also talked about regularization,
especially dropout.
So remember with dropout, you're kind of randomly setting
parts of the network to zero during the forward pass,
and then you kind of marginalize out over that noise
at test time.
And we saw that this was kind of a general pattern
across many different types of regularization
in deep learning, where you might add some kind
of noise during training, but then marginalize out
that noise at test time so it's not stochastic
at test time.
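As a reminder of what that looks like in code, here's a rough sketch of inverted dropout in NumPy; the keep probability and array shapes are just illustrative.

```python
import numpy as np

def dropout_train(x, p=0.5):
    """Training-time forward pass: randomly zero units and rescale by 1/p
    ("inverted dropout") so the expected activation matches test time."""
    mask = (np.random.rand(*x.shape) < p) / p
    return x * mask

def dropout_test(x):
    """Test-time forward pass: the noise is marginalized out, so this is
    just the identity and nothing stochastic happens here."""
    return x

h = np.random.randn(4, 10)   # some hidden-layer activations
h_train = dropout_train(h)   # roughly half the units zeroed and rescaled
h_test = dropout_test(h)     # deterministic at test time
```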
We also talked about transfer learning where you
can maybe download big networks that were pre-trained
on some dataset and then fine tune them for your
own problem.
And this is one way that you can attack a lot of problems
in deep learning, even if you don't have a huge
dataset of your own.
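To sketch what that recipe might look like in practice (assuming PyTorch and torchvision, one of the frameworks we'll talk about later in the lecture; the class count and hyperparameters here are made up):

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # hypothetical number of classes in your own small dataset

# Download a network pre-trained on ImageNet.
model = models.resnet18(pretrained=True)

# Freeze the pre-trained features so only the new last layer is trained at first.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a fresh layer sized for your own problem.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the parameters of the new layer are handed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```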
So today we're going to shift gears a little bit
and talk about some of the nuts and bolts
about writing software and how the hardware works.
And we'll dive into a lot of details
about the software that you actually use
to train these things in practice.
So we'll talk a little bit about CPUs and GPUs
and then we'll talk about several of the major
deep learning frameworks that are out there in use
these days.
So first, we've sort of mentioned this offhand
a bunch of different times,
that computers have CPUs, computers have GPUs.
Deep learning uses GPUs, but we weren't really
too explicit up to this point about what exactly
these things are and why one might be better
than another for different tasks.
So, who's built a computer before?
Just kind of show of hands.
So, maybe about a third of you, half of you,
somewhere around that ballpark.
So this is a shot of my computer at home
that I built.
And you can see that there's a lot of stuff going on
inside the computer, maybe, hopefully you know
what most of these parts are.
And the CPU is the Central Processing Unit.
That's this little chip hidden under this cooling fan
right here near the top of the case.
And the CPU is actually a relatively small piece.
It's a relatively small thing inside the case.
It's not taking up a lot of space.
And the GPUs are these two big monster things
that are taking up a gigantic amount of space
in the case.
They have their own cooling,
and they're drawing a lot of power.
They're quite large.
So, just in terms of how much power they're using,
in terms of how big they are, the GPUs are kind of
physically imposing and taking up a lot of space
in the case.
So the question is what are these things
and why are they so important for deep learning?
Well, the GPU is called a graphics card,
or Graphics Processing Unit.
And these were really developed, originally for rendering
computer graphics, and especially around games
and that sort of thing.
So another show of hands, who plays video games at home
sometimes, from time to time on their computer?
Yeah, so again, maybe about half, good fraction.
So for those of you who've played video games before
and who've built your own computers,
you probably have your own opinions on this debate.
[laughs]
So this is one of those big debates in computer science.
You know, there's like Intel versus AMD,
NVIDIA versus AMD for graphics cards.
It's up there with Vim versus Emacs for text editors.
And pretty much any gamer has their own opinions
on which of these two sides they prefer
for their own cards.
And in deep learning we kind of have mostly picked
one side of this fight, and that's NVIDIA.
So if you guys have AMD cards,
you might be in a little bit more trouble if you want
to use those for deep learning.
And really, NVIDIA's been pushing a lot for deep learning
in the last several years.
It's been kind of a large focus of some of their strategy.
And they've put a lot of effort into engineering
sort of good solutions to make their hardware
better suited for deep learning.
So most people in deep learning when we talk about GPUs,
we're pretty much exclusively talking about NVIDIA GPUs.
Maybe in the future this'll change a little bit,
and there might be new players coming up,
but at least for now NVIDIA is pretty dominant.
So to give you an idea of like what is the difference
between a CPU and a GPU, I've kind of made a little
spreadsheet here.
On the top we have two of the kind of top end Intel
consumer CPUs, and on the bottom we have two of
NVIDIA's sort of current top end consumer GPUs.
And there's a couple general trends to notice here.
Both GPUs and CPUs are kind of a general purpose
computing machine where they can execute programs
and do sort of arbitrary instructions,
but they're qualitatively pretty different.
So CPUs tend to have just a few cores,