
  • [MUSIC PLAYING]

  • SPEAKER 1: All right.

  • Welcome back, everyone, to an introduction

  • to Artificial Intelligence with Python.

  • Now last time, we took a look at machine learning-- a set of techniques

  • that computers can use in order to take a set of data

  • and learn some patterns inside of that data, learn how to perform a task,

  • even if we, the programmers, didn't give the computer explicit instructions

  • for how to perform that task.

  • Today, we transition to one of the most popular techniques and tools

  • within machine learning, known as neural networks.

  • And neural networks were inspired as early as the 1940s

  • by researchers who were thinking about how it is that humans learn,

  • studying neuroscience and the human brain,

  • and trying to see whether or not we can apply those same ideas to computers as

  • well, and model computer learning off of human learning.

  • So how is the brain structured?

  • Well, very simply put, the brain consists of a whole bunch of neurons,

  • and those neurons are connected to one another

  • and communicate with one another in some way.

  • In particular, if you think about the structure of a biological neural

  • network-- something like this--

  • there are a couple of key properties that scientists observed.

  • One was that these neurons are connected to each other

  • and receive electrical signals from one another,

  • that one neuron can propagate electrical signals to another neuron.

  • And another point is that neurons process

  • those input signals, and then can be activated, that a neuron becomes

  • activated at a certain point, and then can propagate further signals

  • on to other neurons down the line.

  • And so the question then became, could we take this biological idea of how it

  • is that humans learn-- with brains and with neurons--

  • and apply that to a machine as well, in effect,

  • designing an artificial neural network, or an ANN, which

  • will be a mathematical model for learning that is inspired

  • by these biological neural networks?

  • And what artificial neural networks will allow us to do

  • is they will first be able to model some sort of mathematical function.

  • Every time you look at a neural network, which we'll see more of later today,

  • each one of them is really just some mathematical function

  • that is mapping certain inputs to particular outputs,

  • based on the structure of the network: where we place

  • particular units inside of this neural network

  • is going to determine how it is that the network functions.

  • And in particular, artificial neural networks

  • are going to lend themselves to a way that we can learn what

  • the network's parameters should be.

  • We'll see more on that in just a moment.

  • But in effect, we want a model such that it is easy for us

  • to write some code that allows the network

  • to figure out how to model the right mathematical function,

  • given a particular set of input data.

  • So in order to create our artificial neural network,

  • instead of using biological neurons, we're

  • just going to use what we're going to call units--

  • units inside of a neural network--

  • which we can represent kind of like a node in a graph,

  • which will here be represented just by a blue circle like this.

  • And these artificial units-- these artificial neurons--

  • can be connected to one another.

  • So here, for instance, we have two units that

  • are connected by this edge inside of this graph, effectively.

  • And so what we're going to do now is think

  • of this idea as some sort of mapping from inputs to outputs,

  • that we have one unit that is connected to another unit,

  • that we might think of this side as the input and that side as the output.

  • And what we're trying to do then is to figure out how to solve a problem,

  • how to model some sort of mathematical function.

  • And this might take the form of something

  • we saw last time, which was something like, we

  • have certain inputs like variables x1 and x2, and given those inputs,

  • we want to perform some sort of task--

  • a task like predicting whether or not it's going to rain.

  • And ideally, given these inputs x1 and x2,

  • which stand for some sort of variables to do with the weather,

  • we would like to be able to predict, in this case,

  • a Boolean classification-- is it going to rain, or is it not going to rain?

  • And we did this last time by way of a mathematical function.

  • We defined some function h for our hypothesis function

  • that took as input x1 and x2--

  • the two inputs that we cared about processing-- in order

  • to determine whether we thought it was going to rain, or whether we thought it

  • was not going to rain.

  • The question then becomes, what does this hypothesis function do in order

  • to make that determination?

  • And we decided last time to use a linear combination of these input variables

  • to determine what the output should be.

  • So our hypothesis function was equal to something

  • like this: weight 0 plus weight 1 times x1 plus weight 2 times x2.

  • So what's going on here is that x1 and x2--

  • those are input variables-- the inputs to this hypothesis function--

  • and each of those input variables is being

  • multiplied by some weight, which is just some number.

  • So x1 is being multiplied by weight 1, x2 is being multiplied by weight 2,

  • and we have this additional weight-- weight 0--

  • that doesn't get multiplied by an input variable

  • at all, but just serves to move the function's

  • value up or down.

  • You can think of this either as a weight that's

  • just multiplied by some dummy input value, like the number 1,

  • so that multiplying by 1 effectively changes nothing.

  • Or sometimes you'll see in the literature,

  • people call this weight 0 a "bias,"

  • in which case you can think of these variables as slightly different:

  • We have weights that are multiplied by the input

  • and we separately add some bias to the result as well.

  • You'll hear both of those terminologies used

  • when people talk about neural networks and machine learning.

  • So in effect, what we've done here is that in order

  • to define a hypothesis function, we just need

  • to decide and figure out what these weights should be,

  • to determine what values to multiply by our inputs to get some sort of result.
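
To make that concrete, here is a minimal Python sketch of such a hypothesis computation; the weight values below are made-up assumptions for illustration, not values from the lecture:

```python
# A minimal sketch of the linear combination described above.
# The default weights are illustrative assumptions, not learned values.
def weighted_sum(x1, x2, w0=-1.0, w1=1.0, w2=0.5):
    """Compute w0 + w1*x1 + w2*x2, where w0 acts as the bias term."""
    return w0 + w1 * x1 + w2 * x2
```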

  • Of course, at the end of this, what we need

  • to do is make some sort of classification

  • like raining or not raining, and to do that, we use some sort of function

  • to define some sort of threshold.

  • And so we saw, for instance, the step function, which is defined as 1

  • if the result of multiplying the weights by the inputs is at least 0;

  • otherwise as 0.

  • You can think of this line down the middle-- it's kind

  • of like a dotted line.

  • Effectively, it stays at 0 all the way up to one point,

  • and then the function steps--

  • or jumps up-- to 1.

  • So it's zero before it reaches some threshold,

  • and then it's 1 after it reaches a particular threshold.

  • And so this was one way we could define what

  • we'll come to call an "activation function," a function that

  • determines when it is that this output becomes active--

  • changes to a 1 instead of being a 0.
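
As a sketch, that step activation can be written directly in Python:

```python
def step(z):
    """Step activation: outputs 1 once z reaches the threshold 0, else 0."""
    return 1 if z >= 0 else 0
```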

  • But we also saw that if we didn't just want a purely binary classification,

  • if we didn't want purely 1 or 0, but we wanted

  • to allow for some in-between real number values,

  • we could use a different function.

  • And there are a number of choices, but the one that we looked at was

  • the logistic sigmoid function that has sort of an S-shaped curve,

  • where we could represent this as a probability--

  • that may be somewhere in between: the probability of rain is something like

  • 0.5, and maybe a little bit later the probability of rain is 0.8--

  • and so rather than just have a binary classification of 0 or 1,

  • we can allow for numbers that are in between as well.
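
A minimal sketch of the logistic sigmoid, using only Python's standard library:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real-valued input into (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))  # 0.5, the midpoint of the S-shaped curve
```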

  • And it turns out there are many other different types

  • of activation functions, where an activation function just

  • takes the result of multiplying the weights by the inputs and adding the bias,

  • and then figures out what the actual output should be.

  • Another popular one is the rectified linear unit, otherwise known as ReLU,

  • and the way that works is that it just takes its input

  • and takes the maximum of that input and 0.

  • So if the input is positive, it remains unchanged, but if it's negative,

  • it goes ahead and levels out at 0.
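
A one-line Python sketch of ReLU:

```python
def relu(z):
    """Rectified linear unit: passes positive inputs through, clips negatives to 0."""
    return max(z, 0)
```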

  • And there are other activation functions that we can choose as well.

  • But in short, each of these activation functions,

  • you can just think of as a function that gets applied to the result of all

  • of this computation.

  • We take some function g and apply it to the result of all of that calculation.

  • And this then is what we saw last time-- the way of defining

  • some hypothesis function that takes our inputs,

  • calculates some linear combination of those inputs,

  • and then passes it through some sort of activation function to get our output.
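
Putting those pieces together, here is a self-contained sketch of that hypothesis function; the inputs, weights, and choice of step activation are illustrative assumptions, not values from the lecture:

```python
def step(z):
    """Step activation: 1 if z is at least 0, else 0."""
    return 1 if z >= 0 else 0

def hypothesis(x1, x2, w0, w1, w2, g):
    """Linear combination of the inputs passed through an activation function g."""
    return g(w0 + w1 * x1 + w2 * x2)

# Hypothetical weather inputs and made-up weights, purely for illustration:
prediction = hypothesis(0.5, 2.0, w0=-1.0, w1=1.0, w2=0.5, g=step)
print(prediction)  # 1 ("rain"), since -1.0 + 1.0*0.5 + 0.5*2.0 = 0.5 >= 0
```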

  • And this actually turns out to be the model

  • for the simplest of neural networks, where we're

  • going to represent this mathematical idea graphically, by using

  • a structure like this.

  • Here then is a neural network that has two inputs.

  • We can think of this as x1 and this as x2.

  • And then one output, which you can think of as classifying whether

  • we think it's going to rain or not rain, for example,