[MUSIC PLAYING]

SPEAKER 1: All right. Welcome back, everyone, to an Introduction to Artificial Intelligence with Python. Now last time, we took a look at machine learning: a set of techniques that computers can use to take a set of data, learn some patterns inside of that data, and learn how to perform a task, even if we, the programmers, didn't give the computer explicit instructions for how to perform that task. Today, we transition to one of the most popular techniques and tools within machine learning, namely neural networks. And neural networks were inspired as early as the 1940s by researchers who were thinking about how it is that humans learn, studying neuroscience and the human brain, and trying to see whether or not we could apply those same ideas to computers as well, and model computer learning off of human learning.

So how is the brain structured? Well, very simply put, the brain consists of a whole bunch of neurons, and those neurons are connected to one another and communicate with one another in some way. In particular, if you think about the structure of a biological neural network, something like this, there are a couple of key properties that scientists observed. One was that these neurons are connected to each other and receive electrical signals from one another, that one neuron can propagate electrical signals to another neuron. And another is that neurons process those input signals and can then be activated, that a neuron becomes activated at a certain point and can then propagate further signals onto the neurons downstream.

And so the question then became, could we take this biological idea of how it is that humans learn, with brains and with neurons, and apply that to a machine as well, in effect designing an artificial neural network, or ANN, which will be a mathematical model for learning that is inspired by these biological neural networks? And what artificial neural networks will allow us to do is, first, model some sort of mathematical function. Every time you look at a neural network, which we'll see more of later today, each one is really just some mathematical function that is mapping certain inputs to particular outputs, based on the structure of the network: where we place particular units inside of the neural network is going to determine how the network functions. And in particular, artificial neural networks are going to lend themselves to a way that we can learn what the network's parameters should be. We'll see more on that in just a moment. But in effect, we want a model structured such that it is easy for us to write some code that allows the network to figure out how to model the right mathematical function, given a particular set of input data.

So in order to create our artificial neural network, instead of using biological neurons, we're just going to use what we're going to call units, units inside of a neural network, which we can represent kind of like a node in a graph, which will here be represented just by a blue circle like this. And these artificial units, these artificial neurons, can be connected to one another. So here, for instance, we have two units that are connected by this edge inside of this graph, effectively.
And so what we're going to do now is think of this idea as some sort of mapping from inputs to outputs: we have one unit that is connected to another unit, and we might think of this side as the input and that side as the output. What we're trying to do, then, is figure out how to solve a problem, how to model some sort of mathematical function. And this might take the form of something we saw last time, which was something like: we have certain inputs, like variables x1 and x2, and given those inputs, we want to perform some sort of task, a task like predicting whether or not it's going to rain. And ideally, given these inputs x1 and x2, which stand for some variables to do with the weather, we would like to be able to predict, in this case, a Boolean classification: is it going to rain, or is it not going to rain?

And we did this last time by way of a mathematical function. We defined some hypothesis function h that took as input x1 and x2, the two inputs that we cared about processing, in order to determine whether we thought it was going to rain or whether we thought it was not going to rain. The question then becomes, what does this hypothesis function do in order to make that determination? And we decided last time to use a linear combination of these input variables to determine what the output should be. So our hypothesis function was equal to something like this: weight 0, plus weight 1 times x1, plus weight 2 times x2.

So what's going on here is that x1 and x2 are the input variables, the inputs to this hypothesis function, and each of those input variables is being multiplied by some weight, which is just some number. So x1 is being multiplied by weight 1, x2 is being multiplied by weight 2, and we have this additional weight, weight 0, that doesn't get multiplied by an input variable at all; it just serves to move the function's value up or down. You can think of this as a weight that's multiplied by some dummy value, like the number 1, so that when it's multiplied by 1, it's in effect not multiplied by anything. Or sometimes you'll see in the literature that people call this weight 0 a "bias," so that you can think of these variables as slightly different: we have weights that are multiplied by the inputs, and we separately add some bias to the result as well. You'll hear both of those terminologies used when people talk about neural networks and machine learning.

So in effect, in order to define a hypothesis function, we just need to decide and figure out what these weights should be, to determine what values to multiply by our inputs to get some sort of result. Of course, at the end of this, what we need to do is make some sort of classification, like raining or not raining, and to do that, we use some sort of function to define a threshold. And so we saw, for instance, the step function, which is defined as 1 if the result of multiplying the weights by the inputs is at least 0, and 0 otherwise. You can think of this line down the middle as kind of like a dotted line: effectively, the function stays at 0 all the way up to one point, and then it steps, or jumps up, to 1. So it's 0 before it reaches some threshold, and then it's 1 after it reaches that threshold.
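To make this concrete, here is a minimal sketch of that hypothesis function in Python. The weight values here are made up purely for illustration; in practice, they would be learned from data:

```python
def step(z):
    """Step activation: outputs 1 once the weighted sum reaches the threshold of 0."""
    return 1 if z >= 0 else 0


def hypothesis(x1, x2, w0, w1, w2):
    """h(x1, x2) = step(w0 + w1*x1 + w2*x2): a linear combination passed through a threshold."""
    return step(w0 + w1 * x1 + w2 * x2)


# Hypothetical weights, chosen only for illustration (w0 plays the role of the bias).
w0, w1, w2 = -3.0, 1.0, 0.5

print(hypothesis(2.0, 1.0, w0, w1, w2))  # 0, i.e. predict "not raining"
print(hypothesis(4.0, 2.0, w0, w1, w2))  # 1, i.e. predict "raining"
```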
And so this was one way we could define what we'll come to call an "activation function": a function that determines when this output becomes active, changing to a 1 instead of being a 0. But we also saw that if we didn't want a purely binary classification, purely 1 or 0, but wanted to allow for some in-between real-number values, we could use a different function. And there are a number of choices, but the one that we looked at was the logistic sigmoid function, which has sort of an S-shaped curve, and whose output we can represent as a probability: maybe somewhere in between, the probability of rain is something like 0.5, and a little bit further along, the probability of rain is 0.8. So rather than just having a binary classification of 0 or 1, we can allow for numbers in between as well.

And it turns out there are many other types of activation functions, where an activation function just takes the result of multiplying the weights by the inputs and adding that bias, and figures out what the actual output should be. Another popular one is the rectified linear unit, otherwise known as ReLU, and the way that works is that it takes an input and returns the maximum of that input and 0. So if the input is positive, it remains unchanged, but if it's negative, it levels out at 0. And there are other activation functions that we could choose as well. But in short, you can think of each of these activation functions as a function that gets applied to the result of all of this computation: we take some function g and apply it to the result of all of that calculation.

And this, then, is what we saw last time: a way of defining some hypothesis function that takes inputs, calculates some linear combination of those inputs, and then passes it through some sort of activation function to get our output. And this actually turns out to be the model for the simplest of neural networks, where we're going to represent this mathematical idea graphically, by using a structure like this. Here, then, is a neural network that has two inputs, which we can think of as x1 and x2, and then one output, which you can think of as classifying whether or not we think it's going to rain or not rain, for example,
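As a small sketch of those two activation functions in Python (the sample inputs below are arbitrary weighted sums, chosen just to show each function's shape):

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1), so it can be read as a probability."""
    return 1 / (1 + math.exp(-z))


def relu(z):
    """Rectified linear unit: max(z, 0), so negative inputs level out at 0."""
    return max(z, 0)


# Arbitrary values of the weighted sum, just to illustrate both functions.
for z in [-2.0, 0.0, 2.0]:
    print(z, round(sigmoid(z), 3), relu(z))
# -2.0 0.119 0
#  0.0 0.5   0.0
#  2.0 0.881 2.0
```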