Everyone who uses Python for scientific computing uses NumPy, a third-party package allowing us to work with multidimensional arrays. Arrays are a powerful way of organizing and processing data. Hence, NumPy is a fundamental library for those attempting to manipulate larger chunks of information. When your calculations are ready, it is useful to know how to present the results obtained in a graph. Matplotlib is a two-dimensional plotting library specially designed for the visualization of Python and especially NumPy computations. It contains a large set of tools allowing you to customize the appearance of the graphs you are working with. Finally, TensorFlow is the main library we will use for machine learning. Okay. The good thing about Anaconda is that NumPy and Matplotlib were installed automatically with it. That’s a strong plus of Anaconda. You don’t have to install the main packages separately, as you might have to do if you were to use some other software for programming in Python. However, TensorFlow is not included in the automatically installed packages, so we will have to install it on our own. That’s a useful programming skill to have. The quickest way to install it is by opening your start menu and searching for the “Anaconda Prompt”. Type “pip install tensorflow” (note that pip package names are lowercase). “Pip” itself is a recursive acronym: it stands for either “pip installs packages” or “pip installs Python”. Strange… but true. Now press “Enter”. When the operation is done, you will be set up for the whole course. This is one of the quickest and easiest ways to install modules and packages on your computer in general. I recommend you also run “pip install scikit-learn” (scikit-learn is the proper package name; the old “sklearn” alias is deprecated). Great! Now that we’ve covered the preliminary preparations, we can focus on using Python to create our first machine learning algorithm. Okay. This is our actual TensorFlow intro. Once again, TensorFlow is a machine learning library developed by Google.
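For quick reference, the commands to run in the Anaconda Prompt look like this (pip package names are lowercase; “scikit-learn” is the current name of the sklearn package):

```shell
# Run these in the Anaconda Prompt:
pip install tensorflow
pip install scikit-learn
```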
It allows us to construct fairly complicated models with little coding. To give you some perspective, our practical example required 20 lines of code. With TensorFlow, it will still be 20 lines of code. No difference whatsoever. TensorFlow is an amazing framework, and you will see that by the end of the course. However, it has a peculiar underlying logic. Remember when you first studied linear algebra or trigonometry? The logic differed greatly from the mathematics you had seen before, right? Well, it’s the same with TensorFlow. Once you start working with it, it’s super easy, but you must make an extra effort to understand it properly. Let’s do that. The most basic notion we’ll need to define is the computational graph. Quoting TensorFlow’s ‘About’ section at tensorflow.org, ‘nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays, or tensors, communicated between them’. We won’t be drawing computational graphs, but that’s how such a graph looks. Simple as that. Okay. Let’s start coding, and we’ll grasp the rest of the intuition on the go! Naturally, we’ll start by importing the TensorFlow library: we will import tensorflow as tf. Then, we will generate fake data, once again. This code is virtually the same as the one we used before; there is a single line of code that differs. Let’s look at it. For each project you work on, you’ll have a dataset. Perhaps you are used to xlsx or csv files; however, TensorFlow doesn’t work well with them. It is tensor-based, so it likes tensors. Therefore, we want a format that can store the information in tensors. One solution to this problem is npz files. That’s basically NumPy’s file type. It allows you to save ndarrays, or n-dimensional arrays. Thinking like computer scientists, we can say tensors can be represented as multidimensional arrays. When we read an npz file, the data is already organized in the desired way.
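To make the “tensors are multidimensional arrays” idea concrete, here is a tiny NumPy sketch (the array values are made up purely for illustration):

```python
import numpy as np

# A rank-2 tensor (a matrix) is just a 2-dimensional ndarray.
tensor_2d = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
print(tensor_2d.ndim)   # number of dimensions (the tensor's rank) -> 2
print(tensor_2d.shape)  # size along each dimension -> (2, 2)

# A rank-3 tensor: a 2 x 2 x 2 block of numbers.
tensor_3d = np.zeros((2, 2, 2))
print(tensor_3d.ndim)   # -> 3
```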
Often, this is an important part of machine learning preprocessing. You are given data in a specific file format. Then you open it, preprocess it, and finally save it into an npz. Later, you build your algorithm using the npz instead of the original file. So, back to the code. As you can see, we have called the inputs and targets we generated “generated inputs” and “generated targets”. Next, we can simply save them into a tensor-friendly file. The proper way to do that is to use the np.savez method. It involves several arguments. The first one is the file name, written in quotation marks. I’ll call it TF underscore intro. Then we must indicate the objects we want to save into the file. The syntax is as follows: the label we want to assign to the ndarray equals the array we want to save under that label. For us, the label “inputs” is equal to the generated inputs array. Similarly, the “targets” are equal to the generated targets. Note it is not required to call them inputs and targets. If we wanted to, we could give them arbitrary names, such as Rad1 and Rad2. Executing the code saves the TF_intro file in the same directory as the Jupyter notebook we are using. Okay. Great! We will create two variables that measure the size of our inputs and outputs. The input size is 2, as there are two input variables, the Xs and the Zs we saw earlier, and the output size is 1, as there is only one output: y. Okay. These two lines of code assign the values 2 and 1 to the variables input size and output size. Time for the peculiar TensorFlow logic. Each object we create using the TensorFlow library does nothing unless explicitly told to. Rather, it describes the logic of the machine learning algorithm, without assigning values or executing anything. Remember that, as it is crucial. Time to define our first TensorFlow object: the placeholder. Let’s see the line of code that allows us to do that.
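Putting the saving step together, the code might look like the sketch below. The exact data-generation code is not shown in this section, so the observation count, the noise, and the target function are illustrative assumptions; the np.savez call itself follows the transcript’s description:

```python
import numpy as np

observations = 1000  # assumed number of fake observations

# Fake data: two input variables (the xs and the zs), one target (y).
xs = np.random.uniform(-10, 10, (observations, 1))
zs = np.random.uniform(-10, 10, (observations, 1))
generated_inputs = np.column_stack((xs, zs))

noise = np.random.uniform(-1, 1, (observations, 1))
generated_targets = 2 * xs - 3 * zs + 5 + noise  # hypothetical target rule

# Save both arrays into a single tensor-friendly npz file.
# The keyword labels ("inputs", "targets") are the names under
# which the arrays can be retrieved later.
np.savez('TF_intro', inputs=generated_inputs, targets=generated_targets)

# Reading the file back gives us the arrays, already organized.
data = np.load('TF_intro.npz')
print(data['inputs'].shape)   # (1000, 2)
print(data['targets'].shape)  # (1000, 1)
```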
The line reads: inputs = tf.placeholder(tf.float32, [None, input_size]). The placeholder is where we feed the data. The data contained in our dataset will go into a placeholder. Naturally, we feed both inputs and targets, so let’s include the code for the targets. The data in the npz file contained exactly the inputs and the targets. We will use the npz to feed data into the model through the placeholders. The float32 indicates the type of data we want; 32-bit float precision is sufficient for most calculations. Finally, we have the dimensions of the two placeholders. The inputs’ dimensions are None by input size. The input size is the number of input variables we have. The “None” you see here doesn’t mean the data has no dimension. Instead, it means we need not specify it. That’s useful for us, lazy users. We need not know the number of observations or keep track of it; TensorFlow does that for us. We are only interested in the number of input variables. As you can see, this isn’t much different from our linear model. The dimensions we are working with are n by k, where n is the number of observations and k is the number of variables. Okay. Similarly, the size of the targets is None by the output size. That’s natural, since outputs and targets have the same shape. Remember! Nothing has happened yet. We have instructed the algorithm how we will feed data, but no data has been fed. Alright. The next thing we would like to do is define the weights and biases. They are declared using the other basic TensorFlow object: the variable. Variables preserve their values across iterations, while placeholders don’t. Allow me to elaborate. Let’s say we have 2 inputs, A and B, and 2 targets, C and D. A goes into the inputs placeholder. Through the weights and biases, we obtain an output. Then, we compare the output with the target C. Depending on the comparison, we’ll vary the weights and biases and continue with the next iteration.
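Written out, the two placeholder lines look as follows. Note that this is the TensorFlow 1.x API the course describes; in TensorFlow 2.x, placeholders live under tf.compat.v1, which is what this sketch uses:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # needed to use placeholders under TF 2.x

input_size = 2   # two input variables: the xs and the zs
output_size = 1  # one output: y

# Placeholders describe HOW data will be fed; no data is fed yet.
# The first dimension is None: the number of observations is left
# unspecified, and TensorFlow keeps track of it for us.
inputs = tf.placeholder(tf.float32, [None, input_size])
targets = tf.placeholder(tf.float32, [None, output_size])
```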
Then B goes into the inputs placeholder. Through the updated weights and biases, we calculate the new output, which will then be compared to the second target, D. Then we’ll adjust the weights and biases once again. Ok. A and B came and went away. We fed them to the model, got the best out of them, and we no longer need them. The weights and biases, however, were preserved throughout the iterations we performed. More precisely, we updated them and kept their updated versions, using the information provided by A and B. This is the same process we carried out before; we just used a different wording to describe it. We feed the data into the placeholders and vary the variables. Simple as that. Let’s define the variables. The proper method is tf.Variable. The expression you see within the brackets shows us how the variables will be initialized. We will use the random uniform method to be consistent with the minimal example shown
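The variable declarations the transcript is building toward might look like this, again in TensorFlow 1.x style via tf.compat.v1. The initialization range of 0.1 is an assumption for illustration, not a value stated here:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

input_size, output_size = 2, 1
init_range = 0.1  # hypothetical initialization range

# Variables preserve their values across iterations (unlike
# placeholders). They are initialized with small random uniform
# values and will be updated as training progresses.
weights = tf.Variable(
    tf.random_uniform([input_size, output_size],
                      minval=-init_range, maxval=init_range))
biases = tf.Variable(
    tf.random_uniform([output_size],
                      minval=-init_range, maxval=init_range))
```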