
  • ♪ (music) ♪

  • Hi, and welcome to episode three of Zero to Hero with TensorFlow.

  • In the previous episode, you saw how to do basic computer vision

  • using a deep neural network

  • that matched the pixels of an image to a label.

  • So an image like this

  • was matched to a numeric label that represented it like this.

  • But there was a limitation to that.

  • The image you were looking at had to have the subject centered in it

  • and it had to be the only thing in the image.

  • So the code you wrote would work for that shoe,

  • but what about these?

  • It wouldn't be able to identify all of them

  • because it's not trained to do so.

  • For that we have to use something called a convolutional neural network,

  • which works a little differently than what you've just seen.

  • The idea behind a convolutional neural network

  • is that you filter the images before training the deep neural network.

  • After filtering the images,

  • features within the images could then come to the forefront

  • and you would then spot those features to identify something.

  • A filter is simply a set of multipliers.

  • So, for example, in this case, if you're looking at a particular pixel

  • that has the value 192,

  • and the filter is the values in the red box,

  • then you multiply 192 by 4.5,

  • and each of its neighbors by the respective filter value.

  • So its neighbor above and to the left is zero,

  • so you multiply that by -1.

  • Its upper neighbor is 64, so you multiply that by zero and so on.

  • Sum up the result, and you get the new value for the pixel.
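The arithmetic described above can be sketched in a few lines of NumPy. The pixel value 192, the upper-left neighbor 0, the upper neighbor 64, and the filter values 4.5, -1, and 0 come from the example; the remaining neighborhood and filter values are made up for illustration.

```python
import numpy as np

# A hypothetical 3x3 neighborhood around the pixel valued 192.
# Only 0 (upper-left), 64 (above), and 192 are from the example.
neighborhood = np.array([
    [0,    64,  128],
    [48,   192, 144],
    [42,   226, 168],
])

# A hypothetical 3x3 filter with 4.5 at the center, -1 at the
# upper-left, and 0 above, as in the example.
kernel = np.array([
    [-1.0,  0.0, -2.0],
    [0.5,   4.5, -1.5],
    [1.5,   2.0, -3.0],
])

# The new value for the pixel: multiply each pixel by the matching
# filter value, then sum the results.
new_value = np.sum(neighborhood * kernel)
print(new_value)
```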

  • Now this might seem a little odd,

  • but check out the results for some filters, like this one,

  • which, when multiplied over the contents of the image,

  • removes almost everything except the vertical lines.

  • And this one, which removes almost everything except the horizontal lines.
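To make this concrete, here is a small sketch of sliding filters over an image. The filter values are classic Sobel-style edge detectors, an assumption on my part; the exact values shown in the video may differ.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a 3x3 kernel over the image (no padding), multiplying
    and summing at each position."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(image[y:y + 3, x:x + 3] * kernel)
    return out

# Sobel-style filters (illustrative values, not the video's).
vertical_filter = np.array([[-1, 0, 1],
                            [-2, 0, 2],
                            [-1, 0, 1]])
horizontal_filter = vertical_filter.T

# A tiny image with a single vertical edge down the middle.
image = np.zeros((5, 5))
image[:, 3:] = 255

v_edges = convolve2d(image, vertical_filter)
h_edges = convolve2d(image, horizontal_filter)
# The vertical filter responds strongly at the edge;
# the horizontal filter produces nothing on this image.
```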

  • This can then be combined with something called pooling,

  • which groups up the pixels in the image and filters them down to a subset.

  • So, for example, max pooling two by two

  • will group the image into sets of 2x2 pixels

  • and simply pick the largest.

  • The image will be reduced to a quarter of its original size

  • but the features can still be maintained.
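The 2x2 max pooling described above can be sketched like this: split the image into 2x2 blocks and keep only the largest pixel in each block, so a 4x4 image becomes 2x2. The pixel values are made up.

```python
import numpy as np

def max_pool_2x2(image):
    """Group the image into 2x2 blocks and keep the largest pixel
    of each block, quartering the image."""
    h, w = image.shape
    blocks = image[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.array([
    [0,   64,  128, 128],
    [48,  192, 144, 16],
    [42,  226, 168, 0],
    [255, 3,   7,   1],
])
pooled = max_pool_2x2(image)
print(pooled)  # 2x2 result: the max of each 2x2 block
```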

  • So the previous image after being filtered and then max pooled could look like this.

  • The image on the right is one quarter the size of the one on the left,

  • but the vertical line features were maintained

  • and indeed they were enhanced.

  • So where did these filters come from?

  • That's the magic of a convolutional neural network.

  • They're actually learned.

  • They are just parameters like those in the neurons

  • of a neural network that we saw in the last video.

  • So as our image is fed into the convolutional layer,

  • a number of randomly initialized filters will pass over the image.

  • The results of these are fed into the next layer

  • and matching is performed by the neural network.

  • And over time, the filters that produce the outputs

  • giving the best matches will be learned,

  • and the process is called feature extraction.

  • Here is an example of how a convolutional filter layer

  • can help a computer visualize things.

  • You can see across the top row here that you actually have a shoe,

  • but it has been filtered down to the sole and the silhouette of a shoe

  • by filters that learned what a shoe looks like.

  • You'll run this code for yourself in just a few minutes.

  • Now, let's take a look at the code

  • to build a convolutional neural network like this.

  • So this code is very similar to what you used earlier.

  • We have a flattened input that's fed into a dense layer

  • that in turn is fed into the final dense layer that is our output.

  • The only difference here is that I haven't specified the input shape.

  • That's because I'll put a convolutional layer on top of it like this.

  • This layer takes the input so we specify the input shape,

  • and we're telling it to generate 64 filters with this parameter.

  • That is, it will generate 64 filters

  • and multiply each of them across the image,

  • then each epoch, it will figure out which filters gave the best signals

  • to help match the images to their labels

  • in much the same way it learned which parameters worked best

  • in the dense layer.
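A sketch of the model described here, assuming the Fashion MNIST setup from the previous episode (28x28 grayscale inputs, 10 labels): a Conv2D layer that takes the input shape and generates 64 filters, sitting on top of the flatten-and-dense layers. The (3, 3) kernel size and the layer widths are assumptions, not confirmed by the transcript.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # The convolutional layer takes the input, so the input shape
    # is specified here; 64 is the number of filters to generate.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # The same flattened input and dense layers as before.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
```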

  • The max pooling to compress the image and enhance the features looks like this,

  • and we can stack convolutional layers on top of each other

  • to really break down the image

  • and try to learn from very abstract features like this.
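Putting the pieces together, a sketch of the stacked version: each convolutional layer is followed by max pooling, and a second convolution-and-pooling pair sits on top of the first. Again the kernel sizes and layer widths are assumptions.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # Max pooling compresses the image and enhances the features.
    tf.keras.layers.MaxPooling2D(2, 2),
    # A second convolution-and-pooling pair stacked on top,
    # breaking the image down into more abstract features.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()
```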

  • With this methodology, your network starts to learn

  • based on the features of the image

  • instead of just the raw patterns of pixels.

  • Two sleeves, it's a shirt. Two short sleeves, it's a t-shirt.

  • Sole and laces, it's a shoe-- that type of thing.

  • Now, we're still looking at just simple images

  • of fashion items at the moment,

  • but the principles will extend into more complex images

  • and you'll see that in the next video.

  • But before going there

  • try out the notebook to see convolutions for yourself.

  • I've made a link to it in the description below.

  • Before we get to the next video, don't forget to hit that subscribe button.

  • Thank you.

  • ♪ (music) ♪
