  • Welcome, freeCodeCampers, to a practical introduction to Natural Language Processing with TensorFlow 2.

  • I am your host, Dr. Phil Tabor.

  • In 2012, I got my PhD in experimental condensed matter physics and went to work for Intel Corporation as a back-end dry etch process engineer.

  • I left there in 2015 to pursue my own interests and have been studying artificial intelligence and deep learning ever since.

  • If you're unfamiliar with natural language processing, it is the application of deep neural networks to text processing. It allows us to do things such as text generation.

  • You may have heard the hubbub in recent months over the OpenAI GPT-2 algorithm and the fake news it can produce. NLP also allows us to do things like sentiment classification, as well as something more mathematical, which is representing strings of characters, words, as mathematical constructs that allow us to determine relationships between those words.

  • But more on that in the videos.

  • It would be most helpful if you have some background in deep learning, if you know something about deep neural networks, but it's not really required.

  • We're gonna walk through everything in the tutorial, so you'll be able to go from start to finish without any prior knowledge.

  • Although, of course, it would be helpful. If you would like to see more deep learning, reinforcement learning, and natural language processing content,

  • check me out here on YouTube at Machine Learning with Phil. I hope to see you there.

  • And I really hope you enjoy the video.

  • Let's get to it.

  • In this tutorial, you are going to do word embeddings with TensorFlow 2.0.

  • If you don't know what that means, don't worry.

  • I'll explain what it is and why it's important as we go along.

  • Let's get started. Before we begin with our imports, a couple of housekeeping items.

  • First of all, I am basically working through the TensorFlow tutorial from their website.

  • So I'm gonna link that in the description. I'm not claiming this code is my own, although I do some cleaning up at the end to kind of make it my own.

  • But in general it's not really my code.

  • So we start with our imports as usual.

  • We need io to handle dumping the word embeddings to a file so that we can visualize them later.

  • We'll need Matplotlib to handle plotting.

  • We will need tensorflow as tf.

  • And just a word: this is TensorFlow 2.1.0-rc1, release candidate one.

  • So this is, as far as I'm aware, the latest build.

  • TensorFlow 2.0 throws some really weird warnings, and 2.1 seems to deal with that.

  • So I've upgraded.

  • So if you're running TensorFlow 2.0 and you get funny errors, sorry, funny warnings, but still functional code and learning,

  • that is why you want to update to the newest version of TensorFlow.

  • Of course, we need Keras to handle pretty much everything.

  • We also need the layers module for our embedding and dense layers, and we're also going to use TensorFlow Datasets.

  • So I'm not gonna have you download your own data set.

  • We're going to use the IMDB movie review dataset for this particular tutorial.

  • So of course, that is an additional dependency for this tutorial.
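
For reference, the imports described above would look something like this, a sketch following the TensorFlow tutorial the video is based on:

```python
import io                           # for dumping the embeddings to files for visualization
import matplotlib.pyplot as plt     # for plotting
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds  # provides the IMDB reviews dataset
```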

  • So now that we've handled our imports, let's talk a little bit about what word embeddings are.

  • So how could you represent a word for a machine?

  • And more importantly, instead of a string of characters,

  • how can you represent a collection of words?

  • A bag of words, if you will.

  • So you have a number of options.

  • One way is to take the entire set of all the words that you have in your, say, movie reviews.

  • You know, you just take all the words and find all the unique words and that becomes your dictionary, and you can represent that as a one hot encoding.

  • So if you have, let's say, 10,000 words, then you would have a vector for each word with 10,000 elements, which are predominantly zeros except for the one corresponding to whichever word it is.

  • The problem with this encoding is that while it does work, it is incredibly inefficient.

  • Because it is sparse, you know, the majority of the data is zero and there's only one important bit in the whole thing, so it's not very efficient.
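
As a toy illustration of why one-hot encoding is wasteful, here is a sketch with a tiny made-up five-word vocabulary instead of 10,000 words:

```python
import tensorflow as tf

# Suppose our whole vocabulary is just: ["the", "movie", "was", "great", "bad"].
# Each word becomes a vector of length 5 with a single 1 and the rest 0s.
word_ids = tf.constant([0, 1, 2, 3])        # "the movie was great"
one_hot = tf.one_hot(word_ids, depth=5)
print(one_hot.numpy())
# [[1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 1. 0.]]
# With a real 10,000-word vocabulary each row would hold 10,000 values,
# all but one of them zero.
```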

  • Another option is to do integer encoding, so you could just rank-order the numbers.

  • Sorry, the words.

  • You could do it in alphabetical order.

  • The order doesn't really matter.

  • You could just assign a number to each unique word, and then every time that word appears in a review,

  • you would have that integer in an array. So you end up with a set of variable-length arrays, where the length of the array corresponds to

  • the number of words in the review, and the members of the array correspond to the words that appear within that review.
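
A minimal sketch of the integer-encoding idea, using a made-up toy vocabulary; the variable-length lists are the point:

```python
# Hypothetical toy vocabulary: each unique word gets an integer (starting at 1).
vocab = {"the": 1, "movie": 2, "was": 3, "great": 4, "terrible": 5, "acting": 6}

def encode(review):
    return [vocab[w] for w in review.split()]

print(encode("the movie was great"))        # [1, 2, 3, 4]
print(encode("the acting was terrible"))    # [1, 6, 3, 5]
# Reviews of different lengths give arrays of different lengths,
# and the integers say nothing about how the words relate to each other.
```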

  • Now this works.

  • This is far more efficient, but it's still not quite ideal, right?

  • So it doesn't tell you anything about the relationships between the words.

  • So if you think of a word, let's say 'king',

  • it has a number of connotations, right?

  • A king is a man for one.

  • So there's some relationship between the king and a man.

  • A king has power, right?

  • Has control over a domain, a kingdom.

  • So there is also the connotation of owning land and having control over that land.

  • A king may have a queen, so it has some sort of relationship to a queen as well, and may have a prince and princess.

  • You know, all these kinds of different relationships between words are not incorporated into the integer encoding of our dictionary.

  • The reason is that the integer encoding of our dictionary forms a basis in some higher-dimensional space.

  • But all of those vectors are orthogonal, so if we take their dot product, they are essentially at right angles to each other in a higher-dimensional space.

  • And so their dot product is zero, so there's no projection of one vector, one word, onto another.

  • There's no overlap in the meaning between the words, at least in this higher dimensional space.

  • Now, word embeddings fix this problem by keeping the integer encoding but then doing a transformation to a totally different space.

  • So we introduce a new space of vectors of some arbitrary length.

  • It's a hyperparameter of your model: much like the number of neurons in a dense layer is a hyperparameter of

  • your model, the length of the embedding layer is a hyperparameter, and we'll just say it's eight.

  • So the word King then has eight floating point elements that describe its relationship to all the other vectors in that space.

  • And so what that allows you to do is to take dot products between two arbitrary words in your dictionary and get non-zero components. What that means, in practical terms, is that you get a sort of semantic relationship between words that emerges as a consequence of training your model.
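
To make the dot-product point concrete, here is a sketch with made-up numbers showing how two learned embedding vectors could be compared; in a trained model these rows would come from the embedding layer's weight matrix:

```python
import numpy as np

# Pretend these are learned 8-dimensional embeddings for "king" and "queen".
king  = np.array([0.61, 0.12, -0.40, 0.88, 0.05, -0.33, 0.27, 0.19])
queen = np.array([0.58, 0.10, -0.35, 0.91, 0.01, -0.30, 0.27, 0.22])

# Cosine similarity: a non-zero overlap means the words share some meaning.
similarity = king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen))
print(similarity)   # close to 1.0 for related words, near 0.0 for unrelated ones
```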

  • So the way it works in practice is we're gonna have a whole bunch of reviews from the IMDB data set, and they will have some classifications as a good or bad review.

  • So, for instance, you know, for the Star Wars: The Last Jedi movie,

  • I don't think it's in there, but, you know, my review would be that it was terrible, awful, no good, and totally ruined Luke's character.

  • And you would see that I'm not alone in that.

  • So if you did a huge number of reviews for The Last Jedi, you would see a strong correlation of words such as horrible, bad,

  • wooden characters, Mary Sue, things like that.

  • And so the model would then take those words, run them through the embedding layer, try to come up with a prediction for whether or not that is a good or bad review, match it up to the training label, and then do backpropagation to vary those weights in that embedding layer.

  • So, say, eight elements, and by training over the dataset multiple times,

  • you can refine these weights such that

  • you are able to predict whether or not a review is positive or negative about a particular movie.

  • But also it shows you the relationship between the words because the model learns the correlations between words within reviews that give it either a positive or negative context.

  • So that is word embeddings in a nutshell, and we're gonna go ahead and get started coding that.

  • So the first thing we're gonna have is an embedding layer, and this is just gonna be for illustration purposes.

  • It's gonna be layers.Embedding, and let's say there's 1,000 and 5 elements. So we'll say result

  • equals embedding_layer of

  • tf.constant

  • 1, 2, 3.

  • So then let's print the result dot numpy.

  • Okay, so let's head to the terminal and execute this and see precisely what we get.

  • Actually, let's do this: print result dot numpy dot shape.

  • I think that should work.
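
Put together, the little illustration being typed here is roughly the following; it assumes the imports from earlier, and the variable names are my reading of the code on screen:

```python
# An embedding layer mapping a 1,000-word vocabulary into 5-dimensional vectors.
embedding_layer = layers.Embedding(1000, 5)

# Pass a few integer word IDs through it.
result = embedding_layer(tf.constant([1, 2, 3]))

print(result.numpy())         # three 5-element vectors of (untrained) weights
print(result.numpy().shape)   # (3, 5)
```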

  • Let's see what we get in the terminal and let's head to the terminal now.

  • All right, let's give it a try.

  • Okay, so what's important here is you see that you get an array of three elements, right, because we did the tf.constant of 1, 2, and 3.

  • And you see we have five elements because we have broken the integers into some components in that five-element space.

  • Okay, so that has shape three by five, which you would expect because you're passing in three elements, and each of these three elements,

  • these three integers, corresponds to a word of an embedding layer of five elements.

  • Okay, that's relatively clear.

  • Let's go back to the code editor and see what else we can build with this.

  • Okay, so let's go ahead and just kind of comment out all this stuff because we don't need it anymore.

  • So now let's get to the business of actually loading our dataset and doing interesting things with it. We want to use the dataset load function, so we'll say train_data, test_data, and some info equals

  • tfds.load

  • of imdb_reviews, forward slash, subwords

  • 8k.

  • Okay.

  • And then we will define a split, and that is tfds.Split.TRAIN and tfds.Split.TEST, and we will have a couple of other parameters: with_info equals True, which incorporates information about the dataset, and as_supervised equals True.

  • So as_supervised tells the dataset loader that we want to get back information in the form of data and label as a tuple.

  • So we have the labels for training of our data.
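
The loading call being described is roughly the following sketch, matching the TensorFlow tutorial this video follows:

```python
# Load the IMDB reviews dataset, pre-tokenized into a ~8k subword vocabulary.
(train_data, test_data), info = tfds.load(
    'imdb_reviews/subwords8k',
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,        # also return metadata about the dataset
    as_supervised=True)    # yield (review, label) tuples
```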

  • So now we're going to need an encoder.

  • So we'll say encoder equals info.features['text'].encoder.

  • And so let's just find out what words we have in our dictionary.

  • For this, we'll say print encoder.subwords, the first 20 elements. Save that, head back to the terminal, print it out, and see what we can see.
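
In code, that is roughly:

```python
# The dataset ships with a subword text encoder we can reuse.
encoder = info.features['text'].encoder

# Peek at the first 20 entries of its vocabulary.
print(encoder.subwords[:20])
```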

  • So let's run that again.

  • And, you know, it's hard to see.

  • Let me move my face over for a moment and you can see that we get a list of words.

  • The underscores:

  • so the underscore corresponds to a space.

  • You get commas, periods, 'a' with an underscore, 'of' with an underscore, so you have a whole bunch of words with underscores that indicate spaces.

  • Okay, so this is kind of the makings of a dictionary.

  • So let's head back to the code editor and continue building on this so we no longer need that print statement.

  • Now, the next problem we have to deal with is the fact that these reviews are all different lengths, right?

  • So we don't have an identical length for each of the reviews.

  • And so when we load up elements into a matrix, say, they're gonna have different lengths, and that is kind of problematic.

  • So the way we deal with that is by adding padding.

  • So we find the length of the longest review, and then for every review that is shorter than that, we append a bunch of zeros to the end in our bag of words.

  • So to the list of words, you know, the list of integers, we will append a bunch of zeros at the end.

  • So zero isn't a word.

  • It doesn't correspond to any of the words.

  • The words start with one;

  • the rank-ordered word numbers start with one, and so we insert a zero because it doesn't correspond to anything.

  • It won't hurt the training of our model.

  • So we need something called padded_shapes, and that has this shape:

  • so the batch size, a list with None, and an empty tuple there.
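
A sketch of the padded shapes being described, following the tutorial's convention:

```python
# [None]: pad each review (a variable-length list of word IDs) up to the
#         longest review in the batch; (): the label is a scalar, no padding needed.
padded_shapes = ([None], ())
```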

  • So now that we have our padded shapes, we're ready to go ahead and get our training and test batches.

  • So let's do that.

  • And since we're good data scientists, we want to do a shuffle.

  • We're gonna use a batch size of 10 and the padded shapes specified by what we just defined.

  • Let's clean that up and let's copy, because the train and test batches are pretty much identical,

  • except it's test_data that we shuffle, and it's the same size, so we don't have to do any changes there.
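
So the batching code sketched here looks roughly like this; the shuffle buffer size of 1000 is my assumption from the tutorial being followed:

```python
train_batches = train_data.shuffle(1000).padded_batch(
    10, padded_shapes=padded_shapes)    # batches of 10, padded to equal length
test_batches = test_data.shuffle(1000).padded_batch(
    10, padded_shapes=padded_shapes)
```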

  • Scroll down so you can see. Okay, so that gives us our data.

  • So what we need next after the data is an actual model.

  • So let's go ahead and define a model. As is typical for Keras, it is a Sequential model, and that takes a list of layers.

  • So the first layer is an embedding layer, and that takes encoder.vocab_size.

  • Now this is, you know, given to us up here by the encoder object that comes from the information about our dataset.
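
The section cuts off here. For completeness, a minimal sketch of where the model definition is heading, based on the TensorFlow word-embeddings tutorial the video says it follows; the embedding dimension, pooling, and output layers have not been introduced in the transcript yet and are assumptions:

```python
embedding_dim = 16   # assumed hyperparameter, not stated in the transcript

model = keras.Sequential([
    layers.Embedding(encoder.vocab_size, embedding_dim),  # integer IDs -> dense vectors
    layers.GlobalAveragePooling1D(),                       # average over the review length
    layers.Dense(1, activation='sigmoid')                  # positive/negative prediction
])
```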