
  • Welcome, freeCodeCampers, to a practical introduction to Natural Language Processing with TensorFlow 2.

  • I am your host, Dr. Phil Tabor.

  • In 2012, I got my PhD in experimental condensed matter physics and went to work for Intel Corporation as a back-end dry etch process engineer.

  • I left there in 2015 to pursue my own interests and have been studying artificial intelligence and deep learning ever since.

  • If you're unfamiliar with natural language processing, it is the application of deep neural networks to text processing. It allows us to do things such as text generation.

  • You may have heard the hubbub in recent months over the OpenAI GPT-2 algorithm that allows it to produce fake news. It also allows us to do things like sentiment classification, as well as something more mathematical, which is representing strings of characters, words, as mathematical constructs that allow us to determine relationships between those words.

  • But more on that in the videos.

  • It would be most helpful if you have some background in deep learning.

  • If you know something about deep neural networks, great, but it's not really required.

  • We're gonna walk through everything in the tutorial, so you'll be able to go from start to finish without any prior knowledge.

  • Although, of course, it would be helpful. If you would like to see more deep learning, reinforcement learning, and natural language processing content,

  • check me out here on YouTube at Machine Learning with Phil. I hope to see you there.

  • And I really hope you enjoy the video.

  • Let's get to it.

  • In this tutorial, you are going to do word embeddings with TensorFlow 2.0.

  • If you don't know what that means, don't worry.

  • I'll explain what it is and why it's important as we go along.

  • Let's get started. Before we begin with our imports, a couple of housekeeping items.

  • First of all, I am basically working through the tensorflow tutorial from their website.

  • So I'm gonna link that in the description. I'm not claiming this code is my own, although I do some cleaning up at the end to kind of make it my own.

  • But in general it's not really my code.

  • So we start with our imports as usual.

  • We need io to handle dumping the word embeddings to a file so that we can visualize them later.

  • We'll need matplotlib to handle plotting.

  • We will need tensorflow as tf.

  • And just a word: this is TensorFlow 2.1.0 RC1, release candidate one.

  • So this is, as far as I'm aware, the latest build. TensorFlow 2.0 throws some really weird warnings, and 2.1 seems to deal with that.

  • So I've upgraded.

  • So if you're running TensorFlow 2.0 and you get funny errors,

  • uh, sorry,

  • funny warnings, but you still get functional code and learning,

  • that is why you wanna update to the newest version of TensorFlow.

  • Of course, we need Keras to handle pretty much everything.

  • We also need the layers for our embedding and dense layers, and we're also going to use the tensorflow_datasets package.

  • So I'm not gonna have you download your own data set.

  • We're going to use the IMDB movie review data set for this particular tutorial.

  • So of course, that is an additional dependency for this tutorial.
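
For reference, the import block being described here would look roughly like this. It is a sketch following the TensorFlow tutorial the video works from, so the exact aliases are assumptions:

```python
import io                            # for dumping the word embeddings to .tsv files later
import matplotlib.pyplot as plt      # for plotting training and validation accuracy

import tensorflow as tf              # TensorFlow 2.1.0rc1 at the time of recording
from tensorflow import keras         # Keras handles the model building
from tensorflow.keras import layers  # Embedding and Dense layers
import tensorflow_datasets as tfds   # extra dependency: supplies the IMDB review dataset
```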

  • So now that we've handled our imports, let's talk a little bit about what word embeddings are.

  • So how could you represent a word for a machine?

  • And more importantly, rather than as a string of characters,

  • How can you represent a collection of words?

  • A bag of words, if you will.

  • So you have a number of options.

  • One way is to take the entire set of all the words that you have in your, say, movie reviews.

  • You know, you just take all the words and find all the unique words, and that becomes your dictionary, and you can represent that as a one-hot encoding.

  • So if you have, let's say, 10,000 words, then you would have a vector for each word with 10,000 elements, which are predominantly zeros, except for the one corresponding to whichever word it is.

  • The problem with this encoding is that while it does work, it is incredibly inefficient.

  • Because it is sparse, you know, the majority of the data is zeros, and there's only one important bit in the whole thing, so it's not very efficient.

  • Another option is to do integer encoding, so you could just rank-order the numbers,

  • sorry, the words.

  • You could do it in alphabetical order.

  • The order doesn't really matter.

  • You could just assign a number to each unique word, and then every time that word appears in a review,

  • you would have that integer in an array. So you end up with a set of variable-length arrays, where the length of the array corresponds to

  • the number of words in the review, and the members of the array correspond to the words that appear within that review.
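
As a rough illustration of the two encodings just described, here is a toy example with a made-up five-word vocabulary (the words and indices are invented for illustration; index 0 is kept free for padding, as discussed later):

```python
vocab = ['the', 'movie', 'was', 'great', 'terrible']
word_to_index = {word: i for i, word in enumerate(vocab, start=1)}  # 0 reserved for padding

# One-hot: one long, mostly-zero vector per word (sparse and inefficient).
def one_hot(word):
    vec = [0] * (len(vocab) + 1)
    vec[word_to_index[word]] = 1
    return vec

# Integer encoding: each review becomes a variable-length list of integers.
review = ['the', 'movie', 'was', 'great']
print(one_hot('great'))                    # [0, 0, 0, 0, 1, 0]
print([word_to_index[w] for w in review])  # [1, 2, 3, 4]
```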

  • Now this works.

  • This is far more efficient, but it's still not quite ideal, right?

  • So it doesn't tell you anything about the relationships between the words.

  • So if you think of the word, let's say, 'king',

  • It has a number of connotations, right?

  • A king is a man for one.

  • So there's some relationship between the king and a man.

  • A king has power, right?

  • Has control over a domain, a kingdom.

  • So there is also the connotation of owning land and having control over that land.

  • Kings may have a queen, so it has some sort of relationship to a queen as well, and may have a prince and princess.

  • You know, all these kinds of different relationships between words are not incorporated into the integer encoding of our dictionary.

  • The reason is that the integer encoding of our dictionary forms a basis in some higher dimensional space.

  • But all of those vectors are orthogonal, so if we take their dot product, they are essentially at right angles to each other in a higher dimensional space.

  • And so their dot product is zero, so there's no projection of one vector, one word, onto another.

  • There's no overlap in the meaning between the words, at least in this higher dimensional space.

  • Now, word embeddings fix this problem by keeping the integer encoding but then doing a transformation to a totally different space.

  • So we introduce a new space of vectors of some arbitrary length.

  • It's a hyperparameter of your model; much like the number of neurons in a dense layer is a hyperparameter of

  • your model, the length of the embedding layer is a hyperparameter, and we'll just say it's eight.

  • So the word 'king' then has eight floating point elements that describe its relationship to all the other vectors in that space.

  • And so what that allows you to do is to take dot products between two arbitrary words in your dictionary and get non-zero components, and what that means, in practical terms, is that you get a sort of semantic relationship between words that emerges as a consequence of training your model.

  • So the way it works in practice is we're gonna have a whole bunch of reviews from the IMDB data set, and they will have some classifications as a good or bad review.

  • So, for instance, take the Star Wars: The Last Jedi movie.

  • I don't think it's in there, but, you know, my review would be that it was terrible, awful, no good, totally ruined Luke's character.

  • And I'm not alone in that.

  • So if you did a huge number of reviews for The Last Jedi, you would see a strong correlation of words such as 'horrible', 'bad',

  • 'wooden characters', 'Mary Sue', things like that.

  • And so the model would then take those words, run them through the embedding layer, and try to come up with a prediction for whether or not that is a good or bad review, match it up to the training label, and then do back propagation to vary those weights in that embedding layer.

  • So, say, eight elements, and by training over the data set multiple times,

  • you can refine these weights such that

  • You are able to predict whether or not a review is positive or negative about a particular movie.

  • But also it shows you the relationship between the words because the model learns the correlations between words within reviews that give it either a positive or negative context.

  • So that is word embeddings in a nutshell, and we're gonna go ahead and get started coding that.

  • So the first thing we're gonna have is an embedding layer, and this is just gonna be for illustration purposes.

  • It's gonna be layers.Embedding, and let's say there are 1,000 words and five elements. Then we'll say result

  • equals embedding_layer of

  • tf.constant of

  • one, two, three.

  • So then let's print result.numpy().

  • Okay, so let's head to the terminal and execute this and see precisely what we get.

  • Actually, let's also do this: print result.numpy().shape.

  • I think that should work.
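
Pieced together, the little illustration being typed here looks roughly like this (a sketch; the 1,000-token vocabulary and 5-dimensional embedding are the numbers mentioned above):

```python
# A throwaway embedding layer: vocabulary of 1,000 tokens, 5-dimensional embedding space.
embedding_layer = layers.Embedding(1000, 5)

# Feed it three integer "words" and inspect the resulting vectors.
result = embedding_layer(tf.constant([1, 2, 3]))
print(result.numpy())         # three 5-element float vectors (randomly initialized)
print(result.numpy().shape)   # (3, 5)
```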

  • Let's see what we get in the terminal and let's head to the terminal now.

  • All right, let's give it a try.

  • Okay, so what's important here is you see that you get an array of three elements, right, because we did the tf.constant of one, two, and three.

  • And you see we have five elements, because we have broken the integers into some components in that five-element space.

  • Okay, so that has shape three by five, which you would expect, because you're passing in three elements, and each of these three elements,

  • these three integers, corresponds to a word of an embedding layer of five elements.

  • Okay, that's relatively clear.

  • Let's go back to the code editor and see what else we can build with this.

  • Okay, so let's go ahead and just kind of comment out all this stuff because we don't need it anymore.

  • So now let's get to the business of actually loading our data set and doing interesting things with it. We want to use the dataset load function, so we'll say train_data, test_data, and info equals

  • tfds.load of

  • 'imdb_reviews/subwords8k'.

  • Okay.

  • And then we will define a split, and that is tfds.Split.TRAIN and tfds.Split.TEST, and we will have a couple of other parameters: with_info equals True, which incorporates information about the data set, and as_supervised equals True.

  • So as_supervised tells the dataset loader that we want to get back information in the form of (data, label) tuples.

  • So we have the labels for training of our data.

  • So now we're going to need an encoder.

  • So we'll say encoder equals info.features['text'].encoder.

  • And so let's just find out what words we have in our dictionary.

  • From this we'll say print encoder.subwords, the first 20 elements. Save that, head back to the terminal, print it out, and see what we can see.
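
The data loading and encoder inspection just described would be written roughly like this (a sketch following the tfds API of that era):

```python
# Load the IMDB reviews dataset, pre-tokenized into an ~8k subword vocabulary.
(train_data, test_data), info = tfds.load(
    'imdb_reviews/subwords8k',
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,       # also return metadata about the dataset
    as_supervised=True)   # yield (review, label) tuples

# The subword encoder ships with the dataset metadata.
encoder = info.features['text'].encoder
print(encoder.subwords[:20])
```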

  • So let's run that again.

  • And you it's hard to see.

  • Let me move my face over for a moment and you can see that we get a list of words.

  • The underscores.

  • So the underscore corresponds to space.

  • You get commas, periods, 'a_', 'and_', 'of_', so you have a whole bunch of words with underscores that indicate that they are followed by spaces.

  • Okay, so this is kind of the makings of a dictionary.

  • So let's head back to the code editor and continue building on this so we no longer need that print statement.

  • Now, the next problem we have to deal with is the fact that these reviews are all different lengths, right?

  • So we don't have an identical length for each of the reviews.

  • And so when we load up elements into a matrix, they're gonna have different lengths, and that is kind of problematic.

  • So the way we deal with that is by adding padding.

  • So we find the length of the longest review, and then for every review that is shorter than that, we append a bunch of zeros to the end of our bag of words.

  • So in our list of words, you know, the list of integers, we will append a bunch of zeros at the end.

  • So zero isn't a word.

  • It doesn't correspond to anything; the words

  • start with one.

  • The rank-ordered numbers start with one, and so we insert a zero because it doesn't correspond to anything.

  • It won't hurt the training of our model.

  • So we need something called padded_shapes, and that has this shape:

  • So, a list containing None, and an empty tuple there.

  • So now that we have our padded shapes, we're ready to go ahead and get our training and test batches.

  • So let's do that.

  • And since we're good data scientists, we want to do a shuffle.

  • We're gonna use a batch size of 10 and the padded shapes specified by what we just defined.

  • Let's clean that up, and let's copy, because the train and test batches are pretty much identical.

  • Except it's test_data that we shuffle, and it's the same size, so we don't have to make any changes there.

  • Scroll down so you can see. Okay, so that gives us our data.
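
A sketch of the batching step just described; the shuffle buffer of 1,000 is not stated on screen and is taken from the tutorial the video follows:

```python
# Pad every review in a batch out to the length of the longest one; labels need no padding.
padded_shapes = ([None], ())

BATCH_SIZE = 10
train_batches = train_data.shuffle(1000).padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)
test_batches = test_data.shuffle(1000).padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)
```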

  • So what we need next after the data is an actual model.

  • So let's go ahead and define a model. As is typical for Keras, it is a Sequential model, and that takes a list of layers.

  • So the first layer is an embedding layer, and that takes encoder.vocab_size.

  • Now, this is, you know, given to us up here by the encoder object that comes from the information about our data set, and we have some vocabulary size, so there are around 10,000 words; vocab_size is just the size of our dictionary, and we want to define an embedding_dim.

  • So that's the number of dimensions for our embedding layer.

  • So we'll set it to something like 16 to start. So let's add another layer:

  • GlobalAveragePooling1D, and then finally we'll need a dense layer:

  • one output, activation equals sigmoid.

  • So if this seems mysterious, what this is, sorry, what this layer outputs, is the probability that the review is positive.

  • So it's a sigmoid.

  • Um, go ahead and get rid of that.

  • And now we want to compile our model with the Adam optimizer,

  • a binary cross entropy loss, with accuracy metrics.

  • So metrics,

  • metrics, huh,

  • equals 'accuracy'.

  • Okay, that's our model.

  • And that is all we need for that.
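
Putting the model description together, a sketch of what is being typed (the variable names follow the tutorial conventions):

```python
# Embedding -> average pooling -> single sigmoid output (probability the review is positive).
embedding_dim = 16
model = keras.Sequential([
    layers.Embedding(encoder.vocab_size, embedding_dim),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```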

  • So now we are ready to think about training it.

  • So let's go ahead and do that next.

  • So what we want to do is train, and dump the history of our training into an object that we're gonna call history: model.fit.

  • We're gonna pass train_batches, ten epochs, and we're gonna need validation data, and that'll be test_batches, and we'll use something like 20 validation steps.
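
In code, that training call would look something like this:

```python
# Train for 10 epochs, checking against 20 batches of the test set after each epoch.
history = model.fit(train_batches,
                    epochs=10,
                    validation_data=test_batches,
                    validation_steps=20)
```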

  • Okay, so let's scroll down a little bit so you can see it first of all, and then we're gonna think about, um, once it's done, let's go ahead and plot it.

  • So maybe we'll do that now.

  • So let's handle that.

  • So we want to convert our history to a dictionary, and that's history.history.

  • And we want to get the accuracy by taking the accuracy key.

  • And we want the validation accuracy, using the correct syntax, of course:

  • val_accuracy for validation accuracy. And the number of epochs is

  • just range of 1 to len of accuracy plus 1.

  • Then we want to do a plot,

  • figsize nice and large,

  • 12 by 9. We want to plot

  • the epochs versus the accuracy, 'bo', label equals 'Training accuracy'.

  • We want to plot the validation accuracy using

  • just a blue line, 'b', not 'bo', sorry, and label equals 'Validation

  • accuracy'.

  • Then plt.xlabel of epochs, plt.ylabel of accuracy, and let's go ahead and add a title while we're at it, 'Training and validation accuracy'. Scroll down a little bit.

  • We will include a legend (having an extraordinarily difficult time typing tonight),

  • location equals lower right,

  • and a ylim of 0.5 and 1, that should be a tuple, excuse me, and plt.show.
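
Collected into one place, the plotting code being described looks roughly like this:

```python
# Pull the metrics out of the History object and plot training vs. validation accuracy.
history_dict = history.history
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 9))
plt.plot(epochs, acc, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc='lower right')
plt.ylim((0.5, 1))
plt.show()
```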

  • All right, so let's go ahead and to the terminal and run this and see what the plot looks like.

  • And we are back.

  • Let me move my ugly mug over so we could see a little bit more and let us run the software and see what we get.

  • Okay, so it has started training and it takes around 10 to 11 seconds per epoch.

  • So I'm going to sit here and twiddle my thumbs for a minute and fast forward the video while we wait.

  • So, of course, once it finish running, I realize I have a typo, and that is typical.

  • So in line 46,

  • it is,

  • I spelled out 'plot' instead of 'plt', but that's all right.

  • Let's take a look at the data we get in the terminal anyway. So you can see that the validation accuracy is around 92.5%, pretty good, and the training accuracy is around 93.82%. So a little bit of overtraining, and I've run this a bunch of times and you tend to get a little bit more

  • overtraining.

  • I'm kind of surprised that in this final run, now that I'm recording it for YouTube, there is actually a little bit less overtraining.

  • Ah, but either way, there's some evidence of overtraining.

  • But a 90% accuracy for such a simple model isn't entirely hateful.

  • So I'm gonna go ahead and head back and correct that typo and then run it again and then show you the plot.

  • So it is here in line 46 right there.

  • And just make sure that nothing else looks wonky.

  • And I believe it is all good.

  • Looking at my cheat sheet,

  • everything looks fine.

  • Okay, let's go back to the terminal and try it again.

  • All right, once more.

  • All right.

  • So it has finished, and you can see that this time the validation accuracy was around 89.5%, whereas the training accuracy was 93.85%. So it is overtraining a little bit in this particular run.

  • And there is significant run-to-run variation, as you might expect.

  • Let's take a look at the plot.

  • All right.

  • So I have stuck my ugly mug right here in the middle. You can see that the training accuracy goes up over time, as we would expect.

  • And the validation accuracy generally does that, but kind of tops out about halfway through the number of epochs.

  • So this is clearly working, and this is actually pretty cool.

  • With such a simple model, we can get some decent review, or sentiment as it were, classification.

  • But we can do one more neat thing, and that is to actually visualize the relationships between the words that our embedding learns.

  • So let's head back to the code editor and then let's write some code to tackle that task.

  • Okay, so before we do that, you know I want to clean up the code first.

  • Let's go ahead and do that, so ah, I will leave in all that commented stuff.

  • But let's define a few functions.

  • We'll need a function to get our data.

  • We'll need a function to get our model, and we'll need a function to plot data, and I'll need a function to get our embeddings; we'll say retrieve_embeddings. And I'll fill in the parameters for those as we go along.

  • So let's take this stuff for our get data function.

  • Cut that, paste it.

  • And, of course, use proper indentation,

  • because Python is a little bit particular about that.

  • Okay, make sure everything lines up nicely.

  • And then, of course, we have to return the stuff that we're interested in.

  • So we want to return train_data, test_data... and in fact, that's not actually going to do. I take it back.

  • Let's come down here.

  • And we want our... sorry,

  • we don't want to return our data; we want to return our batches.

  • So return train_batches, test_batches.

  • And we'll also need our encoder for visualizing the relationships between words.

  • So let's return that now.

  • Okay, Now let's handle the function for the get model next.

  • So let's come down here and grab this.

  • Actually, let's yeah, grab all of it and come here and do that.

  • And let's make embedding_dim a parameter of our model, and you'll notice in our model we need the encoder.

  • So we also have to pass in the encoder as well as an embedding_dim, and then at the bottom of the function

  • we want to return that model. Pretty straightforward.

  • So then let's handle the plot data next.

  • So we have all of this.

  • Grab that and indent here.

  • So we're going to need a history.

  • And that looks like all we need, because we define epochs, accuracy, and validation accuracy.

  • Okay, so that looks like all we need in the plot data function.

  • So then we have to write our retrieve_embeddings function, but first, let's handle all the other stuff.

  • We'll say train_batches, test_batches, and encoder equals get_data.

  • In fact, let's rename that to get_batch_data.

  • To be more specific.

  • This is kind of being pedantic, but you always want to be as descriptive as possible with your naming conventions.

  • That way, people can read the code and know precisely what it does without having to, you know, make any guesses.

  • So if I just say get_data,

  • it isn't necessarily clear that I'm getting batches out of that data; you know, it could just be getting single instances.

  • It could return a generator.

  • It is a little bit ambiguous, so changing the function name to get_batch_data is the appropriate thing to do.

  • So then we'll say model equals get_model, pass in the encoder, and then the history will work as intended.

  • And then we call our function to plot the history.

  • And that should work as intended as well.
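
Put together, the refactored top level being described reads something like this (the function bodies are the pieces of code written earlier; the exact signatures are assumptions):

```python
# Refactored top level: each step lives in its own function.
train_batches, test_batches, encoder = get_batch_data()
model = get_model(encoder, embedding_dim=16)

history = model.fit(train_batches,
                    epochs=10,
                    validation_data=test_batches,
                    validation_steps=20)

plot_history(history)
retrieve_embeddings(model, encoder)
```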

  • And now we're ready to tackle the retrieve_embeddings function.

  • So that is relatively straightforward.

  • So what we want to do is pass in the model and the encoder.

  • The purpose of this function is to take our embeddings and dump them to TSV files that we can load into a visualizer in the browser, to visualize the principal component analysis of our word encodings.

  • So we need files to write to, and we need to enumerate over the subwords in our encoder and write the metadata as well as the vectors for our encodings.

  • So out_vectors equals io.open of 'vecs.tsv' in write mode with an encoding of utf-8.

  • We need out_metadata,

  • and that's similar: 'meta.tsv', write mode, encoding utf-8.

  • Very similar.

  • Now we need to enumerate over our encoder subwords and get the vectors we want to dump to our vector file, as well as the metadata.

  • The vector is weights,

  • sub num plus one.

  • And so we have the plus one here

  • because, remember, we start from one, because zero is for our padding, right?

  • Zero doesn't correspond to a word.

  • So the words start from one and go on.

  • So we want to write the word plus a newline.

  • And for the vectors, we write

  • a tab-delimited string, of x for x in vector, plus a newline character at the end.

  • And then you want to close your files.

  • Okay, so then we just scroll down and call our function retrieve_embeddings with model and encoder.
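
A sketch of the retrieve_embeddings function being described; note that the weights lookup (model.layers[0].get_weights()[0]) is the line that gets added a few minutes later in the video:

```python
def retrieve_embeddings(model, encoder):
    # The embedding layer is layer 0; its first weight matrix holds one row per token.
    weights = model.layers[0].get_weights()[0]

    out_vectors = io.open('vecs.tsv', 'w', encoding='utf-8')
    out_metadata = io.open('meta.tsv', 'w', encoding='utf-8')

    # Row 0 of the weight matrix is the padding token, so the subwords start at index 1.
    for num, word in enumerate(encoder.subwords):
        vec = weights[num + 1]
        out_metadata.write(word + '\n')
        out_vectors.write('\t'.join([str(x) for x in vec]) + '\n')

    out_vectors.close()
    out_metadata.close()
```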

  • Okay, so assuming I haven't made any typos, they should actually work.

  • So I'm gonna head back to the terminal and try it again.

  • All right?

  • Moment of truth.

  • So it is training, so I didn't make any mistakes up until that point. One second, and we'll see if it actually makes it through the plot.

  • Oh, but really quick.

  • So if you run this with TensorFlow 2.0 (let me move my face out of the way),

  • if you run this with TensorFlow 2.0, you will get this "out of range:

  • end of sequence" error here.

  • And if you do a Google search for that, you will see a thread about it in the GitHub issues; someone says that it is fixed in 2.1.0

  • RC1, the version of TensorFlow which I am running.

  • However, I still get the warning on the first run. In version 2.0.0,

  • I get the warning on every epoch, so it kind of clutters up the terminal output, but it still runs nonetheless and gets comparable accuracy, so it doesn't seem to affect the model performance. But it makes for an ugly YouTube video and gives me uneasy feelings.

  • So I went ahead and updated to the latest release candidate, 2.1.0, and you can see that it works relatively well.

  • So one second we'll see the plot again.

  • And of course, I made a mistake again.

  • I called plot_history, but the function is named plot_data.

  • Let's fix that.

  • All right.

  • Let's change it to plot_history, because that is more precise, and we will try it again.

  • Let's do it.

  • All right, so it has finished, and you can see that the story is much the same: a little bit of overtraining on the training data.

  • Let's take a look at the plot. And the plot is totally consistent with what we got last time, you know, an increasing training accuracy and a leveling-off validation accuracy.

  • So let's go ahead and check out how these word embeddings look in the browser.

  • But first, of course, I made a mistake, so weights are not defined, and that is because I didn't define them.

  • So let's go back to the code editor and do that.

  • All right, so what we want to do is this: weights equals model.layers sub zero .get_weights.

  • So this will give us the actual weights from our model; the zeroth layer is the embedding layer, we want to get the weights, and then the zeroth element of that.

  • So I'm going to head back to the terminal, and I'm gonna actually get rid of the plot here, because we know that works, and I'm sick of seeing it.

  • So we will just do the model fitting and retrieve the embedding.

  • So let's do that now.

  • It's one of the downsides of doing code live:

  • I make all kinds of silly mistakes while talking and typing.

  • But that's life.

  • See, in a minute.

  • All right, so that finished running.

  • Let's head to the browser and take a look at what it looks like.

  • Okay, so can I zoom in?

  • I can a little bit. So let's take a look at this.

  • So to get this, you go to Load over here on the left side. You can't really see my cursor, but you go to Load on the left side, load your vector and metadata files, and then you want to click on this labels mode here.

  • And let's take a look at this.

  • So you see, right on the left side: annexed, ceded, and Ottoman.

  • So these would make sense to be, you know, pretty close together because they kind of would.

  • You would expect those three words to be together.

  • Right?

  • Annexed, ceded.

  • If you annex something, someone else has to cede it; it makes sense. Let's kind of move around a little bit.

  • Um, see what else we can find.

  • Okay, so this looks like a good one: we see waterways, navigable, human, rainfall, petroleum, earthquake.

  • So you can see there's some pretty good relationships here between the words that all makes sense.

  • If you scroll over here, what's interesting is you see Estonia, Herzegovina, Slovakia,

  • sorry for mispronouncing that, Cyprus.

  • You see a bunch of country names.

  • So it seems to learn the names; it seems to learn that there are relationships between different geographic regions, in this case countries.

  • Ah, there we see ceded, annexed, and Ottoman again.

  • You can even see conquered in here, next to annexed and ceded.

  • Deposed, archbishop, bishop, assassinated.

  • Oh, you can't see that.

  • Move my face.

  • There, I just moved myself over.

  • So now you can see surrendered,

  • conquered, Spain, right?

  • Spain was conquered for a time by the Moors. Archbishop,

  • deposed, surrendered, assassinated, invaded.

  • You can see all kinds of cool stuff here.

  • So this is what it looks like.

  • I've seen other words like beautiful, wonderful together.

  • Uh, other stuff.

  • So if you play around with this, you'll see all sorts of

  • interesting relationships between words.

  • And this is just the visual representation of what the word embeddings look like in a reduced-dimension representation of their higher dimensional space.

  • So I hope that has been helpful.

  • I thought this was a really cool project, just a few dozen lines of code, and you get something that is actually a really neat result, where you have a higher dimensional space that gives you mathematical relationships between words.

  • And it does a pretty good job of learning the relationships between those words.

  • And what's interesting is I wonder how well this could be generalized to other stuff.

  • So if we feed it, you know,

  • say, Twitter tweets,

  • could we get the sentiment out of that?

  • I'm not entirely sure; that's something we would have to play around with.

  • It seems like you would be able to, so long as there is significant overlap between the dictionary of words that we have from the IMDB reviews and the dictionary of words from the Twitter feeds that we scrape.

  • But that would be an interesting application of this, to kind of find toxic Twitter comments and the like. But I hope this was helpful.

  • Just a reminder.

  • My new course is on sale for 99 for the next five days.

  • There'll be one more sale in the last several days of the year.

  • But there'll be a gap of several days in between. This channel's totally supported by ad revenue as well as my course sales.

  • So if you want to support the cause, go ahead and click the link in the pinned comment slash description.

  • And if not, hey, go ahead and share this because that is totally free.

  • And I like that Just as well.

  • Leave a comment down below.

  • Hit the subscribe button.

  • If you haven't already, hit the bell icon to get notified

  • whenever I release new content, and I will see you in the next video. In this tutorial, you are gonna learn how to do

  • sentiment classification with TensorFlow 2.0. Let's get started. Before we begin, a couple of notes.

  • First of all, it would be very helpful if you have already seen my previous video on doing word embeddings in TensorFlow 2.0,

  • because we're gonna be borrowing heavily from the concepts I presented in that video.

  • If not, it's not a huge deal.

  • I'll show you everything we need to do as we go along.

  • It's just it'll make more sense with that sort of background.

  • Second point is that I am working through the official tensorflow tutorials.

  • This isn't my code.

  • I did have to fix a couple of bugs in the code, so I guess that makes it mine to some extent.

  • But regardless, I did not write this, so I'm just presenting it for your consumption in video format.

  • All that said, let's go ahead and get to coding our sentiment analysis software.

  • So, as usual, we begin with our imports.

  • You'll need the tensorflow data sets to handle the data from the IMDB library.

  • Of course, you need tensorflow to handle tensorflow type operations.

  • So the first thing we want to do is to load our data set and get our training and testing data from that as well as our encoder, which I explain in the previous video.

  • So let's start there: dataset and info equals tfds.load of 'imdb_reviews'... it would help if I spelled it correctly...

  • slash 'subwords8k'. Now, just a word:

  • these are the reviews,

  • a bunch of reviews from the IMDB data set.

  • So you have a review with an associated classification, either positive or negative. with_info

  • equals True, as_supervised equals True. Tab that over.

  • Next, we will need our training and testing datasets: train_dataset and test_dataset equal dataset sub 'train' and dataset sub 'test'.

  • And finally, we need our encoder: info.features['text'].encoder.

  • Good grief,

  • I cannot type tonight at all.

  • So if you don't know what an encoder is, the basic idea is that it is a sort of reduced-dimensional representation of a set of words.

  • So you take a word, and it associates it with an n-dimensional vector that has components that will be non-perpendicular to other words in your dictionary.

  • So what that means is that you can express words in terms of each other, whereas if you set each word in your dictionary to be a basis vector, they're orthogonal, and so there's no relationship between something like 'king' and 'queen', for instance. Whereas with the, sorry, the word embedding representation,

  • there is a non-zero component of one vector along another.

  • So you have some relationship between words that allows you to parse the meaning of your string of text. And I give a better explanation in my previous video,

  • so check that out for your own education.

  • So we're gonna need a couple of global variables: a buffer size of 10,000, a batch size for training, and some padded shapes.

  • And this is for padding.

  • So when you have a string of words, the strings of words can be different lengths.

  • So you have to pad to the length of the longest review, basically, and that is batch size by

  • None.

  • So the next thing we'll need is our actual data set.

  • We're gonna shuffle it, because we're good data scientists, and we're gonna want to get a padded batch from that in the shape defined by the variable above, and the test data set is very similar.

  • Good grief. Aside: I'm using Vim for my new text editor, part of my New Year's resolution.

  • And, um, let's yank that, and it is a little bit tricky if you've never used it before.

  • I'm still getting used to it.

  • There we go.

  • As you can see, then we have to go back into insert mode.

  • test_dataset equals test_dataset.padded_batch with the batch size and padded shapes.

  • All right, that is good.
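
For reference, the data pipeline described over the last several lines would look roughly like this; the batch size of 64 is not stated here and is taken from the tutorial the video follows (imports of tensorflow as tf and tensorflow_datasets as tfds are assumed):

```python
# Load the subword-encoded IMDB reviews and grab the encoder, as in the embeddings video.
dataset, info = tfds.load('imdb_reviews/subwords8k',
                          with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
encoder = info.features['text'].encoder

BUFFER_SIZE = 10000
BATCH_SIZE = 64
padded_shapes = ([None], ())   # pad each review to the longest in its batch

train_dataset = (train_dataset
                 .shuffle(BUFFER_SIZE)
                 .padded_batch(BATCH_SIZE, padded_shapes=padded_shapes))
test_dataset = test_dataset.padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)
```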

  • Next thing we need is our model.

  • So the model is gonna be a sequential Keras model with a bidirectional LSTM layer as well as a couple of dense layers.

  • It uses the binary cross entropy loss with an Adam optimizer, learning rate of 1 by 10 to the minus 4.

  • And then we will say tf.keras.layers.Embedding of encoder.vocab_size and 64, then tf.keras.layers.Bidirectional of tf.keras.layers.LSTM of 64, two closing parentheses.

  • Then a Dense layer,

  • that is 64 units with a ReLU

  • activation.

  • If I could ever learn to type properly, that would be very helpful.

  • Then another dense layer with one output, and this output is going to get a sigmoid activation.

  • And what this represents is the probability of the review being either positive or negative.

  • So the final output of the model is gonna be a floating point number between zero and one, and it will be the probability of it being a positive review. And we're gonna pass in a couple of dummy reviews, just some kind of softball stuff, to see how well it does.

  • But before that, we have to compile our model.

  • With a binary cross entropy loss, optimizer equals

  • tf.keras.optimizers.Adam with a learning rate of 1 by 10 to the minus 4,

  • and we want metrics: accuracy.
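
A sketch of the model and compile step just described:

```python
# Embedding -> bidirectional LSTM -> Dense(relu) -> single sigmoid output.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
```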

  • And then we want the history, which is just the model fit, and this is really for plotting purposes.

  • But I'm not going to do any plotting.

  • You get the idea: the accuracy goes up over time, and the loss goes down over time.

  • So there's no real need to plot that. We pass the train dataset,

  • and we're just gonna do three epochs.

  • You could do more, but for the purpose of the video, I'm just gonna do three.

  • Actually, let's do five, because I'll do five for the next model.

  • We're gonna do validation_data

  • equals test_dataset, and 30 validation steps.
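
That training call, in code:

```python
# Train for five epochs, validating on 30 batches of the test set after each one.
history = model.fit(train_dataset,
                    epochs=5,
                    validation_data=test_dataset,
                    validation_steps=30)
```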

  • So next we need to consider a couple of functions.

  • So one of them is to pad the vectors that we pass in to whatever size, and the second is to actually generate a prediction.

  • So let's define those functions.

  • And just to be clear, this is for the sample texts

  • we're gonna pass in, because, remember, the reviews are all of varying lengths.

  • And so we have to, for purposes of, I guess, consistency of the inputs to your model (not a really technical phrase), make sure that we pass in the same length of vector to

  • your model. For the training we dealt with the same problem; we have it again with the sample texts we're gonna pass in, because we don't have an automated TensorFlow function to handle it for us.

  • And we're gonna pad it with zeros, because those don't have any meaning in our dictionary, and we will return the vector after extending it.

  • So if you're not familiar with this idiom in Python, you can multiply a quantity like, say, a string by a number to basically multiply that string.

  • So if you had the letter 'a' multiplied by 10, it would give you 10 a's, and you can do that with list elements as well.

  • Pretty cool stuff.

  • Neat little feature of Python, little known, I think.

  • But that's what we're doing here.

  • So we're going to pad with zeros to whatever size we want, minus whatever the length of our vector is, and extend that vector with those zeros.
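
A sketch of that padding helper (the function and argument names are assumptions consistent with the tutorial):

```python
def pad_to_size(vec, size):
    # Append enough zeros (the padding token) to bring the encoded review up to `size`.
    zeros = [0] * (size - len(vec))
    vec.extend(zeros)
    return vec
```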

  • Next, we need a sample predict function.

  • And the reason we can't just do a model.predict is because we have the issue of dealing with the padding. encoded_sample_pred_text equals encoder.encode of the sample text.

  • And remember, the encoder is what goes from the string representation to the higher dimensional representation that allows you to make correlations between words.

  • So if you want to pad it, then pad_to_size of encoded_sample_pred_text and 64.

  • That's our batch size, or our max length, sorry. And then encoded_sample_pred_text

  • equals tf.cast of that to float32, and predictions equals model.predict of tf.expand_dims of encoded_sample_pred_text and zero, the batch dimension. Return predictions.
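
A sketch of that prediction helper; the model_ keyword argument he mentions adding later in the video is left out here for clarity:

```python
def sample_predict(sentence, pad):
    # Encode the raw string into subword integers, optionally pad, and run it through the model.
    encoded_sample_pred_text = encoder.encode(sentence)
    if pad:
        encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
    encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.float32)
    predictions = model.predict(tf.expand_dims(encoded_sample_pred_text, 0))  # add a batch dimension
    return predictions
```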

  • All right, so now we have a model that we will have trained once we run the code.

  • Of course.

  • Now let's come up with a couple of dummy simple, very basic reviews to see how it scores them.

  • So sample_text

  • equals 'This movie was awesome.

  • The acting was incredible.

  • Highly recommend.'

  • Then we're going to spell sample_text correctly, of course.

  • And then we're gonna come up with our predictions:

  • predictions equals sample_predict of sample_text with pad equals True,

  • And multiply that by 100.

  • So we get it as a percentage.

  • And can I?

  • I can't quite scroll down.

  • That is a feature, not a bug.

  • I am sure you can write in whatever positive review you want.

  • So then we print 'Probability

  • this is a positive review' and the predictions. And I haven't done this before.

  • So when I coded this up the first time, I had it executing twice, once with pad equals False and once with pad equals True,

  • to see the delta in the predictions, and surprise, surprise, it is more accurate when you give it padded reviews.

  • But in this case, I'm gonna change it up on the fly, do a different sample text, and give it a negative review to see how it does.

  • 'This movie was so-so.' I don't know what this is gonna do.

  • That's kind of, you know, vernacular; I don't know

  • if that is in the database, so we'll see.

  • 'The acting was mediocre.

  • Kind of recommend.' And predictions equals sample_predict of sample_text,

  • pad equals True, times 100.

  • And we can, um, yank the line and paste it.
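
Put together, the two dummy reviews and their predictions look roughly like this (the exact wording of the print statement is assumed):

```python
# A clearly positive review and a deliberately lukewarm one.
sample_text = 'This movie was awesome. The acting was incredible. Highly recommend.'
predictions = sample_predict(sample_text, pad=True) * 100
print('Probability this is a positive review:', predictions[0][0])

sample_text = 'This movie was so-so. The acting was mediocre. Kind of recommend.'
predictions = sample_predict(sample_text, pad=True) * 100
print('Probability this is a positive review:', predictions[0][0])
```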

  • All right.

  • Okay, so we're gonna go ahead and save this, go back to the terminal, execute it, and see how it does.

  • And then we're gonna come back and write a slightly more complicated model, to see how well that does, to see if, you know, adding complexity to the model improves the accuracy of our predictions.

  • So let us write-quit.

  • And if you've never used Vim, you have to press colon-w-q,

  • sorry, when you're not in insert mode, to write-quit and get out.

  • And then we're gonna go to the terminal and see how well it does.

  • All right, so here we are in the terminal.

  • Let's give it a shot and see how many typos I made. Ooh, interesting.

  • So it says, check that the data set name is spelled correctly.

  • That probably means I misspelled the name of the data set.

  • All right, let me scroll up a little bit.

  • Ah, it should be 'imdb_reviews'.

  • Okay, 'imdb', all right, there,

  • in the dataset load,

  • you can see it

  • right there.

  • Okay, so I misspelled the name of the data set.

  • Not a problem, then.

  • Back into tf_sentiment.

  • Let us go up to here,

  • 'imdb', write,

  • quit, and give it another shot.

  • I misspelled

  • 'Dense'.

  • Okay.

  • Can you see that?

  • No, not quite.

  • It says here (let me move myself over)

  • "has no attribute 'Dense'", so let's fix that.

  • That's in line 24. On line 24,

  • insert an 's', write-quit, and try again.

  • There.

  • Now it is training for five epochs.

  • I am gonna let this ride and show you the results when it is done. Really quick, you can see that it gives this funny error.

  • Let me go ahead and move my face out of the way.

  • Now, this I keep seeing in the TensorFlow 2.0 stuff.

  • So as far as I can tell, this is related to the version of TensorFlow.

  • This isn't something I'm doing or you're doing.

  • There's an open issue on GitHub.

  • And previously it would throw that error every time I trained, with every epoch.

  • However, after updating to, I think, TensorFlow 2.1,

  • it only does it after the first one, so I guess you gain a little bit there.

  • But it is definitely,

  • it's definitely an issue with TensorFlow, so I'm not too worried about that.

  • So let's let this train, all right?

  • So it has finished running, and I have teleported to the top right so you can see the accuracy, and you can see the accuracy starts out low and ends up around 93.9%, not too shabby for just five epochs on a very simple model.

  • Likewise, the loss starts relatively high and ends up relatively low.

  • What's most interesting is that we do get a 79.8% probability that our first review was positive, which it is.

  • So an 80% probability of being correct is pretty good.

  • And then only a 41.93% probability of the second being positive.

  • Now, this was a bit of a lukewarm review.

  • I said it was so-so, so a 40% probability of being positive is pretty reasonable, in my estimation.

  • So now let's see if we can make a more complex model and get better results.

  • So let's go back to the code and type that up.

  • So here we are.

  • Let's scroll down and make our new model. So, model...

  • you have to make sure you're in insert mode, of course.

  • model equals tf.keras.Sequential, tf.keras.layers...

  • of course, you need an embedding layer to start: encoder.vocab_size and 64. Let me move my mug, like so, and add our next layer, which is keras.layers.Bidirectional of LSTM of 64 with return_sequences equals True. And I am way too far over; we're just gonna have to live with it.

  • It's just gonna be bad code,

  • not up to PEP 8 standards, but whatever.

  • Then another Bidirectional LSTM of 32, keras.layers.Dense of 64 with a ReLU activation, and to prevent overfitting, we are gonna add in a little bit of dropout, just 0.5, 50%. And add our final classification layer with a sigmoid activation.

  • So let me double check here.

  • Looks like I forgot

  • something... there we go.

  • Good grief. Delete that line and make our new model: model.compile, loss equals binary cross entropy,

  • optimizer equals Adam,

  • same learning rate.

  • We don't want to change too many things at once.

  • That wouldn't be scientific. Metrics: accuracy.

  • history equals model.fit of train_dataset, epochs equals five, validation_data equals test_dataset, and 30 validation steps.
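
A sketch of the larger model, its compile call, and the fit call being typed here:

```python
# Two stacked bidirectional LSTMs this time, plus dropout to fight overfitting.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset,
                    epochs=5,
                    validation_data=test_dataset,
                    validation_steps=30)
```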

  • And we're just gonna scroll up here and copy.

  • Copy all of this: visual mode, yank, and come down and paste.

  • All right. So I'm detecting a problem here, so I need to modify my sample

  • predict...

  • my sample_predict function.

  • So let's go ahead and pass in a model.

  • Call it model underscore, just to be safe, because I'm declaring one model and then another.

  • I want to make sure these scoping issues are not going to bite me in the rear end.

  • We need model_ equals model.

  • And let's do likewise here:

  • model_ equals model, and we'll come up here and modify it here as well.

  • Just to be pedantic.

  • And I'm very tired, so this is probably unnecessary.

  • We want to make sure we aren't getting any funny scoping issues, so that the model is doing precisely what we would expect.

  • So let's go ahead and write.

  • Quit and try running it.

  • Oh, actually, I take it back.

  • I want to go ahead and get rid of the fitting for this because we've already run it.

  • We can leave it.

  • Actually, you know what?

  • Now that I think about it, I might just do this, and then we will comment this out, all right?

  • And then we don't even need the model_ equals model there, but I'm gonna leave it.

  • All right, let's try it again.

  • We'll see what we get.

  • So remember, we had an 80% and a 41% or 42% probability of it being positive.

  • So let's see what we get with the new model. 'validation data set'?

  • So I must have mistyped something.

  • So let's take a look here.

  • Right there: it is validation_data, not

  • validation_data_set.

  • All right, try it again.

  • All right.

  • It is training.

  • I will let this run and show you the results when it finishes.

  • So of course, after running it, I realize I made a mistake in the declaration of the sample_predict function.

  • Typical, typical: unexpected keyword argument.

  • So let's come here.

  • And you know what?

  • Let's just get rid of it.

  • Oh, because it's model underscore.

  • Um, yeah, let's get rid of that.

  • Because we no longer need it and get rid of this.

  • Typical, typical.

  • All right, this is one of the situations in which a Jupyter notebook would be helpful.

  • But whatever, I will stick to Vim and the terminal and .py files, because I'm old.

  • All right, let's try this again, and I'll just go ahead and edit all this out, and we will meet up when it finishes.

  • I've done it again.

  • Ah, it's not my day, folks.

  • Not my day.

  • And let us find that there.

  • Delete once again.

  • All right.

  • So I finally fixed all the errors.

  • It is done training, and we have our results.

  • So, probability this is a positive review, 86%.

  • A pretty good improvement over 80%.

  • What's even better is that the probability of the second review, which was lukewarm, so-so, being positive has fallen from 41 or 42% down to about 22%, almost cut in half.

  • So pretty good improvement with a, you know, somewhat more complicated model, at the expense of slightly longer training.

  • So, you know, 87 seconds as opposed to 47 seconds per epoch,

  • so on the order of six minutes as opposed to three.

  • Not too bad.

  • So anyway, what we have done here is loaded

  • a series of IMDB reviews, used them to train a model to do sentiment prediction by looking at correlations between the words and the labels for either positive or negative sentiment, and then asked the model to predict what the sentiment of an obviously positive and a somewhat lukewarm review was.

  • And we get pretty good results in a very short amount of time.

  • That is the power of TensorFlow 2.0.

  • So I thank you for watching. Any questions or comments, leave them down below.
