  • Hi, everybody.

  • I'm Laurence Moroney from the TensorFlow team at Google, and today we're going to talk about text classification.

  • It's part one of a two-part series, and we'll focus on the data and getting it ready to train a neural network.

  • You will do this hands-on, using a workbook that you can find at the link in the description below, and I'll step you through it.

  • Text classification has some unique challenges.

  • So before you get coding, let me step you through some of these.

  • First of all, neural networks typically deal with numbers and not text when learning patterns that could be used for prediction or classification.

  • So in this case, we're looking at learning from movie reviews to see if those reviews are positive or negative.

  • And the first step, of course, is to change the words into numbers that represent them.

  • There'll be a little bit more processing of these words into vectors determining their sentiment, and we'll cover that in the next video, so let's get coding.

  • First things first.

  • I'll have to check the licenses before I begin, and now I'll import TensorFlow and NumPy.

  • I'll also use Keras and print out the version of TensorFlow that I'm using.

  • Okay, now it's time to get the dataset.

  • The IMDb set is included with Keras, so let's download it, and let's take a look at what's in there.

  • Note that in this case, the nice folks at Keras have done the work for us of converting the words into integers.

  • They've also sorted them into a dictionary so that lower numbers are the most common words and higher numbers are the least common words.

  • So when we load it and specify 10,000 words, this will then give us the top 10,000 words that are used across all of the reviews.
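
The frequency-ranked dictionary described above can be illustrated with a small sketch. This is not the Keras implementation, only a toy version of the idea: the corpus, the helper names `build_word_index` and `encode`, and the `unknown=-1` marker are all hypothetical. In the video the dataset loader does this work for you, and the vocabulary cap is what you pass as `num_words=10000` to `keras.datasets.imdb.load_data`.

```python
from collections import Counter

# Toy corpus standing in for the movie reviews (hypothetical data).
reviews = [
    "this film was great",
    "this film was terrible",
    "great acting in this film",
]


def build_word_index(texts):
    """Rank words by frequency so that rank 1 is the most common word."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(text, word_index, num_words, unknown=-1):
    """Map words to ranks; anything outside the top num_words becomes `unknown`."""
    encoded = []
    for word in text.split():
        rank = word_index.get(word)
        # Keep the rank only if the word is known and within the vocabulary cap.
        encoded.append(rank if rank is not None and rank <= num_words else unknown)
    return encoded


word_index = build_word_index(reviews)
encoded = encode("this film was terrible", word_index, num_words=4)
print(word_index)
print(encoded)
```

With a cap of four words, the rare word "terrible" falls outside the vocabulary, which is the same trade-off the 10,000-word limit makes on the real reviews.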

  • Okay, now we've loaded the data, and we have our training data and labels as well as our test data and labels.

  • It's also nicely sorted into integers for us, which is a great first step for learning.

  • Let's see what the data looks like next.

  • First, we'll look at our training data.

  • You'll see that we have a total of 25,000 items of data and 25,000 labels describing them. The label's very simple: it's zero for a negative review and one for a positive one.

  • The reviews look like this.

  • It's just a long set of numbers, and these are the indexes into the array of words.

  • The review will start with a one indicating the start of the review.

  • So the first word in the review is word number 14, which translates to the word "this", followed by the value 22, which translates to the word "film".

  • The next bit of code is a handy-dandy way of decoding the review.

  • Note that the values zero through three are reserved, with one being the start of the review, as we mentioned a moment ago, and zero is for padding.

  • Now, this is important, and you'll see that in a moment. I can now decode the review and see that 1, 14, 22 are the start character, then "this", and then "film".
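
A minimal sketch of this kind of decoding follows. The two-word `raw_word_index` is a hypothetical excerpt chosen to match the numbers in the video, and it assumes the raw ranks are shifted up by three to make room for the reserved values. The video only names 0 (padding) and 1 (start); the labels `<UNK>` and `<UNUSED>` for 2 and 3 are assumptions.

```python
# Hypothetical excerpt of a frequency-ranked word index (rank 1 = most common).
raw_word_index = {"this": 11, "film": 19}

# Ranks start at 1, so adding 3 moves rank 1 to index 4, just past the reserved 0-3.
OFFSET = 3

# Reserved values; only <PAD> and <START> are named in the video, the rest are assumed.
RESERVED = {"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3}

# Shift every real word, then add the reserved tokens.
word_index = {word: rank + OFFSET for word, rank in raw_word_index.items()}
word_index.update(RESERVED)

# Invert the mapping so integers can be looked up as words.
reverse_word_index = {index: word for word, index in word_index.items()}


def decode_review(encoded):
    """Turn a list of integers back into text; unknown integers become '?'."""
    return " ".join(reverse_word_index.get(i, "?") for i in encoded)


decoded = decode_review([1, 14, 22])
print(decoded)
```

On the real dataset, the full word-to-rank dictionary is available from `keras.datasets.imdb.get_word_index()`, which would take the place of the toy `raw_word_index` here.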

  • It's pretty cool, right?

  • Now, earlier, I skipped over this piece of code, showing me the length of the review.

  • So, for example, the first review was 218 words long, and the second was 189 words long.

  • Now, that's really awkward, and it's confusing to train a neural network if all of the training data is of different lengths.

  • So let's pick a standard length for every review, and if it's longer, we'll trim it to that length, and if it's shorter, we'll pad it to that length.

  • The Keras preprocessing APIs make this really easy.

  • Here you can see I'm taking the training and test data and making sure it's 256 words long.

  • If I need to pad it, then I'll pad it with the pad character, which is the zero that we saw earlier.

  • A quick look will now show that it worked: they're all 256 words long.

  • And if I now look at my first set of training data, you'll see that it's padded with zeros.

  • Remember, it had been 218 words long, so the extras get padded out to make it 256.
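
The trim-or-pad step can be sketched in plain Python as below. This is an illustration of the idea rather than the Keras code from the video: `pad_or_trim` is a hypothetical helper, the review contents are dummy values, and padding and trimming at the end of the sequence is an assumption. The Keras helper for this is `pad_sequences` in `keras.preprocessing.sequence`; by default it pads and truncates at the front, so the behavior below would correspond to `padding='post', truncating='post'` with `maxlen=256` and `value=0`.

```python
MAXLEN = 256  # standard review length used in the video
PAD = 0       # the reserved padding value


def pad_or_trim(sequence, maxlen=MAXLEN, pad_value=PAD):
    """Return exactly maxlen items: trimmed if too long, padded at the end if too short."""
    trimmed = list(sequence[:maxlen])
    return trimmed + [pad_value] * (maxlen - len(trimmed))


# Stand-in reviews: 218 and 189 words as in the video, plus one that is too long.
reviews = [[5] * 218, [7] * 189, [9] * 300]

padded = [pad_or_trim(review) for review in reviews]
lengths = [len(review) for review in padded]

print(lengths)             # every review now has the same length
print(padded[0][214:222])  # the boundary between the real words and the padding
```

The first review keeps its 218 values and gains 38 zeros, which is the "extras get padded out" effect described above.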

  • Great.

  • Our training and test data is now ready.

  • So in the next episode you'll take a look at how to design a neural network to accept this data and to train a model to determine the sentiment of movie reviews.
