  • [MUSIC PLAYING]

  • KARMEL ALLISON: Hi and welcome to Coding TensorFlow.

  • I'm Karmel Allison, and I'm here to guide you

  • through a scenario using TensorFlow's high-level APIs.

  • This video is the second in a three-part series.

  • In this, we'll dig deeper into preparing the data for machine

  • learning, including using feature

  • columns, categorical data, and much more.

  • We'll also explore a machine learning

  • model built using Keras that can be trained with this data.

  • In the previous video, we spoke about a complex data set

  • and how you can load that and get

  • it ready to use in TensorFlow.

  • We used the Covertype data set from the US Forest Service

  • and Colorado State University, which

  • has about 500,000 rows of geophysical data collected

  • from particular regions in National Forest areas.

  • We are going to use the features in this data set

  • to try to predict the soil type that was found in each region.

  • We took the raw data and put it into a TensorFlow data

  • set that generates dictionaries of feature tensors and labels,

  • but we still have lots of feature types.

  • Some are continuous, some are categorical,

  • some are one-hot encoded.

  • We need to represent these in a way that

  • is meaningful to an ML model.

  • You'll learn how to do that in this video,

  • so let's get started.

  • We are going to use feature columns for that.

  • In TensorFlow, a feature column is a configuration class.

  • It doesn't itself hold any data but it tells our model

  • how to transform the raw data so that it matches the expectation

  • in many ML models that the data is numeric and continuous.

  • If you're working with data that is already numeric,

  • image data, for example, feature columns may not be necessary,

  • but for many real-world applications,

  • data is structured and represents

  • vocabularies or human concepts that we

  • need to transform before we can use them in machine learning

  • models.

  • Feature columns are a great way to do that.

  • Let's take, for example, our Covertype category, which

  • is an integer between 1 and 7 that represents

  • the type of tree in the region.

  • You'll note that all we've done here

  • is define a type of feature, and we

  • haven't passed any of our data into that feature yet.

  • It is just a configuration object

  • that will tell our model to expect

  • categorical IDs less than the outer range value of 8.
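
The idea of a configuration object that holds no data can be sketched in plain Python. The class below is hypothetical and is not TensorFlow's API; the feature name `cover_type` is also an assumption. It only mirrors the description above: a column identified by a key that expects integer category IDs below an exclusive upper bound of 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityCategoricalColumn:
    """Hypothetical stand-in for a categorical feature column.

    It stores only configuration: which feature to read and how many
    category IDs to expect. It never holds the data itself.
    """
    key: str          # name of the feature in the input dictionary (assumed)
    num_buckets: int  # exclusive upper bound on valid category IDs

    def check(self, ids):
        """Return the IDs as a list, or raise if any is outside [0, num_buckets)."""
        for i in ids:
            if not 0 <= i < self.num_buckets:
                raise ValueError(
                    f"{self.key}: ID {i} is outside [0, {self.num_buckets})"
                )
        return list(ids)


# Cover types run from 1 to 7, so 8 buckets cover every valid ID.
cover_type = IdentityCategoricalColumn(key="cover_type", num_buckets=8)
```

Data only meets the object later, for example through `cover_type.check([1, 7, 3])`; constructing the column reads nothing.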

  • Now we have to configure how we want

  • to transform our categorical data for use in a model that

  • expects continuous data.

  • Using feature columns, we can trivially

  • build a set of instructions that allow the model to convert

  • the categories into an embedding column, as shown here.
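
What an embedding column computes can be illustrated without TensorFlow. This is a minimal sketch in plain Python, assuming an arbitrary embedding size of 3 and fixed random values; in a real model the table entries would be trainable weights, and the function names here are made up for illustration.

```python
import random


def make_embedding_table(num_buckets, dimension, seed=0):
    """Create one small random vector per category ID.

    A real model would learn these values during training; here they
    are fixed random numbers, for illustration only.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(dimension)]
        for _ in range(num_buckets)
    ]


def embed(ids, table):
    """Map each integer category ID to its dense vector (a row lookup)."""
    return [table[i] for i in ids]


# 8 buckets matches the categorical column; dimension=3 is an arbitrary choice.
table = make_embedding_table(num_buckets=8, dimension=3)
batch = embed([1, 7, 1], table)  # three examples, each now a 3-float vector
```

Equal category IDs map to the same vector, so the first and third entries of `batch` are identical.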

  • Now, we could have done this processing in our data parsing

  • function ourselves, converting the categorical IDs

  • to a one-hot vector manually.
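
Doing that conversion by hand might look roughly like the following. It is a sketch in plain Python; `parse_row` and the `cover_type` key are hypothetical names, not the parsing function from the previous video.

```python
def one_hot(category_id, num_buckets):
    """Encode one integer category ID as a vector with a single 1.0."""
    if not 0 <= category_id < num_buckets:
        raise ValueError(f"ID {category_id} is outside [0, {num_buckets})")
    return [1.0 if j == category_id else 0.0 for j in range(num_buckets)]


def parse_row(row):
    """Hypothetical parsing step: replace the integer ID with its one-hot vector."""
    parsed = dict(row)  # copy so the caller's row is left untouched
    parsed["cover_type"] = one_hot(row["cover_type"], num_buckets=8)
    return parsed


example = parse_row({"cover_type": 3})
```

Because this logic lives in the input pipeline rather than in the model, it would have to be reimplemented wherever the model is served.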

  • The advantage of using feature columns

  • is that the transformations they encode

  • become part of the model's graph and can therefore be

  • exported with the saved model.

  • So you should push any transformations

  • that you want to apply to data both during training

  • and at inference time into feature columns.

  • We can define columns for each of our features.

  • Data that is already numeric is straightforward.

  • We just use a numeric column.

  • Sometimes, as in the case of soil type data here,

  • data is spread out over a vector,

  • and numeric feature columns allow us to easily capture

  • that relationship with the shape argument

  • so that our model understands soil type as a length 40

  • tensor rather than 40 independent tensors.
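
The difference between 40 independent values and one length-40 vector can be sketched in plain Python. The column names (`soil_type_0` through `soil_type_39`) and the helper function are assumptions made for illustration only.

```python
def pack_vector_feature(row, prefix, length):
    """Collect scalar columns prefix_0 .. prefix_{length-1} into one list."""
    return [float(row[f"{prefix}_{i}"]) for i in range(length)]


# Hypothetical raw row: 40 separate 0/1 indicator columns, one of them set.
raw_row = {f"soil_type_{i}": 0 for i in range(40)}
raw_row["soil_type_12"] = 1

# One feature holding a length-40 vector instead of 40 separate features.
soil_type = pack_vector_feature(raw_row, "soil_type", 40)
```

Declaring the shape on a single numeric column expresses the same grouping, so the model receives one feature rather than 40 unrelated ones.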

  • All right, so we configure all of our features, and then what?

  • Well, these become the first layer

  • of our model using a feature layer.

  • When we train our model, this first layer

  • will act like any other Keras layer,

  • but its primary role will be to take in the raw data, including

  • the categorical indices, and transform it

  • into the appropriate representations

  • that our neural net is expecting.

  • This layer will also handle creating and training

  • our Covertype embedding.
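
The role of that first layer can be approximated in a few lines of plain Python. This is a conceptual sketch, not the Keras layer itself: the feature names, the sample values, and the fixed embedding table are all made up, and a real layer would train the table along with the rest of the model.

```python
def feature_layer(batch, numeric_keys, categorical_key, embedding_table):
    """Turn a dict of raw feature lists into one dense float vector per example."""
    num_examples = len(batch[categorical_key])
    rows = []
    for i in range(num_examples):
        # Numeric features pass through unchanged, cast to float.
        dense = [float(batch[key][i]) for key in numeric_keys]
        # The categorical ID is replaced by its embedding vector.
        dense += embedding_table[batch[categorical_key][i]]
        rows.append(dense)
    return rows


# Hypothetical two-example batch of raw features.
batch = {"elevation": [2596, 2804], "slope": [3, 9], "cover_type": [5, 2]}

# Stand-in for trained embedding weights: 8 category IDs, 2 values each.
table = [[0.1 * i, -0.1 * i] for i in range(8)]

dense_batch = feature_layer(batch, ["elevation", "slope"], "cover_type", table)
```

Each example comes out as a flat list of four floats: two numeric values followed by the two-element embedding for its category.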

  • So if you have data that needs transformation

  • before it fits into a model--

  • maybe it's categorical like ours or even has

  • string names and vocabularies--

  • you can use feature columns to handle those transformations,

  • batch by batch, in TensorFlow, rather

  • than having a whole separate pipeline to do feature

  • transformations in memory.

  • TensorFlow provides many feature columns and even ways

  • to combine individual columns into more

  • complex representations of the data that your model can learn.

  • So, before we wrap up, let me quickly

  • show you how this would be a layer in a Keras model, which

  • we'll go into in more detail in the next video.

  • Note that we are using tf.keras here,

  • which implements the Keras API spec

  • but adds additional TensorFlow-specific

  • features on top of it, like support

  • for TensorFlow's eager execution, optimizers,

  • and so on.

  • Since the first thing I want to try

  • is a simple sequence of deep learning layers,

  • Keras is the easiest way to start.

  • We will start with a simple sequential model,

  • but what I want to focus on right

  • now is just this first layer.

  • Our first layer is a feature layer

  • that will do all the data transformation we just

  • discussed and feed the transformed data

  • into the rest of the model.

  • We'll do that in part three of this series, where we'll

  • look at adding the data and training the model with it,

  • including choosing loss functions and optimizers.

  • It will be right here on the TensorFlow YouTube channel,

  • so don't forget to hit that Subscribe button,

  • and I'll see you there.

  • [MUSIC PLAYING]
