- # TensorFlow Wide & Deep Learning Tutorial
- In the previous [TensorFlow Linear Model Tutorial](../wide/),
- we trained a logistic regression model to predict the probability that the
- individual has an annual income of over 50,000 dollars using the [Census Income
- Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income). TensorFlow is
- great for training deep neural networks too, and you might be wondering which one
- to choose. Well, why not both? Would it be possible to combine the
- strengths of both in one model?
- In this tutorial, we'll introduce how to use the TF.Learn API to jointly train a
- wide linear model and a deep feed-forward neural network. This approach combines
- the strengths of memorization and generalization. It's useful for generic
- large-scale regression and classification problems with sparse input features
- (e.g., categorical features with a large number of possible feature values). If
- you're interested in learning more about how Wide & Deep Learning works, please
- check out our [research paper](http://arxiv.org/abs/1606.07792).
- ![Wide & Deep Spectrum of Models]
- (../../images/wide_n_deep.svg "Wide & Deep")
- The figure above shows a comparison of a wide model (logistic regression with
- sparse features and transformations), a deep model (feed-forward neural network
- with an embedding layer and several hidden layers), and a Wide & Deep model
- (joint training of both). At a high level, there are only 3 steps to configure a
- wide, deep, or Wide & Deep model using the TF.Learn API:
- 1. Select features for the wide part: Choose the sparse base columns and
- crossed columns you want to use.
- 1. Select features for the deep part: Choose the continuous columns, the
- embedding dimension for each categorical column, and the hidden layer sizes.
- 1. Put them all together in a Wide & Deep model
- (`DNNLinearCombinedClassifier`).
- And that's it! Let's go through a simple example.
- ## Setup
- To try the code for this tutorial:
- 1. [Install TensorFlow](../../get_started/os_setup.md) if you haven't
- already.
- 2. Download [the tutorial code](
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
- 3. Install the pandas data analysis library. TF.Learn doesn't require pandas, but it does support it, and this tutorial uses it. To install pandas:
- 1. Get `pip`:
- ```shell
- # Ubuntu/Linux 64-bit
- $ sudo apt-get install python-pip python-dev
- # Mac OS X
- $ sudo easy_install pip
- $ sudo easy_install --upgrade six
- ```
- 2. Use `pip` to install pandas:
- ```shell
- $ sudo pip install pandas
- ```
- If you have trouble installing pandas, consult the [instructions]
- (http://pandas.pydata.org/pandas-docs/stable/install.html) on the pandas site.
- 4. Execute the tutorial code with the following command to train the Wide & Deep
- model described in this tutorial:
- ```shell
- $ python wide_n_deep_tutorial.py --model_type=wide_n_deep
- ```
- Read on to find out how this code builds its Wide & Deep model.
- ## Define Base Feature Columns
- First, let's define the base categorical and continuous feature columns that
- we'll use. These base columns will be the building blocks used by both the wide
- part and the deep part of the model.
- ```python
- import tensorflow as tf
- # Categorical base columns.
- gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
- race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=[
- "Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
- education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
- marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
- relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
- workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
- occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
- native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
- # Continuous base columns.
- age = tf.contrib.layers.real_valued_column("age")
- age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
- education_num = tf.contrib.layers.real_valued_column("education_num")
- capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
- capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
- hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
- ```
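To build intuition for what the hashed and bucketized columns above represent, here is a minimal pure-Python sketch. It is not how TensorFlow implements these columns: the helper names `hash_bucket` and `bucketize` are hypothetical, MD5 is only a stand-in for whatever hash function the library uses, and the left-inclusive bucket edges are an assumption.

```python
import bisect
import hashlib


def hash_bucket(value, hash_bucket_size):
    """Map a string to a stable bucket id in [0, hash_bucket_size)."""
    # MD5 is an illustrative stand-in, not the hash TensorFlow uses.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    # With 10 boundaries there are 11 buckets: below 18, [18, 25), ..., 65 and up.
    return bisect.bisect_right(boundaries, value)


AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

occupation_id = hash_bucket("Exec-managerial", 1000)
age_bucket = bucketize(37, AGE_BOUNDARIES)
print(occupation_id, age_bucket)
```

Hashing avoids having to list every possible value up front, at the cost of occasional collisions when two values land in the same bucket.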
- ## The Wide Model: Linear Model with Crossed Feature Columns
- The wide model is a linear model with a wide set of sparse and crossed feature
- columns:
- ```python
- wide_columns = [
- gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets,
- tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
- tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
- tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
- ```
- Wide models with crossed feature columns can memorize sparse interactions
- between features effectively. That being said, one limitation of crossed feature
- columns is that they do not generalize to feature combinations that have not
- appeared in the training data. Let's add a deep model with embeddings to fix
- that.
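The memorization behavior described above can be sketched in a few lines of plain Python. This is a conceptual illustration only: `crossed_bucket`, `wide_score`, and the weight values are hypothetical, and MD5 stands in for the library's real hash function.

```python
import hashlib


def crossed_bucket(values, hash_bucket_size):
    """Hash a combination of feature values into a single bucket id."""
    key = "_X_".join(values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


BUCKETS = int(1e4)

# Combinations that appeared in a (hypothetical) training set. Each one gets
# its own weight; 1.0 is a made-up value standing in for a learned weight.
seen_pairs = [("Bachelors", "Exec-managerial"), ("HS-grad", "Craft-repair")]
weights = {crossed_bucket(pair, BUCKETS): 1.0 for pair in seen_pairs}


def wide_score(pair):
    """Contribution of the crossed column: zero unless the bucket was trained."""
    return weights.get(crossed_bucket(pair, BUCKETS), 0.0)


print(wide_score(("Bachelors", "Exec-managerial")))
print(wide_score(("Bachelors", "Craft-repair")))
```

Barring a hash collision, the second pair gets no contribution at all, even though each of its values was seen individually. That is the lack of generalization the paragraph above refers to.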
- ## The Deep Model: Neural Network with Embeddings
- The deep model is a feed-forward neural network, as shown in the previous
- figure. Each of the sparse, high-dimensional categorical features is first
- converted into a low-dimensional and dense real-valued vector, often referred to
- as an embedding vector. These low-dimensional dense embedding vectors are
- concatenated with the continuous features, and then fed into the hidden layers
- of a neural network in the forward pass. The embedding values are initialized
- randomly, and are trained along with all other model parameters to minimize the
- training loss. If you're interested in learning more about embeddings, check out
- the TensorFlow tutorial on [Vector Representations of Words]
- (https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/index.html), or
- [Word Embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.
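As a rough picture of the lookup-and-concatenate step, the sketch below builds the input vector for the first hidden layer in plain Python. The table sizes, the initialization range, and the names `make_table` and `deep_input` are assumptions made for illustration. In the real model the embedding tables are variables that are updated during training.

```python
import random


def make_table(num_buckets, dimension, rng):
    """Randomly initialized embedding table: one dense vector per bucket id."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(dimension)]
            for _ in range(num_buckets)]


rng = random.Random(0)
tables = {
    "workclass": make_table(100, 8, rng),
    "occupation": make_table(1000, 8, rng),
}


def deep_input(bucket_ids, continuous_values):
    """Concatenate the looked-up embeddings with the continuous features."""
    dense = []
    for name in sorted(bucket_ids):
        dense.extend(tables[name][bucket_ids[name]])
    dense.extend(continuous_values)
    return dense


x = deep_input({"workclass": 7, "occupation": 421},
               [39.0, 13.0, 2174.0, 0.0, 40.0])
print(len(x))  # 8 + 8 + 5 = 21
```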
- We'll configure the embeddings for the categorical columns using
- `embedding_column`, and concatenate them with the continuous columns:
- ```python
- deep_columns = [
- tf.contrib.layers.embedding_column(workclass, dimension=8),
- tf.contrib.layers.embedding_column(education, dimension=8),
- tf.contrib.layers.embedding_column(marital_status, dimension=8),
- tf.contrib.layers.embedding_column(gender, dimension=8),
- tf.contrib.layers.embedding_column(relationship, dimension=8),
- tf.contrib.layers.embedding_column(race, dimension=8),
- tf.contrib.layers.embedding_column(native_country, dimension=8),
- tf.contrib.layers.embedding_column(occupation, dimension=8),
- age, education_num, capital_gain, capital_loss, hours_per_week]
- ```
- The higher the `dimension` of the embedding is, the more degrees of freedom the
- model will have to learn the representations of the features. For simplicity, we
- set the dimension to 8 for all feature columns here. Empirically, a more
- informed decision for the number of dimensions is to start with a value on the
- order of $$k\log_2(n)$$ or $$k\sqrt[4]n$$, where $$n$$ is the number of unique
- feature values in a feature column and $$k$$ is a small constant (usually smaller than
- 10).
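The two rules of thumb are easy to evaluate for a given column. The helper below is a hypothetical convenience function, and the values of `n` and `k` are example inputs rather than recommendations.

```python
import math


def suggested_dimensions(n, k):
    """Return the two starting points k*log2(n) and k*n**(1/4)."""
    return k * math.log2(n), k * n ** 0.25


# Example: a column with 1000 unique values and k = 2.
log_based, root_based = suggested_dimensions(1000, 2)
print(round(log_based, 1), round(root_based, 1))
```

Both values are only starting points. The final dimension is usually chosen by validating a few candidates.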
- Through dense embeddings, deep models can generalize better and make predictions
- on feature pairs that were previously unseen in the training data. However, it
- is difficult to learn effective low-dimensional representations for feature
- columns when the underlying interaction matrix between two feature columns is
- sparse and high-rank. In such cases, the interaction between most feature pairs
- should be zero except a few, but dense embeddings will lead to nonzero
- predictions for all feature pairs, and thus can over-generalize. On the other
- hand, linear models with crossed features can memorize these “exception rules”
- effectively with fewer model parameters.
- Now, let's see how to jointly train wide and deep models and allow them to
- complement each other’s strengths and weaknesses.
- ## Combining Wide and Deep Models into One
- The wide models and deep models are combined by summing up their final output
- log odds as the prediction, then feeding the prediction to a logistic loss
- function. All the graph definition and variable allocations have already been
- handled for you under the hood, so you simply need to create a
- `DNNLinearCombinedClassifier`:
- ```python
- import tempfile
- model_dir = tempfile.mkdtemp()
- m = tf.contrib.learn.DNNLinearCombinedClassifier(
- model_dir=model_dir,
- linear_feature_columns=wide_columns,
- dnn_feature_columns=deep_columns,
- dnn_hidden_units=[100, 50])
- ```
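The combination rule described above (sum the log odds, then apply a logistic loss) can be written out by hand. This is a minimal sketch of the arithmetic, not the estimator's internals. The logit values are made-up numbers and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combined_probability(wide_logit, deep_logit):
    """Sum the log odds from both parts, then squash to a probability."""
    return sigmoid(wide_logit + deep_logit)


def logistic_loss(label, probability):
    """Cross-entropy loss for a binary label in {0, 1}."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))


p = combined_probability(wide_logit=0.8, deep_logit=-0.3)
print(p, logistic_loss(1, p))
```

Because the loss is computed on the summed output, the gradient flows into both parts at once, which is what makes the training joint rather than an ensemble of separately trained models.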
- ## Training and Evaluating The Model
- Before we train the model, let's read in the Census dataset as we did in the
- [TensorFlow Linear Model tutorial](../wide/). The code for
- input data processing is provided here again for your convenience:
- ```python
- import pandas as pd
- from six.moves import urllib  # provides urllib.request on both Python 2 and 3
- # Define the column names for the data sets.
- COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
- "marital_status", "occupation", "relationship", "race", "gender",
- "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
- LABEL_COLUMN = 'label'
- CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
- "relationship", "race", "gender", "native_country"]
- CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss",
- "hours_per_week"]
- # Download the training and test data to temporary files.
- # Alternatively, you can download them yourself and change train_file and
- # test_file to your own paths.
- train_file = tempfile.NamedTemporaryFile()
- test_file = tempfile.NamedTemporaryFile()
- urllib.request.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
- urllib.request.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
- # Read the training and test data sets into Pandas dataframe.
- df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
- df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
- df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
- df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
- def input_fn(df):
-   # Creates a dictionary mapping from each continuous feature column name (k) to
-   # the values of that column stored in a constant Tensor.
-   continuous_cols = {k: tf.constant(df[k].values)
-                      for k in CONTINUOUS_COLUMNS}
-   # Creates a dictionary mapping from each categorical feature column name (k)
-   # to the values of that column stored in a tf.SparseTensor.
-   categorical_cols = {k: tf.SparseTensor(
-       indices=[[i, 0] for i in range(df[k].size)],
-       values=df[k].values,
-       shape=[df[k].size, 1])
-                       for k in CATEGORICAL_COLUMNS}
-   # Merges the two dictionaries into one (works on both Python 2 and 3).
-   feature_cols = dict(continuous_cols)
-   feature_cols.update(categorical_cols)
-   # Converts the label column into a constant Tensor.
-   label = tf.constant(df[LABEL_COLUMN].values)
-   # Returns the feature columns and the label.
-   return feature_cols, label
- def train_input_fn():
-   return input_fn(df_train)
- def eval_input_fn():
-   return input_fn(df_test)
- ```
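If the `indices`, `values`, and `shape` arguments in `input_fn` look opaque, the following plain-Python sketch shows what they contain for a tiny column. The helper `to_sparse_parts` is hypothetical and returns ordinary lists, not a TensorFlow object.

```python
def to_sparse_parts(values):
    """Describe one categorical column as (indices, values, shape).

    Plain-Python stand-in for the arguments passed to tf.SparseTensor above:
    row i holds exactly one value, stored at position [i, 0].
    """
    n = len(values)
    indices = [[i, 0] for i in range(n)]
    return indices, list(values), [n, 1]


indices, values, shape = to_sparse_parts(["Private", "State-gov", "Private"])
print(indices)
print(values)
print(shape)
```

Each example contributes a single categorical value per column, so the sparse tensor is an `n` by 1 column in which every row is populated.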
- After reading in the data, you can train and evaluate the model:
- ```python
- m.fit(input_fn=train_input_fn, steps=200)
- results = m.evaluate(input_fn=eval_input_fn, steps=1)
- for key in sorted(results):
-   print("%s: %s" % (key, results[key]))
- ```
- The first line of the output should be something like `accuracy: 0.84429705`. We
- can see that the accuracy was improved from about 83.6% using a wide-only linear
- model to about 84.4% using a Wide & Deep model. If you'd like to see a working
- end-to-end example, you can download our [example code]
- (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
- Note that this tutorial is just a quick example on a small dataset to get you
- familiar with the API. Wide & Deep Learning will be even more powerful if you
- try it on a large dataset with many sparse feature columns that have a large
- number of possible feature values. Again, feel free to take a look at our
- [research paper](http://arxiv.org/abs/1606.07792) for more ideas about how to
- apply Wide & Deep Learning in real-world large-scale machine learning problems.