Predictive Analytics Times

1 week ago
An Introduction to Deep Learning for Tabular Data

 

By: Rachel Thomas

Originally published in fast.ai, April 29, 2018

There is a powerful technique that is winning Kaggle competitions and is widely used at Google (according to Jeff Dean), Pinterest, and Instacart, yet many people don’t even realize it is possible: the use of deep learning for tabular data, and in particular, the creation of embeddings for categorical variables.

Despite what you may have heard, you can use deep learning for the type of data you might keep in a SQL database, a Pandas DataFrame, or an Excel spreadsheet (including time-series data). I will refer to this as tabular data, although it can also be known as relational data, structured data, or other terms (see my Twitter poll and comments for more discussion).

Figure: from the Pinterest blog post ‘Applying deep learning to Related Pins’

Tabular data is the most commonly used type of data in industry, but deep learning on tabular data receives far less attention than deep learning for computer vision and natural language processing. This post covers some key concepts in applying neural networks to tabular data, in particular the idea of creating embeddings for categorical variables, and highlights two relevant modules of the fastai library:

  • fastai.structured: this module works with Pandas DataFrames, is not dependent on PyTorch, and can be used separately from the rest of the fastai library to process and work with tabular data.
  • fastai.column_data: this module also works with Pandas DataFrames, and provides methods to convert DataFrames (with both continuous and categorical variables) into ModelData objects that can easily be used when training neural networks. It also includes an implementation for creating embeddings of categorical variables, a powerful technique I will explain below.
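
To make the idea of an embedding for a categorical variable concrete, here is a minimal sketch that uses only Pandas and NumPy. It does not use the fastai modules listed above and is not their implementation; the column names, the toy values, and the embedding width `emb_dim` are assumptions chosen for illustration. Each category is mapped to an integer code, and that code selects one row of a lookup table. In a real network the table would be learned during training; here it is just randomly initialised.

```python
import numpy as np
import pandas as pd

# Toy tabular data (hypothetical): one categorical and one continuous column.
df = pd.DataFrame({
    "day_of_week": ["Mon", "Tue", "Sun", "Mon", "Sat"],
    "distance_km": [1.2, 3.4, 0.5, 2.2, 4.1],
})

# Step 1: map each category to an integer code in 0 .. n_categories - 1.
df["day_of_week"] = df["day_of_week"].astype("category")
codes = df["day_of_week"].cat.codes.to_numpy()
n_categories = len(df["day_of_week"].cat.categories)

# Step 2: an embedding is a lookup table with one row per category.
# Rows are random here; in a neural network they would be trained weights.
emb_dim = 3  # assumed embedding width, chosen only for this example
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(n_categories, emb_dim))

# Step 3: look up each row's vector and concatenate it with the
# continuous features to form the input a network would receive.
embedded = embedding_matrix[codes]
continuous = df[["distance_km"]].to_numpy()
model_input = np.concatenate([embedded, continuous], axis=1)

print(model_input.shape)
```

Rows that share a category (the two "Mon" rows) receive the same embedding vector, which is what lets a model learn a single representation per category rather than treating each row independently.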

The full article continues on fast.ai.

About the Author:

Rachel Thomas, fast.ai co-founder

Rachel Thomas was selected by Forbes as one of 20 Incredible Women in AI, earned her math PhD at Duke, and was an early engineer at Uber. She is a professor at the University of San Francisco and co-founder of fast.ai, which created the “Practical Deep Learning for Coders” course that over 100,000 students have taken. Rachel is a popular writer and keynote speaker. Her writing has been read by over half a million people; has been translated into Chinese, Spanish, Korean, and Portuguese; and has made the front page of Hacker News seven times.

 
