As both a speaker and a representative of the IIA, I attended the Predictive Analytics World (PAW) show in Las Vegas, June 5-6, 2018. The organizers stated that they had approximately 600 people registered – and the opening session didn’t appear far off from that. This year the show had a new format called “Mega PAW” where a number of (previously individual) PAW shows were all housed together under one roof. Some of the keynotes were shared by all attendees, while the breakouts were segmented by topic. The five shows housed together included:
Note that this article contains an excerpt from IIA's full report, which covered an additional dozen sessions. All of the sessions I attended were within the business (operationalization) track of PAW Business, so the content here reflects only that track. For each session, the key points are summarized along with IIA's commentary on the topic. Information on the full report is provided at the end of this article.
Keynote Part 1 – Mike Tamir of Uber – Applied Deep Learning: Self-Driving Vehicles
Mike’s talk focused on two specific case studies based on deep learning. The first was self-driving cars. Mike discussed how the challenge in developing self-driving vehicles can be broken down into three primary components he called the 3 P’s:
Mike’s view is that deep learning is really about feature engineering. He started with a very simple example of a few colored points on a line. He showed how it was impossible to separate red and blue points with just a vertical line. But, with a quadratic curve, it became possible. The quadratic curve is a “feature”. As he described it, feature engineering can be thought of as an extension of classic variable transformation and creation processes used to support an analytics process.
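Mike's one-dimensional example can be sketched in a few lines (the specific points below are invented for illustration): no single threshold on x separates the colors, but a threshold on the engineered quadratic feature x² does.

```python
import numpy as np

# Hypothetical points on a line: red in the middle, blue on the outside.
# No single vertical line (a threshold on x) can separate the colors.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
color = np.array(["blue", "blue", "red", "red", "blue", "blue"])

# Engineered feature: the quadratic x**2.
feature = x ** 2

# In the new feature space, one threshold separates the classes perfectly.
pred = np.where(feature < 2.5, "red", "blue")
print((pred == color).all())  # True
```

The transformation is exactly the kind of classic variable creation Mike linked feature engineering to: the model is still a simple threshold, but it is applied to a richer representation of the data.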
Mike then discussed how deep learning effectively automates feature engineering and searches to find features that will effectively separate one case from another. One example would be distinguishing a cat from a non-cat. Deep learning tries a wide range of features until it finds a set that works.
Mike discussed the heavy use of convolutional neural networks in the self-driving vehicle space. When applied to images, convolutions are effectively filters that are passed over the raw pixels that pull out various angles, textures, or colors. Across a broad set of features and enough training time, the algorithms will learn to distinguish cars, pedestrians, etc. in order to allow the autonomous vehicle to navigate safely and successfully.
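The filter idea can be sketched directly (an illustration in NumPy, not Uber's code): a convolution slides a small kernel over the pixels, and the hypothetical vertical-edge kernel below responds only where brightness changes from left to right.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel (filter) over the image; each output pixel
    is the weighted sum of the patch currently under the kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A vertical-edge filter: fires where intensity changes left-to-right.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Toy "image": dark left half, bright right half.
image = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)
response = convolve2d(image, edge_kernel)
print(response)  # nonzero only in the columns spanning the brightness edge
```

A CNN learns many kernels like this one during training, rather than having them hand-specified, and deeper layers combine their responses into features for cars, pedestrians, and so on.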
IIA’s Key Takeaways
From Mike's first case study, a couple of items jumped out. First, the parallel between the fancy-sounding "feature engineering" and classic variable transformation and selection. From a conceptual standpoint, the way Mike presented it made the two topics easy to link. The idea of thinking of convolutions as just filters passing over an image was also easy to grasp, and he showed a few example filters to illustrate the concept. In an unrelated discussion, someone pointed out that the "filters" available on many cameras are exactly this type of convolution. Some filters sharpen pictures and some soften them, for example, but the simple math behind them operates very much like the convolutions in a CNN.
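To make the camera-filter analogy concrete, here is a minimal sketch (the kernels are standard textbook examples, not any specific camera's): applying a blur or sharpen kernel to a single 3x3 patch is just a weighted sum of the pixels.

```python
import numpy as np

# One 3x3 patch of pixel intensities with a bright spot in the center.
patch = np.array([[1.0, 1.0, 1.0],
                  [1.0, 9.0, 1.0],
                  [1.0, 1.0, 1.0]])

# A blur (soften) kernel averages a pixel with its neighbors.
blur = np.full((3, 3), 1 / 9)

# A sharpen kernel boosts the center pixel relative to its neighbors.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

center_blurred = (patch * blur).sum()       # pulled toward its neighbors
center_sharpened = (patch * sharpen).sum()  # pushed away from its neighbors
print(center_blurred, center_sharpened)     # ~1.89 vs 41.0
```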
Keynote Part 2 – Mike Tamir of Uber – Applied Deep Learning: Fake News
The second case study Mike discussed was the identification of fake news. Here the raw data isn't pixels in an image but words in an article; the words are the logical equivalent of the pixels. Mike acknowledged that there is, of course, much to debate about whether we should be censoring information at all. However, if censoring is to be done, then it is necessary to develop algorithms accurate enough to censor only those articles that "should" be censored. Given that premise, he continued.
He discussed research showing that the more emotionally oriented content is, the more it suppresses our cognitive and logical functions. If we are made to react viscerally and emotionally to information, we are less able to evaluate it rationally. One potential criterion for identifying fake news, therefore, is whether an article appears to be unemotionally presenting facts or simply attempting to stir emotions. Articles that are just stirring emotions are more likely to be fake.
Mike discussed how Recurrent Neural Networks (RNNs) process data points in order, which makes them terrific for sequential cases such as telemetry, stock markets, and sentences. Long Short-Term Memory (LSTM) networks build on this approach and can learn both what to remember and what to forget. As an LSTM trains, it can learn which words are the most important and relevant to look for. The same idea can also be applied to sentences within paragraphs or paragraphs within documents; in other words, at any level of a document's structure.
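The recurrence Mike described can be sketched as a single update rule applied word by word. This is a plain RNN step for illustration only; an LSTM adds learned gates for remembering and forgetting, which are omitted here. The dimensions and random weights are invented.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    """One recurrent update: the hidden state h carries a running
    summary of everything seen so far in the sequence."""
    return np.tanh(W_h @ h + W_x @ x)

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4)) * 0.1  # hidden-to-hidden weights
W_x = rng.normal(size=(4, 3)) * 0.1  # input-to-hidden weights

h = np.zeros(4)                      # empty memory before the first word
sequence = rng.normal(size=(5, 3))   # e.g. 5 words as 3-dim embeddings
for x in sequence:
    h = rnn_step(h, x, W_h, W_x)     # state updated in order, word by word
print(h.shape)  # (4,)
```

Because the same update is applied at every position, the identical machinery works whether the "items" are words in a sentence, sentences in a paragraph, or paragraphs in a document.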
For the fake news analysis, all of the text is placed into a vector where each word is a dimension. The neural net then attempts to engineer features that distinguish factual presentation from emotion stirring within articles. Of course, to do this the model needs a lot of articles flagged as "fake" and a lot flagged as "not fake". Mike suggested an approach leveraging the wisdom of the crowd, relying on people to correctly flag the articles.
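The "each word is a dimension" representation can be sketched as a simple bag-of-words vectorization (an illustration, not Uber's actual pipeline; the two articles and their flags are invented).

```python
# Each distinct word in the corpus becomes one dimension, and an article
# becomes a vector of word counts. A model trained on labeled vectors would
# then try to learn features separating factual from emotion-stirring text.
articles = [
    "shocking outrage you must share this now",         # emotion-stirring
    "quarterly figures show modest growth in revenue",  # factual tone
]
labels = ["fake", "not fake"]  # hypothetical crowd-sourced flags

vocab = sorted({word for article in articles for word in article.split()})
vectors = [[article.split().count(word) for word in vocab]
           for article in articles]
print(len(vocab), vectors[0])
```

In practice, a deep learning approach would typically replace raw counts with learned word embeddings, but the basic move is the same: text becomes numbers the network can engineer features from.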
IIA’s Key Takeaways
Clearly, fake news and censorship are themselves hot and emotional topics today. The important takeaway is that the methodology Mike described could be applied far beyond the politically charged realm of fake news. Perhaps we want to identify which submitted blogs will be popular, or to identify a person suffering from depression based on their writing. The same underlying math would be applied to text tagged "good blog / bad blog" or "depressed writer / non-depressed writer". The key is the ability to train a neural net to effectively distinguish two classes of articles, which has broad applicability in many areas less controversial than fake news.
Read about more Mega PAW Vegas 2018 Sessions
This article is an excerpt from IIA’s full PAW conference report that is available to IIA clients on our website. The full report includes coverage of a dozen additional keynote and track sessions on top of the keynote session included in this article. Non-clients can request a free copy of the full PAW report by visiting this link.
About the Author
Bill Franks is with the International Institute for Analytics (IIA), where he provides perspective on trends in the analytics & big data space and helps clients understand how IIA can support their efforts to improve analytics performance. Franks is also the author of the books Taming The Big Data Tidal Wave and The Analytics Revolution. He is a sought-after speaker and frequent blogger who has been ranked a top 10 global big data influencer. His work, including several years as Chief Analytics Officer for Teradata (NYSE: TDC), has spanned clients in a variety of industries, from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.