Predictive Analytics Times

2 years ago
Wise Practitioner – Text Analytics Interview Series: John Herzer and Pengchu Zhang at Sandia National Laboratories


In anticipation of their upcoming conference co-presentation, Enhancing search results relevance using Word2Vec Language Models at Text Analytics World Chicago, June 21-22, 2016, we asked Pengchu Zhang, Computer Scientist at Sandia National Laboratories, and John Herzer, Enterprise Search Project Lead at Sandia National Laboratories, a few questions about their work in text analytics.

Q: In your work with text analytics, what behavior or outcome do your models predict?

A: We use the Word2Vec neural network model in our search application to predict word usage in our corpus for a particular context. Word2Vec consists of two models, the Continuous Bag of Words (CBOW) model and the Skip-Gram model. The CBOW model lets us predict a target word given the surrounding words; conversely, the Skip-Gram model lets us predict the surrounding words given a specific word. We use this capability to enhance our queries with term expansion.
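To make the difference between the two models concrete, here is a minimal Python sketch of the training examples each one learns from. It is an illustration only, not Sandia's implementation: the function name `training_pairs`, the window size, and the sample sentence are all assumptions made for this example, and no neural network is actually trained.

```python
def training_pairs(tokens, window=2):
    """Build toy training examples for both Word2Vec variants.

    CBOW examples pair a list of context words with the target to predict.
    Skip-Gram examples pair the target with one context word to predict.
    (Hypothetical helper for illustration; real Word2Vec also subsamples
    frequent words and trains a network on these examples.)
    """
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        # Words on either side of the target, within the window.
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        if not context:
            continue
        cbow.append((context, target))
        skipgram.extend((target, word) for word in context)
    return cbow, skipgram


# Hypothetical sample sentence, not taken from the Sandia corpus.
tokens = "synthetic aperture radar images the ground".split()
cbow, skipgram = training_pairs(tokens, window=1)

print(cbow[1])       # context words on the left, target word on the right
print(skipgram[:3])  # (given word, surrounding word) pairs
```

With a window of 1, the word "aperture" yields one CBOW example (predict "aperture" from "synthetic" and "radar") and two Skip-Gram examples (predict "synthetic" and "radar" from "aperture").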

Q: How does text analytics deliver value at your organization – what is one specific way in which it actively drives decisions or operations?

A: We’re able to increase the relevant content that we return in our search results by enhancing the customer’s query with related terms or synonyms. This automated way of identifying synonym-like terms is much more cost-effective than trying to build and maintain a corporate synonym dictionary.
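The kind of query enhancement described here can be sketched as a nearest-neighbor lookup over word vectors. The Python below is a rough illustration under stated assumptions: the three-dimensional vectors are invented for the example (a trained Word2Vec model would supply vectors with hundreds of dimensions), and `expand_query`, `top_n`, and `min_similarity` are hypothetical names and thresholds, not details from the interview.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
TOY_VECTORS = {
    "buckyballs": [0.90, 0.10, 0.00],
    "fullerenes": [0.85, 0.15, 0.05],
    "sar": [0.05, 0.85, 0.35],
    "synthetic_aperture_radar": [0.00, 0.90, 0.30],
    "budget": [0.10, 0.00, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_query(query, vectors, top_n=1, min_similarity=0.9):
    """Append the closest vocabulary terms to each known query term."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        if term not in vectors:
            continue  # unknown words pass through unchanged
        scored = sorted(
            ((cosine(vectors[term], vec), word)
             for word, vec in vectors.items() if word != term),
            reverse=True,
        )
        # Keep only neighbors that are similar enough to be useful.
        expanded.extend(w for score, w in scored[:top_n] if score >= min_similarity)
    return expanded


print(expand_query("buckyballs synthesis", TOY_VECTORS))
print(expand_query("budget", TOY_VECTORS))
```

The similarity threshold is the main design choice in a sketch like this: set too low, unrelated terms leak into the query; set too high, no expansion happens at all.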

Q: Can you describe a quantitative result, such as the predictive lift of your model or the ROI of an analytics initiative?

A: For one particular acronym query, ‘SAR’, our use of Word2Vec to expand the query into its definition of ‘synthetic aperture radar’ resulted in a 432% increase in relevant documents returned by the search engine.
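For readers checking the arithmetic, a 432% increase means the expanded query returned roughly 5.32 times as many relevant documents as the original query. The interview does not give the underlying document counts, so the counts 25 and 133 in this short Python sketch are hypothetical, chosen only because they produce that percentage.

```python
def percent_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


def growth_factor(percent):
    """Multiplier implied by a percentage increase."""
    return 1.0 + percent / 100.0


# Hypothetical counts: the interview reports only the percentage.
relevant_before = 25
relevant_after = 133

print(percent_increase(relevant_before, relevant_after))
print(growth_factor(432))
```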

Q: What surprising discovery or insight have you unearthed in your data?

A: We discovered that our corpus contained many more references to “buckyballs”, the spherical carbon molecule, than we realized. Use of the Word2Vec model resulted in our query being expanded to include the word ‘fullerenes’, a term more commonly used in scientific papers for this molecule.

Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Text Analytics World.

A: Using the Word2Vec model for query enhancement can help your enterprise move towards conceptual search.


Don’t miss Pengchu and John’s conference co-presentation, Enhancing search results relevance using Word2Vec Language Models on Tuesday, June 21, 2016 from 3:35 to 4:20 pm, at Text Analytics World Chicago. Click here to register to attend. USE CODE PATIMES16 for 15% off current prices (excludes workshops).

By: Steven Ramirez, CEO at Beyond the Arc, and Co-Chair of Text Analytics World
