By: Steven Ramirez, Conference Co-Chair, Text Analytics World Chicago
In anticipation of their upcoming conference co-presentation, Enhancing search results relevance using Word2Vec Language Models at Text Analytics World Chicago, June 21-22, 2016, we asked Pengchu Zhang, Computer Scientist at Sandia National Laboratories, and John Herzer, Enterprise Search Project Lead at Sandia National Laboratories, a few questions about their work in text analytics.
Q: In your work with text analytics, what behavior or outcome do your models predict?
A: We use the Word2Vec neural network model in our search application to predict word usage in our corpus for a particular context. Word2Vec consists of two models, the Continuous Bag of Words (CBOW) model and the Skip-Gram model. The CBOW model lets us predict a target word given the surrounding words; conversely, the Skip-Gram model lets us predict the surrounding words given a specific word. We use this capability to enhance our queries with term expansion.
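To make the difference between the two models concrete, here is a minimal Python sketch of the training examples each one would see for a single sentence. It is an illustration only, not the Sandia implementation: the function names, the window size, and the sample sentence are all assumptions chosen for this example, and no neural network is actually trained.

```python
def cbow_pairs(tokens, window=2):
    """CBOW-style examples: the surrounding words are the input, the target word is the output."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of position i.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-Gram-style examples: one center word is the input, each neighbor is an output."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample sentence, already tokenized on whitespace.
tokens = "synthetic aperture radar images the terrain".split()

print(cbow_pairs(tokens, window=1)[1])
# (['synthetic', 'radar'], 'aperture')
print(skipgram_pairs(tokens, window=1)[:3])
# [('synthetic', 'aperture'), ('aperture', 'synthetic'), ('aperture', 'radar')]
```

The same sentence yields both kinds of examples; only the direction of prediction changes.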
Q: How does text analytics deliver value at your organization – what is one specific way in which it actively drives decisions or operations?
A: We’re able to increase the relevant content that we return in our search results by enhancing the customer’s query with related terms or synonyms. This automated way of identifying synonym-like terms is much more cost effective than trying to build and maintain a corporate synonym dictionary.
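As a rough illustration of this kind of query enhancement, the Python sketch below adds the nearest neighbors of each query term, ranked by cosine similarity between word vectors. Everything in it is assumed for the example: the function names, the `top_n` and `min_sim` parameters, and the tiny hand-written vectors, which stand in for embeddings that a trained Word2Vec model would supply. It is not a description of the production search application.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query, vectors, top_n=2, min_sim=0.8):
    """Return the query terms plus up to top_n sufficiently similar terms for each."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        if term not in vectors:
            continue  # No vector for this term, so nothing to expand with.
        scored = sorted(
            ((cosine(vectors[term], vec), other)
             for other, vec in vectors.items() if other != term),
            reverse=True,
        )
        for sim, other in scored[:top_n]:
            if sim >= min_sim and other not in expanded:
                expanded.append(other)
    return expanded


# Made-up 3-dimensional vectors for illustration; real embeddings are learned
# from the corpus and typically have hundreds of dimensions.
TOY_VECTORS = {
    "buckyballs": [0.90, 0.10, 0.00],
    "fullerenes": [0.85, 0.15, 0.05],
    "radar": [0.00, 0.20, 0.95],
    "sar": [0.05, 0.25, 0.90],
}

print(expand_query("buckyballs synthesis", TOY_VECTORS))
# ['buckyballs', 'fullerenes', 'synthesis']
```

The similarity threshold is the design choice that matters most here: set too low, the expansion pulls in loosely related terms and hurts precision; set too high, it adds nothing.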
Q: Can you describe a quantitative result, such as the predictive lift of your model or the ROI of an analytics initiative?
A: For one particular acronym query, ‘SAR’, our use of Word2Vec to expand the query into its definition of ‘synthetic aperture radar’ resulted in a 432% increase in relevant documents returned by the search engine.
Q: What surprising discovery or insight have you unearthed in your data?
A: We discovered that there were many more references to “buckyballs”, the spherical carbon molecule, in our corpus than we had realized. Using the Word2Vec model expanded our query to include the word ‘fullerenes’, a term more commonly used in scientific papers for this molecule.
Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Text Analytics World.
A: Using the Word2Vec model for query enhancement can help your enterprise move towards conceptual search.
Don't miss Pengchu and John’s conference co-presentation, Enhancing search results relevance using Word2Vec Language Models, on Tuesday, June 21, 2016 from 3:35 to 4:20 pm, at Text Analytics World Chicago.