Machine Learning Times

3 years ago
Getting Tabular Data from Unstructured Text with GPT-3: An Ongoing Experiment

 
Originally published by Roberto Rocha.

One of the most exciting applications of AI in journalism is the creation of structured data from unstructured text.

Government reports, legal documents, emails, memos… these are rich with content like names, organizations, dates, and prices. But getting them into a format that can be analyzed and counted, like a spreadsheet, usually involves days or weeks of tedious manual data entry.

Large language models like GPT-3 from OpenAI have the potential to greatly speed up this awful slog. Because these models have such a deep grasp of language (GPT-3 was trained on a large share of the public web, including all of English Wikipedia), they can follow plain-language commands and pick out the right elements from text.
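To make the idea concrete, here is a minimal sketch of the two pieces of plumbing around such a model: building an extraction prompt and turning the model's reply into spreadsheet rows. Everything in it is an assumption for illustration, not the method used in this experiment: the column names, the prompt wording, the pipe-delimited reply format, and the helper names `build_prompt`, `parse_table` and `to_csv` are all hypothetical, and no API call is made.

```python
import csv
import io

# Hypothetical column names, chosen only for illustration.
COLUMNS = ["name", "organization", "date"]


def build_prompt(text, columns=COLUMNS):
    """Wrap the source text in an instruction asking for pipe-delimited rows."""
    header = " | ".join(columns)
    return (
        "Extract the following fields from the text below and return one row "
        f"per record, pipe-delimited, under this header:\n{header}\n\n"
        f"Text:\n{text}\n\nTable:\n"
    )


def parse_table(reply, columns=COLUMNS):
    """Turn a pipe-delimited reply into a list of dicts, skipping bad rows."""
    rows = []
    for line in reply.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != len(columns):
            continue  # model output is not guaranteed to be well formed
        if [cell.lower() for cell in cells] == columns:
            continue  # skip the header row if the model repeats it
        rows.append(dict(zip(columns, cells)))
    return rows


def to_csv(rows, columns=COLUMNS):
    """Serialize the parsed rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In use, the string returned by `build_prompt` would be sent to the model, and the text that comes back would go through `parse_table` and then `to_csv`. Dropping rows with the wrong number of cells is a deliberate choice in this sketch: a language model can return malformed lines, so the output still needs a human check before it is analyzed.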

The challenge

The Canadian federal lobbyist registry has a lot of information about who is lobbying government officials, and on whose behalf. One of the most important elements of the registry is the past public offices data: lobbyists who previously worked for the government.

The data is pretty structured, showing the offices held and the time period. Here’s an example for a lobbyist working for TikTok.

To continue reading this article, click here.
