PAW Business Track 2: TECH - Machine learning methods & advanced topics
Today, with always more data at their fingertips, Machine Learning experts seem to have no shortage of opportunities to create always better models. Over and over again, research has proven that both the volume and quality of the training data is what differentiates good models from the highest performing ones.
But with an ever-increasing volume of data, and with the constant rise of data-greedy algorithms such as Deep Neural Networks, it is becoming challenging for data scientists to get the volume of labels they need at the speed they need, regardless of their budgetary and time constraints. To address this “Big Data labeling crisis”, most data labeling companies offer solutions based on semi-automation, where a machine learning algorithm predicts labels before this labeled data is sent to an annotator so that he/she can review the results and validate their accuracy.
Unfortunately, even this approach is not always realistic to implement, for example in the context of some industries, such as Healthcare, where obtaining even a single label can cost thousands of dollars.
There is a radically different approach to this problem which focuses on labeling “smarter” rather than labeling faster. Instead of labeling all of the data, it is usually possible to reach the same model accuracy by labeling just a fraction of the data, as long as the most informational rows are labeled. Active Learning allows data scientists to train their models and to build and label training sets simultaneously in order to guarantee the best results with the minimum number of labels. In this talk, I will cover both the promises and challenges of Active Learning, and explain why, all in one, Active Learning is a very promising approach to many industry problems.
VP of Machine Learning