Machine Learning Times

Mark Smith, president, Provenir
Originally published at Wired


It seems Big Data is currently persona non grata. In the last week, a headline in The New York Times asked “Is Big Data an Economic Big Dud?” and, based on some (perhaps misinterpreted) sentiments in the most recent spate of Gartner Hype Cycle reports, an AllThingsD headline declared “Think Big Data Is All Hype? You’re Not Alone.”

As the AllThingsD piece notes, Gartner indicates that we are at the peak of the Big Data buzz. By default then, the next logical step is Gartner’s so-called “trough of disillusionment.” As Gartner will tell you, this cycle is completely natural for all technologies, so we are bound to start seeing more Big Data naysayers. But I for one am still positive on Big Data’s promises, and not just the economic ones.

I should declare my interest at the outset: yes, Provenir is a company focused on leveraging Big Data effectively. But given this recent pick-up in negative sentiment, I wanted to highlight that both articles only scratch the surface of the real challenges around Big Data: the data scientist skills gap, our desire for instant gratification, and the tendency to measure success and value purely in economic terms.

The Times’ piece qualifies its initial question of Big Data’s economic validity by saying “Other economists believe that Big Data’s economic punch is just a few years away, as engineers trained in data manipulation make their way through college and as data-driven start-ups begin hiring.” The Big Data skills gap is nothing to make light of. Just last month, Gartner predicted that 4.4 million IT jobs will be created to support Big Data, and that, even more importantly, each of those jobs will create employment for three more people outside of IT. But they also warned that, due to a lack of properly trained data professionals, only one-third of all those jobs will actually be filled.

Our industry can solve this problem with one or both of two things: more educational programs that can produce data-oriented professionals at a faster rate, and better technology that doesn’t require such an advanced IT professional to operate in the first place. We may need IT professionals to get the data together and in good shape, but it is those other jobs outside of IT that need to get more skilled with using the data. Business analysts and data scientists alike need to get hands-on to make the best use of Big Data, and tools are needed that let all skill levels do this. At Provenir, we think we have a good handle on the latter; it’s one of the reasons we launched our new customer listening division in the first place.

But the key point here is that there exists a very real and acknowledged lack of talent to manipulate Big Data for economic benefit, and it is going to take time to get the workforce oriented to this new reality. The Times’ article compares Big Data with the industrial revolution, saying Big Data’s impact hasn’t been nearly as big. But the article fails to note that it took years to get the developed world’s workforce in gear (pun intended?) for a world filled with steam trains, textile mills and automobiles.

Moreover, the education systems that the Western world put in place to support the skills needed in the industrial revolution are (amazingly) still the basis of today’s system. Ken Robinson gave a fantastic TED Talk that describes why the education system is often so slow to respond to the needs of even current industries and trends, let alone those of the future: “Our education system has mined our minds in the way that we’ve strip-mined the earth for a particular commodity, and for the future it won’t serve us.” Perhaps a bit over the top, but the guts of this inspiring talk make it clear how we are still educating people for the old, uncreative, non-data driven world, and now that things are changing so fast we don’t even know what subjects to educate today’s kids in for the needs of jobs in 10 years’ time.

Perhaps this education gap is also indicative of the second and third reasons not to doubt Big Data’s benefits just yet: Big Data is still a relative infant that must be allowed time to mature, and even once it has matured, why assume it will yield a 1:1 ratio of bytes to GDP dollars? The Times piece does rightly note that “some economists argue that it is often difficult to estimate the true value of new technologies, and that Big Data may already be delivering benefits that are uncounted in official economic statistics.” Since the comparison to the industrial revolution has already been made, we must remember that the benefits of such massive societal movements may not be felt for decades, and they almost certainly will not all be quantifiable.

I’m positive about Big Data for sure, but not only for how much it may end up moving GDP or stock ticker needles. It is true that Hadoop, one of the most widely regarded technologies today for making Big Data more manageable, is free. It is also true that, as AllThingsD somewhat reluctantly points out, Hadoop-based “startups like Cloudera, Hortonworks and MapR have all been closing big funding rounds in recent months, in no small part because their customers are moving from trying the technology out to deploying it for real. Indeed, Gartner’s primary rival, the market research firm IDC, is calling for the revenue of companies in the Hadoop business to grow by more than 10 times by 2016, and maybe even to disrupt the businesses of software giants like Oracle and Teradata in the process.”

The mistake that The Times article makes is to expect as much economic growth as there is Internet traffic (and, possibly at a more fundamental level, in even ascribing the Big Data label to raw Internet traffic in the first place). So much of the data we transfer these days is simply us living our lives in a different way than we did before the days of Facebook, Twitter and YouTube (all of which are free services). Why should we turn all of that personal entertainment and networking into pure profit?

A primary use of the Internet, and therefore a primary creator of Big Data, is personal entertainment, information access, and personal networking. Big Data delivers improved quality of life and new capabilities for billions of people, but I don’t believe it will simply drive GDP in proportion to the volume of data being transferred.

Also, should such GDP increases be the way we measure our success? It is certainly true that businesses can benefit economically from Big Data in a big way, and unless they start to make some credible progress in doing so more naysayers will pile on. In some sense, that is probably inevitable. But thinking Big Data’s benefits can only be measured fiscally is a “1 percent” argument. Meanwhile, the other 99 percent will benefit from Big Data every day, and if we can sort out that skills gap, many of them may also actually use it every day, too.
