For reasons perhaps having to do with “siloed thinking” or “not-invented-here” syndrome (things that can affect all teams from time to time), many participants on the Big Data Management (BDM) and Big Data Analysis (BDA) scene have become convinced that there is only one kind of Data Science: that which is run in their corporate or academic shops. You know. Data Science. The only Data Science.
This misapprehension has given rise to confusion, even among those of us who call ourselves data scientists, about what to call ourselves. We learned about data and analysis in a graduate program (MSc or PhD, or both) in physics or economics or biostatistics or Ops Research, or…some other STEM, or life, or historical, or social science (the list goes on…). Some of us have recently ended up calling ourselves data scientists because that was the direction market forces were pushing the people who make up names for occupations. Can’t get tagged on a recruiter’s search engine if you don’t have “data scientist” on your resume, right? The main effect of this positive feedback loop in the labor market is to broaden the definitions of data science developed over the last decade.
Others may quail at the thought of all the time wasted, yet again, debating what best to call ourselves, but I think the current flux is a good thing. There are just so many kinds of data sciences and scientists. That’s the main point. But you say, again, what about Data Science, you know, where all the CS and IT nerds are? The only Data Science?
Well, it’s one branch of The Data Sciences (plural and capitalized, for the sake of argument). It could be called, quite correctly, Computer Science (CS)-IT Data Science.
But it’s not the only Data Science. I argue that understanding the impact of data and analysis over the last 25 years demands that we broaden our focus from the branch of data science in which CS and IT thinking, training, and experience dominate the praxis to the entire domain of data and the myriad sciences that use computerized tools to analyze it.
From this point of view, there are hundreds of Data Sciences. These are systematic investigative efforts that collect and analyze data to solve problems for society and in the service of advancing an empirical and theoretical mission. (And I digress, but not by much; you need as much scientific methodology as you can possibly afford. That means properly trained people. Otherwise, you are throwing money away.)
Under the Data Sciences, then, we have, for example, Biological and Biostatistical Data Sciences, Demo-Geographic Data Sciences, Sociological Data Sciences (sociometrics), Psychological (psychometric) Data Sciences, Statistical (mathematical) Data Sciences, even Historical Data Sciences (anthropology, paleontology, archaeology, “natural history” or evolutionary biology). And so on, in growing numbers, and in no particular order or hierarchy.
These are but a few examples. I didn’t mention econometrics, an applied statistical-disciplinary bias of my own. In full disclosure, I mention it now. And I’ll stop capitalizing now that the point is made.
This is a proposal for one way to sort out some of the occupational confusion I referred to earlier. The reader should understand that the categories raised are map boundaries, not the kind of 20th century scientific disciplinary barriers that are often so unhelpful. You can think of them as heuristic lines of demarcation we use to better understand the trajectories of 21st century sciences.
And CS-IT data science’s status is unchanged. It is right in the middle of everything hot in tech. That means many elements of it will remain indispensable. The most obvious example is its integral involvement in AI. In a TEDx talk, I heard that AI won’t run without Big Data, and Big Data is not interpretable without AI. So Big Data will have to be muscled around on both ends by the CS-IT data scientists. And they will always provide the computational engines for data scientists from any given discipline. That is, they will keep advancing the data management and analytic technology used to store, maintain, and retrieve data that has been observationally or experimentally collected for some purpose (or none at all).
But, again, it is not the only Data Science. Clearly, it is one of many.
All of us who work with data need to better understand that the rubric encompasses an often-bewildering thicket of extremely rapid, technologically driven changes in science, industry, labor markets, and ultimately, the behavior of consumer markets themselves. This line of thinking moves us toward a consistent nomenclature for what each of us does in data science, now and in the future.
About the Author:
Bill Luker Jr., PhD, is an economist, applied econometrician, and methodologist, with 30+ years of analytic experience in private industry, US and state government, academia, and community and labor organizations. Most recently, he was a Senior Vice President at Citibank, overseeing Documentation for CCAR Modeling in fulfillment of the Bank’s regulatory obligations to the US Federal Reserve under the Dodd-Frank Act. In 2017, he founded a boutique economics and statistics consultancy, terraNovum Solutions. He co-authored Signal From Noise, Information From Data (https://tinyurl.com/yccjqyo9), an introductory statistics textbook for business and government professionals, and has 30 more academic and professional publications to his credit.