Machine Learning Times
Keeping Data Inclusivity Without Diluting your Results

 
Originally published in WeAllCount.com, January 17, 2020

Let’s say you are surveying 100 people out of 10,000. You want to analyze the data from your sample of 100 to get answers about the likely behaviors and preferences of the overall 10,000-person population.

Part of your project focuses on equity among sexual orientations. You don’t want to leave anyone out, and you know that a question about sexual orientation where people can only select ‘heterosexual’ or ‘homosexual’ isn’t inclusive enough. You consult experts and the local community and decide to include ‘Heterosexual, Gay, Lesbian, Bisexual, Pansexual, or Asexual’ as options in that question.

Once your responses have come in, you have data from respondents across each of those categories; however, only a few respondents identified as bisexual, and only one person each identified as pansexual and asexual. When you try to analyze the data to represent the responses of all these orientations, you realize that you have so little data from some categories that you can’t say anything statistically meaningful about them: you can’t extrapolate the preferences and likely opinions of all the asexual-identifying people in your population of 10,000 from one person’s data.
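
To see concretely why one person’s data can’t support a population-level claim, here is a minimal sketch, not from the original article, that computes a 95% Wilson score confidence interval for a ‘yes’ proportion within each orientation category. The counts are hypothetical, chosen only to roughly match the 100-person sample above.

```python
# Minimal sketch (hypothetical counts): how wide the uncertainty around a
# per-category estimate becomes when a category has only one or two respondents.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion estimated from n respondents."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is completely unconstrained
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical data: (respondents answering "yes" to some question, category size).
categories = {
    "Heterosexual": (40, 80),
    "Gay": (5, 9),
    "Lesbian": (3, 6),
    "Bisexual": (2, 3),
    "Pansexual": (1, 1),
    "Asexual": (0, 1),
}

for name, (yes, n) in categories.items():
    low, high = wilson_interval(yes, n)
    print(f"{name:13s} n={n:3d}  estimate={yes / n:.0%}  95% CI: {low:.0%} to {high:.0%}")
```

With 80 respondents the interval is roughly 39% to 61%; with a single respondent it spans most of the 0% to 100% range, which is the ‘can’t say anything statistically relevant’ problem expressed in numbers.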

Rather than completely discount the categories in which you have very few responses, you decide it’s better to combine them into an amalgamated category so that they can be better represented. When you publish your findings, you frame your results as ‘Heterosexual, Homosexual, and Other’, which is the very thing you were trying to avoid. People are mad and hurt that they aren’t well represented and feel lumped into an ‘other’ category. Respondents who took your survey feel cheated: they were asked detailed questions, only for their answers to be combined anyway.

This kind of ‘collapsing’ or ‘amalgamating’ of data categories happens all the time, and not just with sexual orientation. Almost all demographic questions are susceptible to being limited in the survey or condensed in the analysis: race, ethnicity, gender, language, and so on. Imagine how difficult, and how statistically useless, it would be to list every possible spoken language as an option on a survey. How can we be inclusive without making minority categories so small that only the majority data has statistical relevance?
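
For readers who want to see what the amalgamation step looks like mechanically, here is a minimal pandas sketch, again with hypothetical data and an arbitrary cutoff, that folds every category with fewer than five respondents into an ‘Other’ bucket. It illustrates the move the article is questioning; it is not a recommendation.

```python
# Minimal sketch (hypothetical data): collapse any category whose respondent
# count falls below a threshold into "Other" before analysis.
import pandas as pd

responses = pd.Series(
    ["Heterosexual"] * 80 + ["Gay"] * 9 + ["Lesbian"] * 6
    + ["Bisexual"] * 3 + ["Pansexual"] * 1 + ["Asexual"] * 1,
    name="orientation",
)

MIN_COUNT = 5  # arbitrary cutoff for "enough data to report separately"

counts = responses.value_counts()
too_small = counts[counts < MIN_COUNT].index            # categories with too few respondents
collapsed = responses.where(~responses.isin(too_small), "Other")

print(collapsed.value_counts())
# Heterosexual    80
# Gay              9
# Lesbian          6
# Other            5   <- Bisexual, Pansexual, and Asexual lumped together
```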

Competing Priorities:

  1. It’s important that the diversity among your respondents is respected.
  2. It’s important that the results you show be statistically meaningful.

To continue reading this article, click here.
