Algorithmic Bias: What’s Missing from Big Data?

Written by Synaptiq | May 15, 2022 4:06:00 PM

Since the advent of written language around 3000 B.C., humans have been sharing information across people, places, and generations. Oral traditions suggest that this practice might be even older, dating back over 10,000 years. Throughout history, the methods we've used to share information have evolved, becoming more sophisticated, accurate, and efficient, transitioning from clay tablets to today's information technology.

Information technology stands out as particularly significant in the context of evolutionary development, almost like a superpower. The natural limits of lifespan and cognitive capacity impose a ceiling on how much information any species can learn and process within a single lifetime. While most nonhuman species are bound by these limitations, there is evidence that some animals can gradually enhance their knowledge and performance over generations. For instance, pigeons have demonstrated the ability to share knowledge about optimal flight routes.

However, these examples pale in comparison to human capabilities in information sharing. No nonhuman species has achieved anything close to the level of complexity and efficiency in information exchange that humans have, primarily through the development and use of information technology. This unique ability of humans to transcend natural learning and memory limitations sets us apart in the natural world.

The development of written language and oral traditions enabled humans to collectively create, process, store, and retrieve information. Despite our ability to share knowledge, we are still constrained by human limitations:

Sensory range
Processing speed
Memory capacity

Information technology enables us to surpass these limitations, allowing us to handle vast amounts of data effortlessly. From search engine optimization to social media algorithms, information technology plays a transformative role in our daily lives. However, the reliance on Big Data, characterized by its enormous volume, brings its own set of challenges.

Big Data and Inherent Biases

While Big Data systems can process immense volumes and varieties of data at incredible speeds, they also introduce the risk of algorithmic bias. This bias arises when algorithms, influenced by the prejudices of their human developers, reinforce existing societal inequities related to socioeconomic status, gender, race, and other identities.

Our data collection systems, far from being neutral, often reflect these biases. A significant example is the "sex and gender gap" highlighted by Caroline Criado Pérez in her book, "Invisible Women." This gap stems from the implicit assumption that male data is the standard, leading to an overrepresentation of male data and an underrepresentation or misrepresentation of female data.

This bias in data and algorithms can have real-world impacts, as demonstrated by the BBC’s 2020 report on gender bias in the design of "unisex" personal protective equipment. Such biases show that Big Data is not always representative of everyone, especially marginalized groups.
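
One rough way to look for this kind of gap is to compare each group's share of a dataset with its share of the population the data is supposed to represent. The Python sketch below is only an illustration of that idea. The function name `representation_gaps`, the toy records, and the 50/50 reference split are all hypothetical assumptions, not figures from any real dataset or study.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares):
    """Return (observed share - reference share) for each reference group.

    A negative value means the group is underrepresented in `records`
    relative to the assumed reference population.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        # Guard against an empty dataset to avoid dividing by zero.
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical toy dataset: 8 records from men, 2 from women.
sample = [{"sex": "male"}] * 8 + [{"sex": "female"}] * 2

# Assumed reference population: an even split (illustrative only).
reference = {"male": 0.5, "female": 0.5}

gaps = representation_gaps(sample, "sex", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%}")
```

With this toy data, the sketch reports men as overrepresented by 30 percentage points and women as underrepresented by the same amount. A check like this only catches groups that were recorded in the first place; it cannot reveal data that was never collected or was recorded under the wrong category.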

In the Information Age, it is crucial to critically evaluate and address these biases in Big Data. We must continually ask and seek to answer the question: What's Missing from Big Data? This question is vital for ensuring that the information technology that shapes our lives and society is equitable and representative of all.

About Synaptiq

Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation.

You can learn more about our story through our past projects, blog, or podcast.
