Why a More People-Centered Approach to Data Science Can Help Societies Thrive

By David Bray

“It was the best of times, it was the worst of times…” So begins Charles Dickens’ A Tale of Two Cities—with the narrator later noting that this characterization of contrasting superlatives represents a common thread throughout history. Nowadays, the same could be said of societies debating the benefits and drawbacks of two important data science trends:

1. We are instrumenting the planet in a way that’s unprecedented for the human species. With an increasing number of industrial and commercial Internet of Things devices, as well as an increasing number of low-earth-orbit (LEO) small satellites and CubeSats, we will have sensors scattered across the planet. This translates into an unprecedented capability for individuals and organizations to access raw data associated with activities going on around the world.

2. We are producing ever-increasing volumes of digital data on the Internet. IDC’s “Data Age 2025” white paper estimates 175 zettabytes (one zettabyte is a trillion gigabytes) of worldwide data by 2025, compared to an estimated 33 zettabytes on the planet as of 2018—roughly a fivefold increase in seven years, as the quick calculation below illustrates.
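As a rough sketch of what those two figures imply, here is a back-of-the-envelope calculation in Python; the only inputs are the two IDC estimates cited above, and the resulting compound annual growth rate of roughly 27 percent is an extrapolation from them, not a figure from the white paper itself:

```python
# Back-of-the-envelope growth implied by the IDC "Data Age 2025" estimates
# cited above: 33 ZB in 2018 and a projected 175 ZB in 2025.
data_2018_zb = 33    # estimated worldwide data in 2018, in zettabytes
data_2025_zb = 175   # projected worldwide data in 2025, in zettabytes
years = 2025 - 2018

growth_multiple = data_2025_zb / data_2018_zb   # roughly 5.3x overall
cagr = growth_multiple ** (1 / years) - 1       # implied compound annual growth rate

print(f"Overall growth: {growth_multiple:.1f}x over {years} years")
print(f"Implied compound annual growth rate: {cagr:.0%}")  # about 27% per year
```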

These two trends represent an unprecedented scale of data-related change for our world. They also collide with the reality that for most of our history as a species, humans lived in nomadic groups where everyone knew each other and knew what was going on within the group. Now, there are an estimated 7.6 billion people living on Earth, potentially more than 8 billion by 2025, with the potential, through data science, to know nearly everything occurring across the entire planet on an ongoing basis.

Challenges will abound. The volume and immediacy of data will make it hard to pull together the right data sets to make sense of new opportunities to pursue and new threats to avoid. The abstracted nature of global data will challenge sense-making across different contexts, exacerbated by the reality that for most of human evolution, immediate visual and auditory environments signaled opportunities and risks. We will be challenged by an environment both physically and digitally far removed from the selection pressures of either three thousand or thirty thousand years ago. Future historians may conclude that modern human brains weren’t ready for the sheer scale of data we will produce in the next decade.

Addressing such challenges will presumably include the use of semi-autonomous and autonomous methods of making sense of the data, including the “third wave” of artificial intelligence (AI) embodied in deep learning advances. Such techniques spur important questions about how organizations employing them can govern these endeavors appropriately.

Why Our Rapidly Changing World Challenges Social Foundations

While many discussions have occurred about the ethics of AI, these well-intended conversations often miss two important elements:

First, AI ethics depends on a solid foundation of data ethics; to date, data ethics has not received as much emphasis.

Second, ethics are socially defined, meaning ethics are normative and represent what the collective believes to be “right.” This changes over time. In contrast, morals are internally defined. Morals are what a person individually believes is right to do.

The depths of human conflict between social ethics and individual morals have been plumbed by more than 3,000 years of human philosophy. To think that we modern humans will reach consensus on a global set of data science ethics by 2025 is probably unrealistic. However, we humans can consider where we, as organizations, communities, and societies, want to go in the decade ahead.

Through shared narratives, the enforcement of laws, and the use of technologies, humans have shaped social norms and reshaped how power—defined as the capability to compel or oblige someone to take a certain course of action—is distributed in our communities. Now, at the beginning of the 21st century, we face big questions of “Quo Vadis?”—where do we want communities and human societies to go, especially given the recent rate at which new data science capabilities are challenging the distribution of power? There is both huge opportunity to improve our communities with more people-centered approaches and significant risk that our digital future may not be as hopeful as we would like it to be.

For open, pluralistic societies that separate their private sector from their public sector, at least three potential doors present themselves as possible destinations.

Door number one includes employing the increasing instrumentation of the planet to create a state of always-on surveillance, a modern panopticon that protects stability and safety by always monitoring where everyone is and what they are doing.

Door number two includes much the same as the first door, with a market-based twist that employs the growing volumes of data and global instrumentation to create industries fueled by what some call surveillance capitalism—producing value by creating tailored insights about preferences, intentions, and how best to influence people based on their digital data.

Both doors pose challenges for open, pluralistic societies, which require collective belief in the freedom to choose and free will. Behind either door, societies receive increased safety, stability, or market value at the expense of eroding choice and free will.

To some degree, open, pluralistic societies have always tolerated some influence on free choice, including marketing advertisements and political rhetoric. Both activities employ human biases to influence behaviors.

As one example, all humans have confirmation biases. Once we accept a belief, our brains discount additional data that counters our beliefs and inflate data that reinforces them. Now disparate data sets bombard us daily, and confirmation bias makes us less open to considering additional data.

As a second example, all humans also encounter cognitive ease. The more something is repeated, the more likely we are to believe it because it seems familiar, even if it is not true. Even saying “X is not true” repeats the element X in the minds of people and introduces the risk of cognitive ease swaying our beliefs.

As a third example, any data set, taken out of context, can be misinterpreted or misconstrued. We now face a world of growing data abundance at the same time as social forces have flattened the historical gatekeepers who previously provided context. In the past, there were only a handful of radio or television outlets to convey data and narratives. While individuals may not have agreed with the contexts that this limited number of gatekeepers put forward, the gatekeepers at least served as conduits for data and narratives. Now anyone can digitally “print” and view whatever they want—raising important questions about data sets and narratives taken out of context.

We would like to encourage a third door for open, pluralistic societies to consider for the decade ahead. This door will take hard work and investment, both in developing new data science techniques and in improving our understanding of the interplay among data science, laws, narratives, and new technologies.

Door Number Three: Signposts Towards a Human Noosphere

The idea of a human noosphere—a global collective consciousness or interconnected “mind space” on the planet—arose in the second decade of the 20th century. It came first from geologists, who suggested there were three phases of life on Earth: first inanimate matter (the geosphere), then the arrival of animated life (the biosphere), and an ultimate phase in which humans transcend their individual sense of self and internal motivations to achieve a collective consciousness that surpasses ourselves (the noosphere). Some philosophers, notably Teilhard de Chardin, suggested that evolution’s natural selection tended towards increasing complexity of lifeforms and of consciousness among lifeforms.

During the latter part of the 20th century, online chat room discussions expressed hopes that the World Wide Web could help achieve such a vision of global consciousness or noosphere. A lot of the idealism for the Web included this hope for the future. Yet looking back at the last few decades, we now see signposts along the way that include cautionary signs about such an ideal. In attempting to work towards greater global human consciousness, we’ve discovered that our human natures, both as individuals and as collective organizations, introduce speed bumps to such a vision. We have also seen recent cases where the Internet and related technologies may be creating more homogeneity of thought—producing echo chambers online or (worse) surveillance states with either direct or indirect pressures toward conformity of thought and behavior, and public shaming of those who act or see things differently. Neither a highly polarized world full of acrimonious thoughts on the Internet nor a homogeneous, highly restrictive world containing only conforming thoughts sounds like a hopeful aspiration for 2030.

Europe’s General Data Protection Regulation (GDPR) has received a lot of praise for its intended outcomes and has also raised questions about whether the United States or other nations will do something similar or different. Questions about its long-term impacts on Europe have also arisen. In the United States, online services mostly come with voluminous “terms and conditions” full of legalese that a majority of individuals don’t read through fully before they click accept. Even those who do read them fully receive only a binary proposition: accept or reject what the conditions offer.

As an alternative to the current voluminous “terms and conditions” provided for online services, I’d like to suggest a simple two-by-two table. This table is intended to take up no more than half a page, where entities that provide services in the world—including corporations, startups, communities, NGOs, and more—can provide short bullets covering four important elements; namely:

Obligations in this Context: what principles the entity believes about its relationship with its stakeholders

Acknowledgments in this Context: what “known unknowns” may exist tied to transactions and relationships

Responses to Obligations: what the entity will do based on expressed Obligations

Safeguards to Acknowledgements: what the entity will do based on expressed Acknowledgements

This “OARS” framework seeks to provide a visible signal that, while different for each individual or organization, both informs others of an entity’s intentions and encourages that entity to consider its shared connections to a larger, people-centered community.

Imagine if the public started to expect a short, concise 2×2 table showing this for every website and app. The OARS framework asks any entity to acknowledge that perceived biases exist in any human endeavor because of our experiences, training, background, and more. For example, an organization sponsored by a certain group may receive subtle nudges from that sponsor and should acknowledge that sponsor. An engineering firm will probably be great at engineering efforts yet may not necessarily see other perspectives outside its expertise. An entity may also simply acknowledge that it will “do its best” in an endeavor, while recognizing that for several endeavors there will still be unknown factors that impact delivery.
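To make the half-page table concrete, here is one minimal sketch of what an OARS declaration might look like as a machine-readable structure; the schema, field names, and example entries below are illustrative assumptions for discussion, not part of any existing standard:

```python
from dataclasses import dataclass, field

# A minimal, illustrative sketch of an OARS declaration as a data structure.
# The schema and field names are hypothetical assumptions, not an existing standard.

@dataclass
class OarsDeclaration:
    entity: str
    obligations: list[str] = field(default_factory=list)      # principles toward stakeholders
    acknowledgments: list[str] = field(default_factory=list)  # "known unknowns" in this context
    responses: list[str] = field(default_factory=list)        # what the entity will do based on obligations
    safeguards: list[str] = field(default_factory=list)       # what the entity will do based on acknowledgments

# Hypothetical example: the half-page declaration a service might publish.
example = OarsDeclaration(
    entity="Example Weather App",  # hypothetical service
    obligations=["Keep individual location data confidential unless consent is given"],
    acknowledgments=["Forecasts carry known uncertainty; sponsorship may introduce subtle nudges"],
    responses=["Publish a plain-language summary of the analytics performed on user data"],
    safeguards=["Route suspected misuse reports to an ombuds group for rapid review"],
)

for quadrant in ("obligations", "acknowledgments", "responses", "safeguards"):
    print(quadrant.capitalize(), "->", getattr(example, quadrant))
```

Publishing such a structure alongside the human-readable table could, in principle, let browsers, app stores, or watchdog groups compare declarations across services.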

Achieving this door number three—that is neither surveillance states nor surveillance capitalism—will require the data science community to work across multiple fields to help both private and public organizations identify what obligations matter most. These obligations could include responsibilities to keep health or proprietary data confidential unless consent is given. Obligations could also include responsibilities to provide transparency on either what data analytics are performed or how data sets are employed.

The data science community will also need to work with private and public organizations to educate all members of society about human perceptual biases. This acknowledgment may include developing analytic “data mirrors” that help each of us know more about the biases innately present in all our decisions over time.

From this foundation, both individuals and organizations operating in open, pluralistic societies can make intentional choices regarding what responses to pursue as part of their data-illuminated obligations to members of the public, customers, shareholders, employees, boards of directors or other groups.

Individuals and organizations can also make intentional choices about what safeguards they deem necessary to make sure data sets are not misused in the digital future ahead. Such safeguards could include ombudsman-like functions responsible for ensuring data sets are both sufficiently diverse and sufficiently representative of different populations, so that whatever deep learning occurs does not result in overt biases toward or away from different groups in society. Safeguards could include not letting emotionally laden headlines or articles that reinforce our existing biases be the sole motivator of our actions. Safeguards could also include ensuring that, whatever correlations or conclusions data analytics reveal, individuals as well as organizations still have a chance to reflect upon, and consciously choose, whether what the data shows represents a direction or action they are willing to pursue.
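As one illustration of what an ombudsman-like representativeness check might look like in practice, here is a minimal sketch in Python; the group labels, benchmark population shares, and tolerance threshold are all hypothetical assumptions chosen for the example:

```python
# Illustrative sketch of a data-set representativeness check that an
# ombudsman-like function might run before training begins. The group labels,
# benchmark shares, and tolerance below are hypothetical assumptions.
from collections import Counter

def representation_report(records, group_key, benchmarks, tolerance=0.5):
    """Flag groups whose share of the data set falls below `tolerance`
    times their benchmark share of the population."""
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    flags = []
    for group, benchmark_share in benchmarks.items():
        observed_share = counts.get(group, 0) / total
        if observed_share < tolerance * benchmark_share:
            flags.append((group, observed_share, benchmark_share))
    return flags

# Hypothetical usage with toy data: 80% of records come from one region.
records = [{"region": "north"}] * 80 + [{"region": "south"}] * 20
benchmarks = {"north": 0.5, "south": 0.5}  # assumed population shares
for group, observed, expected in representation_report(records, "region", benchmarks):
    print(f"{group}: {observed:.0%} of data vs {expected:.0%} of population -- under-represented")
```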

As the last few decades have shown, almost any tool or technology will have unintended uses, both helpful and harmful. In an increasingly connected world, we need more rapid mechanisms to identify third-order or fourth-order unintended uses and adjust appropriately. As such, the OARS framework asks any entity to think about what safeguards it might employ should a well-intended service start to be used in unintended third-order or fourth-order ways. For example, an organization might perceive that an unintended use of its services would be online advertisements that trigger violent radicalization of certain groups to harm others. In this example, a potential safeguard that could be listed is an “ombuds” group through which early identification of such concerns can be shared, so the organization can rapidly learn, adjust, and respond accordingly.

Only by such an approach—linking data science to human Obligations, Acknowledgments of biases, Responses to obligations, and Safeguards relative to potential biases (OARS)—can open, pluralistic societies thrive in the era of growing data abundance.

Here’s to working to help make a more positive, inclusive future for all.
