‘Cyborg’, by Lynn Randolph (1989) http://www.lynnrandolph.com/

Can data ever know who we really are?

Zara Rahman
Deep Dives

--

Introduce yourself: your name, where you’re from, and three words to describe yourself.

I watch as others standing around me in the circle answer these three prompts with an ease that I’m almost envious of. I wonder: How do they prepare for this? Do they have a standard ‘three word description’ to call upon in such situations?

No, that seems unlikely. My answer, like theirs, depends on what I want to project to the group — what side of me I want them to know, how I want them to think of me, what impression I want to make.

Reducing our whole selves to a few data points is an impossible task, albeit a helpful one in this situation: we’re using this exercise as an icebreaker, a way of quickly getting to know others in the room. And just as much as the words themselves, what a person chooses to prioritise, how they identify, and how they make those decisions on the spot reveal a lot about them.

I am, I was, I will be. I was a violinist. I lived in the UK. I am a child of immigrants. I am an immigrant. I live in Germany. I am a German speaker. I am a writer. I live in Berlin. I will be — who knows.

Identities are fluid. Who I am at any given time depends on a confluence of factors, the context of the situation I’m in. Sometimes I might want to perform one element of my identity more than another. And that choice is, or should be, mine to decide — it changes based on who I’m with, what I want to project, what’s important to me at that time.

As I think about my last year, or my last decade, I know that both the adjectives and the nouns of my identities have changed drastically. Being able to name those identities gives me the freedom to claim who I am now, and to see clearly how those words and sentences have changed over time. And changing those answers is good, and real, and more accurate than any kind of static description ever will be.

From my diary from when I was 7: ‘Today I put my earrings in by myself, I’m a grown-up.’

***

The only lasting truth
is Change.
– Octavia Butler

For states and institutions wanting to understand their populations, almost the exact opposite of Butler’s quote holds true. According to digital data — collected once and entered into a system — who we are is static: a series of unchanging facts. We are categorised by where we live, our gender identity, our year of birth, our ethnicity or race. Adjusting those records is difficult and often incompatible with rigid systems. Essentially, being our true and fluid selves becomes impossible in the eyes of the state.

Ghanaian-American philosopher and academic Kwame Anthony Appiah calls this the ‘Medusa Syndrome’, writing that ‘what the state gazes upon, it tends to turn to stone.’ He describes this as an inadequate but somewhat inevitable strategy that the nation-state adopts; the only way a state has of making its people legible or, in other words, of ‘watching’ its population.

But watching isn’t the same as seeing.

Untitled, by Mark Pernice (2017), Image courtesy https://theintercept.com/2017/07/08/border-sheriffs-iris-surveillance-biometrics/

In the 2018 report ‘Reclaiming Our Data’, the research collective Our Data Bodies describes the desire of marginalised people in America to be ‘seen and heard, not watched’. The communities that participated in the study — including low-income residents of public housing and people seeking political asylum — were from Charlotte, Detroit and Los Angeles. They all describe how being perpetually watched and counted led to the erosion of their human dignity. From community members being forced to interact with ‘intrusive and insecure’ data-driven systems to groups being frustrated at the ‘unrepresentative nature of data’ and how it is used to categorise them, people felt dehumanised by the state’s processes of data collection.

Systems like credit scoring or criminal records create profiles that stay with individuals, regardless of whether they’ve changed as people. These systems restrict people’s ability to demonstrate how they’ve changed, and to move past their earlier selves. Once categorised with a certain label, that label sticks, no matter how much a person’s identity or behaviour changes.

Worst of all, participation in these systems is not optional, yet they purport to make life easier. But easier for whom? Researchers from the study conclude that the systems they looked into were ‘for the sole benefit of the data collector,’ rather than for the community members themselves.

For groups whose identities have traditionally been ignored, this kind of data collection further solidifies their marginalisation, forcing them into false binaries or categories that don’t reflect their changing realities.

Untitled, by Kevin Hong (2017), Image courtesy https://www.wired.co.uk/article/chinese-government-social-credit-score-privacy-invasion

At the time of writing, the Transgender Persons (Protection of Rights) Bill is making its way through the Indian parliament, offering another example of an unnuanced understanding of identity. Casting gender into stone, to use Appiah’s words, the bill will permit only those who have undergone gender-reassignment surgery to be counted as trans, for the purposes of everything from ID cards to healthcare access. In order for this ‘counting’ to take place, the bill requires all trans people to appear before a government-run ‘screening committee’ to have their gender identity affirmed. These identities will then be reflected in wider systems of data collection, like the Aadhaar biometric project.

Instead of respecting people’s rights to self-identification and acknowledging the fluid nature of gender and the many ways it can be expressed, the Indian state is seeking data in specific categories, irrespective of lived realities. If data on individuals is gathered in this way, it will paint a picture far removed from their realities, discriminating especially against trans people who may neither want nor be able to afford surgeries, and invisibilising intersex people by labelling them as trans.

In this way, the transgender and intersex communities are watched, but their needs are not seen.

***

‘I slept badly,’ my mother says. ‘My Fitbit says so.’

I’m learning that arguing with data is harder than arguing with people. Machines don’t lie, because, well, why would they? They’re just machines, programmed to do one thing — and to do it well.

I know this phenomenon well. As a society, we believe in precise numbers more than we do in unquantifiable things like feelings. This is known as ‘precision bias’: our brains confuse precision with accuracy. I saw it play out across the United Kingdom in 2016, when the inaccurate (but precise) claim that the UK sent the European Union £350 million per week played an outsized role in convincing people to vote to leave the EU.

This number, the catchy £350 million per week, was plastered on buses driven up and down the country, in order to convince voters that leaving the EU meant saving money; money that could be better spent on the country’s own health services. What they didn’t say until the votes came in, however, was that the number was false, and that leaving the EU didn’t mean any extra support for flailing health services.

The day of that fateful referendum, I stood on the streets of Manchester and tried to convince undecided voters that we were stronger together, not apart. Multiple people told me that they wanted to give that £350 million to the country’s National Health Service, instead of to the European Union. And despite my best efforts at telling them that those numbers had been made up, their minds were already made up.

Those numbers — or rather, that one data point of £350 million — stuck in the minds of so many people. It was more effective than numerous nuanced arguments about the benefits of the EU, and led voters to make a decision that millions regret.

What has led us to this moment in history, where we value data over people? The digitisation of our society, of our systems, of our lifestyles, has led us down a path of putting more faith in our technology than we do in those around us, even when those figures tell us something we don’t quite believe.

In my mother’s case, the consequences can only be seen on a micro level. She feels like she didn’t sleep well, because she sees on her Fitbit app that she was only in ‘deep’ sleep for one hour. It doesn’t matter how much she ‘should’ be in deep sleep per night — that number seems low, and she believes it should be higher. Whether she feels rested or not is secondary, and is influenced by her having seen that data. She trusts the machine on her wrist more than she trusts herself.

#modernmedusa (2018), Image courtesy https://www.instagram.com/p/Bp-vR-SHIOE/?utm_source=ig_share_sheet&igshid=13st930my4386

This becomes even more complex when we take into account the decisions that have been made around how Fitbit data is collected, and who those decisions exclude.

The first step in setting up a Fitbit is making your profile, where the options for your gender are binary: ‘male’ or ‘female’. But this is a problem for many people. Discussions on Fitbit public forums include multiple feature requests for gender to be defined more accurately, with one user even putting in the labour of debunking, in detail, misconceptions about the relationship between chromosomes and hormones, and explaining the social fluidity of gender.

Further discussions outline how unhelpful the binary ‘male/female’ distinction is: it leads to inaccurate data for people across the gender spectrum by making false assumptions about their bodies, then extrapolating those assumptions into estimates like calorie burn. Beyond the immediate implications for trans or intersex individuals, who will receive inaccurate data about their health or wellbeing, these inaccuracies also feed into any aggregate analysis, skewing the findings of what the company calls ‘Fitscience’.

Without more realistic data collection options, the potential value of spotting useful patterns is drastically reduced. Everyone’s data is negatively affected — most of all, the transgender and intersex people who need to have the option to accurately describe their bodies in a way that helps, not hinders, them.
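To make that concrete, here is a minimal, hypothetical sketch (in Python, and emphatically not Fitbit’s own code, whose models are proprietary) of how a binary drop-down hardens into the numbers a device reports. The Profile class is invented for illustration, and the calorie estimate uses the published Mifflin-St Jeor equation as a stand-in; the point is only that the branch on ‘male’/‘female’ is a human categorisation choice dressed up as a measurement.

    # A hypothetical profile schema: the drop-down made solid. Any identity
    # outside the binary is simply unrepresentable here.
    from dataclasses import dataclass
    from typing import Literal

    @dataclass
    class Profile:
        age_years: int
        height_cm: float
        weight_kg: float
        gender: Literal["male", "female"]  # the only two options on offer

    def resting_calories_per_day(p: Profile) -> float:
        """Estimate basal metabolic rate with the Mifflin-St Jeor equation.

        The equation was derived from binary-sexed study populations, so the
        branch below is where a human categorisation choice becomes a number
        that looks objective.
        """
        base = 10 * p.weight_kg + 6.25 * p.height_cm - 5 * p.age_years
        return base + 5 if p.gender == "male" else base - 161

    # One precise-looking number, built on assumptions the wearer never chose.
    me = Profile(age_years=30, height_cm=170, weight_kg=65, gender="female")
    print(round(resting_calories_per_day(me)))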

***

But digitisation, and more broadly, the spread of digital technologies, can’t be all bad. In fact, surely access to digital technologies is good, since it allows people to get access to information and services, and to do so much more, much quicker, and at a much larger scale than we ever could have imagined before. Connecting people is good, too (or so Facebook would like us to think). Making it quick and easy for people to communicate, for people to write to each other, share photos and thoughts online — that’s all progress in the ‘Fourth Industrial Revolution’.

But as we build our dependence upon easy and convenient digital technologies, I wonder: Can there be such a thing as ‘too easy’ technology? Too little complexity, and too much simplification? Is it too easy for my mother to see her heart rate, to be told how ‘well’ she slept? Faster is better, faster is also different.

But getting what we want as quickly as possible — that’s progress. Isn’t it?

There’s a part of me that longs nostalgically for the days when my de-facto way of answering a question was to go to the 12-volume Encyclopaedia Britannica lined up on our family’s bookshelf, pull down the volume I guessed would hold the answer, and flick through the pages to slowly, gradually, find it. In this way, I came across all sorts of answers to questions I never knew I had. Searching for the difference between a forest and a wood led me to read about the field of forensics; searching for information about my family’s home in Bangladesh led me to the same page as the entry on Dakar, the capital of Senegal (and taught me, of course, that Dhaka, or ‘Dacca’ as it was spelt in the Encyclopaedia, is a distinct and different city).

‘Max in the Stacks’, by Charles Wysocki, Image courtesy https://www.art.com/products/p12176799-sa-i1548965/charles-wysocki-max-in-the-stacks.htm

That information was stuck in a particular moment of time, and in a particular context. When I learned that Bonn was the capital of West Germany and Berlin of East Germany, the Berlin Wall had already fallen. Thinking back now, I realise that the history of India I learned was clearly written by the colonisers, not the colonised.

Of course, I was privileged to have such easy access to books full of knowledge and facts waiting to be discovered, even with those information biases. It’s undeniable that the spread of digital technologies has had a positive effect on access to information, even if the joy of discovering that information has somewhat disappeared for me. Now, I don’t experience anything close to that feeling of joy over an accurately delivered search engine response, because I’m so accustomed to it; it happens too easily. Uncovering something that is harder to find, that isn’t served up on the first page of a search engine’s results — that’s still exciting.

But the bias of what information is most heard — the bias that led me to read more about Elizabeth I than about the destructive nature of British colonialism — remains online too (despite the early neoliberal hopes of the web being a place ‘where anyone, anywhere may express his or her beliefs’). As Astra Taylor writes in her book The People’s Platform, ‘While it’s true that anyone with an Internet connection can speak online, that doesn’t mean our megaphones blast our messages at the same volume. Online, some speak louder than others.’

Hopes of the internet being a place where all voices could be heard equally have already been quelled, and even the internet’s de-facto encyclopedia, Wikipedia, privileges certain already-dominant perspectives over others. As Wikimedia Foundation Executive Director Katherine Maher writes, ‘We’ve got comprehensive coverage on [US] college football but significantly less on African marathoners.’ She mentions pages that were classified by Wikipedians as ‘low importance’, like the page on breastfeeding, as an example of the bias that appears when the vast majority of page creators come from one specific identity location — that is, cisgendered men. Not to mention the biases that arise from differing interpretations of the ‘notability’ one needs in order to get a page: Nobel Prize winner Donna Strickland didn’t have a Wikipedia page until the day of her award, not for want of people trying, but because an editor declined the page on the grounds that she hadn’t received ‘significant coverage’.

But there’s a difference in my expectation: when I was a child, I knew that the pages of my encyclopaedia were static, and perhaps out of date. When I search online, my expectation is that the information I am shown will be the latest and most up-to-date information out there, served up in the time it takes to blink an eye.

The speed I’ve come to expect from the digital systems around me has multiple destructive consequences. Systems which optimise for speed do so at a cost. They sometimes reduce focus on security, making it easier for malicious actors to progress through a system. And they often come at a huge cost to the environment, by treating our energy resources as if they were infinite. What this speed means for our multifaceted histories is that only certain dominant perspectives are heard. Our histories, like our own narratives, are complex, and it’s hard to see how our current fast and frictionless approach could ever represent that complexity.

‘White Wave’, by Caroline Vis, Image courtesy https://www.saatchiart.com/art/Painting-White-Wave-Oil-Painting-dripping-pouring-jackson-pollock-style/855129/2891269/view

That priority of speed and quantification is also reflected in what data we collect — big data, instead of what ethnographer Tricia Wang calls ‘thick data’. Wang defines thick data as ‘data brought to light using qualitative, ethnographic research methods that uncover people’s emotions, stories, and models of their world.’

Gathering thick data takes far longer; it means gaining a deep understanding of a smaller group of the population, instead of being able to quantify a large group of people. If we used thick data in addition to big data, we would be able to combine context and numbers in order to understand the world better.

In this alternative approach that values qualitative as well as quantitative data, the idea of ‘significant media coverage’ would matter less in assessing someone’s notability than understanding their position in their field: whether they’re perceived as a significant person and whether they’ve made significant contributions — regardless of whether those contributions have been covered by the media.

***

If I didn’t define myself for myself, I would be crunched into other people’s fantasies for me and eaten alive.
– Audre Lorde

The proliferation of digital data and the technologies that allow us to gather that data can be used in another way too — to allow us to define for ourselves who we are, and what we are.

Amidst a growing political climate of fear, mistrust and competition for resources, activists and advocates working in areas that are stigmatised within their societies often need data to ‘prove’ that what they are working on matters. One way of doing this is by gathering data through crowdsourcing. Crowdsourced data isn’t ‘representative’ in the statistician’s sense, but gathering data through unofficial means can be a valuable asset for advocates. For example, data collating the experiences of women who have reported incidents of sexual violence to the police in India can then be used to advocate for better police responses, and to inform women of their rights. Deservedly or not, quantifiable data takes precedence over personal histories and lived experience in getting the much-desired currency of attention.

And used right, quantifiable data — whether it’s crowdsourced or not — can also be a powerful tool for advocates. Now, we can use quantifiable data to prove beyond question that disabled people, queer people, and people from lower castes face intersecting discrimination, prejudice, and systemic injustices in their lives. It’s an unnecessary repetition in a way, because anybody from those communities could have told reams upon reams of stories about discrimination — all without any need for counting.

Regardless, to play within this increasingly digitised system, we need to repeat what we’ve been saying in a new, digitally-legible way. And to do that, we need to collect data from people who have often only ever been dehumanised as data subjects.

Untitled, by Kevin Hong (2017), Image courtesy https://www.wired.co.uk/article/chinese-government-social-credit-score-privacy-invasion

Artist and educator Mimi Onuoha writes about the challenges that arise while collecting such data, from acknowledging the humans behind that collection to understanding that missing data points might tell just as much of a story as the data that has been collected. She outlines how working with digital data means making certain choices (intentionally or not) about what we value. And collecting that data makes those human choices solid, and often (though not always) makes them illegible to others.

We speak of black boxes when it comes to the opaque choices that algorithms make, but the same could be said of the many human decisions involved in categorising data, whether that’s choosing to limit the gender drop-down field to just ‘male/female’, as with Fitbits, or a variety of apps incorrectly assuming that all people who menstruate also want to know about their ‘fertile window’. In large systems with many humans and machines at work, we have no way of interrogating why a category was merged or not, of understanding why certain anomalies were ignored rather than incorporated, or of questioning why certain assumptions were made.

The only thing we can do is to acknowledge these limitations, and try to use those very systems to our advantage, building our own alternatives or workarounds, collecting our own data, and using the data that is out there to tell the stories that matter to us.

***

In many ways, digital data is a simplification of reality, a ‘stone representation’ of a complex life. Taking this one step further: perhaps digitisation, or digital data, isn’t always the answer. Narrative histories tell us far more than a digitised family tree ever could. The feelings that are communicated during a great oral story can never be reduced to machine-readable data. The results of a heritage DNA test cannot reflect the life experience and history of a person — and even those results are the consequence of scientists’ preconceptions about gender and race, combined with the data they had available to learn from, codified into a digital system.

There are limits to categorisation and to digitisation, and some of those limits should act as hard stop signs for us. Digitisation is not always progress — sometimes, it’s a veneer for political systems wanting to categorise us for easier surveillance. Or an excuse that permits us to over-simplify or ignore the complexities and nuances in our lives and in our understanding of others. Reducing ourselves to binary identities, to pre-written answers in drop-down menus, is helpful for those in power wanting to understand how to control populations, but not for those whose identities have always been at the margins.

Untitled, by LA Johnson (2014), Image courtesy https://www.npr.org/2014/09/30/352661280/marriage-pattern-shifts-seen-by-some-as-destabilizing-society

Progress, in the grandest sense, doesn’t have to look like this. The development of humanity doesn’t have to mean outsourcing decisions to machines, or building systems that categorise us without truly understanding who we are.

Real progress should thoughtfully move us towards a more equitable and just society, not speed towards less friction while further entrenching the inequalities of the past. Our collective reluctance to put in the work necessary to acknowledge those inequities, focusing instead on easier-to-fix criteria like speed and categorisation, is simply lazy.

In the systems of the future, we need more than just fast and frictionless — we need nuance, acknowledgement of what came before, and an aspiration to make changes that improve lives for everyone, not just the most visible few.

Sometimes, complexity is a feature, not a bug.

This work was carried out as part of the Big Data for Development (BD4D) network supported by the International Development Research Centre, Ottawa, Canada.

--


Writer, bookworm, data nerd | tech, social justice, power | team @engnroom, ‘16/’17 fellow @datasociety, author @globalvoices, visiting fellow @hks_digital