I like data. I tend to put a lot of stock in it. For scientists, data* is a key part of our lives and work. Data is how we learn, which enables us to progress.
But we have to acknowledge something very important about data. Data does not materialize out of a vacuum. It doesn’t just appear on a computer drive somewhere. It’s gathered, processed, and analyzed.
Even when elements are automated, along the way humans designed the automation. They made decisions about which things to measure, how and when and where to measure them, which data points and metadata to record or retain, how to categorize or cluster data or deal with outliers…
These are human activities, meaning there are opportunities for our biases and agendas to influence the processes. Many strive to minimize the effects of bias, applying methods and strategies to design the questions, data collection processes, and analyses to get to an accurate answer in an appropriate and ethical way. They acknowledge the limitations of their approaches and data too. It can take great awareness and care to avoid biasing data.
But decisions about data collection and analysis can be driven by partisanship and chauvinism**. You don’t have to change data to manipulate it. You can shape it through the way you collect and analyze it.
The easiest way is to just stop collecting the data, or never collect it in the first place. You can’t report what you don’t measure. It can make for an incredibly useful stall tactic. We need data to know if an issue is persistent and widespread, “worthy” of investing precious time and dollars. Of course, getting that data will take time and dollars.
A perhaps more sly way is to have a system in place but make the data difficult to report. You can point to “data” and proclaim, See, no problem here! But the absence of reporting, especially for large organizations, doesn’t translate to the absence of incident.
Another approach is to change how you collect, categorize, or analyze the data. You influence who’s represented and in what way. You shift a subset of data from one grouping to another. You modify the algorithm.
Without calling attention, you make changes (relative to how predecessors did it) to “close” the pesky gaps that groups have rallied around and worked to change. Look at the progress! Nothing more to worry about. Those groups still saying it’s an issue—well, they’re obsessed, fanatics.
I’ve been reading Susan Faludi’s Backlash: The Undeclared War Against American Women, which looks at the backlash against women’s gains and feminism through the 1980s. I’ve been struck in the latter half of the book (which focuses on backlash as seen in politics, popular psychology, women at work, and reproductive rights) by how data was used to push back against (legitimate) claims of persisting disparities. Much of this was driven by federal offices under the control of Reagan appointees, which, combined with budget cuts, reduced the data collected, failed to process cases/claims (which can become points of data), or changed the way data was categorized or analyzed. I can’t help but wonder how much of this is happening now.
Today, as a society, we’ve become more engrossed in data. We collect more data perhaps than ever before. Too often I see the sentiment that data is somehow “pure”, untouched by human dispositions. Critical information and trends can be lost in the data, and we may be none the wiser if we don’t engage critically with the data, processes, and agendas that generated it.
* Yes, technically data is plural (unless talking about Data, the android). I’m intentionally using singular verbs as I’m referring to data as a single concept, not a collection of data points here.
** Chauvinism used here in its broader meaning of “undue partiality or attachment to a group or place to which one belongs or has belonged.”