Data Discipline: How Hackers Took on Science

(Book manuscript in progress.)

“Data scientists” have appeared in industry, government, and academia in vast numbers over the past decade. They were noted for using statistics and computer programming to make discoveries with large-scale datasets, which spurred new applications, ranging from cardiovascular or traffic monitoring to ad placements and election meddling. But data science tasks, skills, and roles remained obscure and subject to uneasy internal debates, watched by established scientists, engineers, and analysts with some skepticism. Confusion and contestation around the workings of data science have raised questions about its rapid rise to public salience. How did a loose band of nerds and hackers present mostly old and partly questionable ideas as a new science? Science or not, how could this technical work gain broad recognition?

Data science is complex, which makes its underpinnings hard to untangle. This book presents a comprehensive approach that explains data science’s emergence through historical, quantitative, qualitative, and reflexive perspectives. The historical perspective draws on a large body of writing about the development of quantitative thinking. Data science’s magnitude, technical complexity, and emergent status pose thornier challenges. The book uses a large-scale dataset to capture data science skills across industries, and it implements and critically unpacks a data science analysis to illustrate data science work. To capture its emergent dynamics, the book reports on observations of early data science events. These events put speakers in front of audiences before the speakers understood this new role themselves and without the opportunity to hide their uncertainty, a situation that offers deep insights into data science’s formation.

The book’s first part locates data science in different social contexts. Data science centers on the application of quantitative expertise to a wide range of problems, including in commerce, management, medicine, and public services. This focus leads to practical concerns that have come up in academic discussions for the past two centuries, only to be declared as lacking scientific relevance and thus set aside. New technologies of the Digital Transformation have introduced similar problems into firms and other applied settings. But neither academics, technologies, nor businesses created a new science; data science emerged among nerds, hackers, coders, and scholars who found their practical data work worthy of discussion.

The book’s second part follows those early discussions in New York City, where tech received new attention in the hopes of moving past the 2008 financial crisis. The analysis turns to the micro level to capture data science’s inception in this mix of enthusiasm and uncertainty. It considers public gatherings at which data nerds presented new projects and ideas that seemed familiar until speakers focused on obscure data-analytic details. Then they slipped in stories of their broader experiences, struggles with clients, colleagues, scientists, and their own anxieties, without any obvious link to technical work. Counterintuitively, these qualitative observations about their otherwise formal work and its surroundings were crucial for defining a professional role and identity. The book shows how the interplay of personal reflection, technical rigor, and public scrutiny gave the digital era, for better or for worse, a human face.