For the common good: The political economics of data in the life sciences
A six-year longitudinal study illustrates how public-private partnerships shape and share their data, balancing the commercial and scientific interests of their partners.
The collection and analysis of huge swathes of data have traditionally been seen as a competitive asset to be closely guarded and commoditized. But the datafication of every area of life has prompted public and private organizations to acknowledge that collaboration can create value that goes beyond profit and serves a greater good.
Often, the partners in the cloud-based communities that collect and govern mutually beneficial data — known as data commons — have competing objectives across multiple research and development domains. So how can the data and evidence they manage remain neutral and applicable to all?
Research from Esade’s Assistant Professor Laia Pujol Priego — who sadly passed away in 2023 — and Professor Jonathan Wareham, published in MIS Quarterly, addresses this issue. After being granted a rare level of insight into Open Targets (OT), a world-leading public-private partnership, the researchers were able to examine how data commons are generated and transformed by the differing motives of scientific and commercial entities.
Shared interests, opposing outcomes
Data commons are composed of diverse scientific and commercial organizations that generate, pool and link data across sectors and topics. But despite the shared interests in the data itself, the desired outcomes sought by community members can vary widely.
Scientific data are neither neutral nor agnostic
This takes on particular significance when viewed in the context of data resources that represent both private and common property. Known as ‘semicommons’, these data are collectively generated and shared with the objective of social or public good, while also maintaining a need for the private partners to commercialize the data.
To highlight the political nature of the journeys these data follow, Pujol and Wareham conducted a six-year longitudinal study of OT, which uses human genetics and genomics data for systematic drug target identification and prioritization.
OT consists of founding partner the European Bioinformatics Institute (EMBL-EBI), the nonprofit research foundation Wellcome Sanger Institute, and the private life science companies GSK, Biogen, Takeda, Celgene (later acquired by Bristol Myers Squibb), Sanofi, Pfizer and Genentech.
The researchers’ detailed and comprehensive study of the journey through which data are constructed and accorded evidential value offers a vivid illustration of the differing mandates of each partner in this data commons.
The political path of data
To more closely analyze the data journey, Wareham and Pujol directly mapped a specific OT experiment, to which they gave the pseudonym BI01.
BI01 tracked the interaction of gene function in triple-negative breast cancer (TNBC) and KRAS-mutant colon cancer (CAC). The researchers immersed themselves in the ways in which scientists from OT and all associated laboratories and communities worked, discussed and debated during the generation of the data, and analyzed the actions through which the data were engineered, aggregated, disaggregated and reconfigured.
Understanding the evolution and political dynamics of the data commoning will help to maximize the public value of these collaborations
As themes began to emerge, Wareham and Pujol noted how patterns of private data were layered or juxtaposed with commons data, and how continuous negotiations regarding configuration took place.
As the data journey progressed, Pujol and Wareham began to formally code their observations across the stages affecting data structure, specification and computational techniques. This detailed analysis revealed the political nature of BI01 data, charting its strategies, controversies and negotiations.
From origin to dissemination
After agreeing to focus on TNBC and KRAS-mutant CAC, OT pharma and scientific participants sought a consensus on the anchors (datasets embodying established and empirically validated genetic sequences) and data libraries (datasets encapsulating potential genetic associations). This origin process and the discussions around data management, participant negotiation and scientific understanding allowed participants to overcome the challenges of creating foundational data without revealing confidential research.
An experimental phase followed, in which BI01 was classified, grouped and restructured to prioritize specific interests. Each participant was given the opportunity to align data structure with their own therapeutic pipelines: pharmaceutical companies, with their significant economic resources, argued in favor of their own commercial interests, while the unique expertise of the scientific community provided essential balance in the decision-making process.
An algorithm was used to score gene combinations on a scale of 0-100, according to pre-agreed categories and criteria, with the highest-scoring gene combinations prioritized for progression. Finally, a dissemination process allowed participants to share the resulting filtered data (cleaned, categorized and mature enough to hold evidential value) with other OT partners. After a minimum period of 18 months, the data were then released into the public domain.
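The scoring-and-prioritization step described above can be sketched as a simple weighted aggregation. The criteria names, weights, gene pairs and scores below are illustrative assumptions for the sake of the sketch, not OT's actual categories or algorithm:

```python
# Hypothetical sketch of scoring gene combinations on a 0-100 scale
# against pre-agreed, weighted criteria, then ranking the results.
# All category names, weights and scores are invented for illustration.
CRITERIA_WEIGHTS = {
    "genetic_association": 0.40,
    "tractability": 0.35,
    "novelty": 0.25,
}

def score_combination(scores: dict) -> float:
    """Combine per-criterion scores (each 0-100) into one 0-100 score."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

def prioritize(candidates: dict, top_n: int = 2) -> list:
    """Rank gene combinations by overall score, highest first."""
    ranked = sorted(candidates,
                    key=lambda g: score_combination(candidates[g]),
                    reverse=True)
    return ranked[:top_n]

candidates = {
    "GENE_A+GENE_B": {"genetic_association": 90, "tractability": 60, "novelty": 70},
    "GENE_C+GENE_D": {"genetic_association": 50, "tractability": 80, "novelty": 40},
    "GENE_E+GENE_F": {"genetic_association": 70, "tractability": 75, "novelty": 85},
}
print(prioritize(candidates))  # → ['GENE_E+GENE_F', 'GENE_A+GENE_B']
```

In a real semicommons setting, the negotiation described in the article would play out in exactly these parameters: which criteria exist, and how heavily each is weighted.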
A dynamic journey
Within this journey, Wareham and Pujol identified three interwoven and complementary dynamics that were critical to determining what data held evidential value: patching, deconstructing and scaffolding.
Patching involved importing additional data sources to BI01 and aligning and aggregating the datasets. This allowed OT participants to supplement experimental data with private data sources to achieve their own objectives while retaining confidentiality. Subsequently, deconstructing the data on the BI01 journey required a detailed process of decomposing the resulting data to establish its meaning, superimposing new categories based on the individual criteria that had been established, then rebuilding it over time to generate meaning for prioritization.
Finally, scaffolding the data ensured a formal consistency that would facilitate its progression throughout the journey and, eventually, into the public domain. This phase affected the ways in which data could be analyzed and helped to maintain its integrity, particularly when shared with unknown audiences.
As data commoning and semicommons become more prevalent across industries, understanding the evolution and political dynamics of the process will help to maximize the public value of these collaborations.
The level of insight Wareham and Pujol gained through their detailed analysis confirms that scientific data are neither neutral nor agnostic. The data are shaped by their own journey and by the participants who contribute to the cycle, each of whom has their own agenda.
Professor, Department of Operations, Innovation and Data Sciences at Esade