Un-FAIR practices: different attitudes to data sharing

Photo: Maximilien Brice/CERN

By Laia Pujol and Jonathan Wareham

The sharing of scientific data has gained prominence in the mainstream media in recent years and is seen as increasingly important, with government funding organisations placing open data at the heart of scientific policy.

Yet, despite this positive promotion of scientific and economic data, the practices of data sharing and how it should be managed remain unclear and inconsistent, with many researchers keen to receive, but not give, data.

Laia Pujol, a researcher from Esade Business School, and Jonathan Wareham, Professor of Information Systems at Esade, have examined the issues of where and how data are being shared among scientists and discovered that the discipline of the research influences the attitudes towards data sharing and the practices that govern it.

"Researchers are generating unprecedented volumes of data," Pujol says. "Although some disciplines have a long tradition of working with big data and sharing it, particularly the big science research infrastructures for physics and astronomy, other disciplines have only recently begun to adopt the practice."

FAIR play

To be shared effectively, research data need to be Findable, Accessible, Interoperable and Reusable (FAIR). And ease of use and good practice are not the only considerations when making data accessible; the cost of failing to share research data is estimated to be up to €16bn a year to the European economy.

"The urgency of sharing FAIR data is not only grounded in the ability to reproduce it," explain the authors. "It's a recognition of the novel technological and scientific innovations which result from data sharing."

To share or not to share?

The benefits of sharing scientific data are many: increased transparency that enables peer review and verification of findings, faster scientific progress, improved research quality and efficiency, and fraud prevention all lead to gains in innovation across the board.

But inconsistent data citation and the absence of clear rewards for sharing contribute to researchers' complaints of a lack of credit for their work, of misuse and misinterpretation leading to liability concerns, and of an overall lack of incentives for sharing data.

"Evaluation of policy surrounding data sharing has been accompanied by requirements from funding agencies that scientific data be made publicly available, and investments in data repository infrastructures have followed," explains Pujol. "For example, in 2016, the European Commission launched the Open Science Cloud initiative, a federated data infrastructure with cloud-based services to offer the scientific community an open environment for storing, sharing, and re-using scientific data."

In high-energy physics and astronomy, the absolute volumes of data collected and analysed are extremely large compared with other disciplines (Photo: Y. Beletsky/ESO)

"But considering policy and funding agencies' efforts to foster data sharing and the apparent barriers to its wide adoption, we lack a representative overview of what data are being shared by scientists, and insights into how researchers share their data."

The researchers examined data from two large-scale global online surveys collected in 2016 and 2018, in collaboration with a major academic publisher and the European Open Science Monitor. The study suggests that despite widespread support from policymakers and pressure from funding agencies, the number of academic researchers making their data available remains stable, with no growth shown over the past two years. And although researchers acknowledge the benefits of data sharing, their practices are still limited; one third of researchers say they have never shared data at all. While they acknowledge the benefits of sharing unpublished research data, few are willing to share their own.

Sector inconsistencies

The researchers also discovered that the sharing practices vary widely depending on the discipline of the researcher. Researchers in maths, computer science, physics, astronomy and life sciences all recognised the benefits of having access to others' data. However, while the majority of researchers within the physics and maths disciplines were willing to allow others access to their research, many of those within life sciences were keen to access, but not share, data.

Access was also selective when it came to who was able to view data. Most data sharing takes place between collaborators on the same project, or with selected partners on a case-by-case basis.

And while the majority of researchers take steps to manage their research data for potential future reuse, the application is not carried out consistently. The resulting discrepancies lead to a general perception of a lack of training among fellow researchers.

Understanding differences in data sharing

"To understand the differences in data sharing attitudes and practices across scientific disciplines we can speculate across two themes," say the authors. The first is epistemic culture: "The nature of scientific work is likely to be highly determinative. For example, in high-energy physics and astronomy, the absolute volumes of data collected and analysed are extremely large compared with other disciplines."

Unsurprisingly, data sharing is common in these fields, where large teams of scientists work collectively on large, capital-intensive research infrastructures with mature data repositories and tested curation policies. The second theme is how researchers' incentives are protected by the data infrastructures used by scientists and their institutions.

Finally, "despite efforts to improve metadata and curation practices, the results suggest that we will continue to see informal, ad hoc or incomplete processes of communicating about data."

All written content is licensed under a Creative Commons Attribution 4.0 International license.