Sascha Friesike*1,2, Benedikt Fecher1,3, Marcel Hebing3, Stephanie Linek4
1Alexander von Humboldt Institute for Internet and Society (HIIG), Berlin, Germany
2University of Wuerzburg, Germany
3German Institute for Economic Research (DIW Berlin), Berlin, Germany
4German National Library of Economics (ZBW), Kiel, Germany
*Correspondence to: sascha.friesike|a|hiig.de
Despite strong support from funding agencies and policy makers, academic data sharing sees hardly any adoption among researchers. We argue that academia is a reputation economy in which researchers are motivated by reputational gains. Current policies that try to foster academic data sharing fail because they either try to motivate researchers to share for the common good or force researchers to publish their data. Instead, we argue, data sharing needs to pay in the form of reputation. Hence, to tap into the vast potential attributed to academic data sharing, we need to forge new policies that follow the guiding principle of reputation instead of obligation.
In 1996, leaders of the scientific community met in Bermuda and agreed on a set of rules and standards for the publication of human genome data. What became known as the Bermuda Principles can be considered a milestone for the decoding of our DNA. These principles have been widely acknowledged for their contribution towards an understanding of disease causation and the interplay between environmental factors and genetic predisposition1. The principles shaped the practice of an entire research field as they established a culture of data sharing. Ever since, the Bermuda Principles have been used to showcase how the publication of data can enable scientific progress. Considering this vast potential, it comes as no surprise that open research data finds prominent support from policy makers, funding agencies, and researchers themselves2,3. However, recent studies show that data sharing is hardly ever practised4,5. We argue that the academic system is a reputation economy in which researchers are best motivated to perform activities if those pay in the form of reputation. Therefore, the hesitant adoption of data sharing practices can mainly be explained by the absence of formal recognition. We should change this.
Useful but Hardly Practiced
The research landscape today is characterized by a collaboration imperative6. Research questions are getting increasingly complex, and a number of specialists need to be brought together to perform a noteworthy investigation. Only a few fields remain that still allow lone investigators to develop meaningful insights7. The most prominent form of collaboration is the co-authored publication. However, there is further potential for scientific collaboration in a more modular practice: academic data sharing. Here, researchers make their primary datasets available to others. This has three major benefits: first, it allows new research questions to be asked of existing datasets; second, it facilitates the replicability of research results; and third, it enables new research practices such as large-scale meta-analyses. Combined, these benefits mean that open data in research contributes to the quantity, quality, and pace of scientific progress. Neelie Kroes, the European Commissioner for the Digital Agenda, even went so far as to say that open access to research data “will boost Europe’s innovation capacity and give citizens quicker access to the benefits of scientific discoveries”8.
Despite its advantages and prominent support, data sharing sees only hesitant adoption among research professionals. In fall 2014, we conducted a survey of 1564 academic researchers. Of these, 83% agreed that making primary data available greatly benefits scientific progress. Yet only 13% stated that they had published their own data in the past.
In a similar way, most journals disregard the vast potential of published data. In an analysis of 141 journals from economics, Vlaeminck9 found that only 29 (20%) had a mandatory data sharing policy. Alsheikh-Ali et al.10, in an analysis of 500 research articles from the 50 journals with the highest impact factor, found that the underlying data was available in only 47 (9%) cases. In most journals, publishing data is neither expected nor enforced as a condition of publication. This is particularly troublesome when inaccurate or wrong scientific findings are used to make political decisions, as happened in the Reinhart and Rogoff case, where false statistics justified the introduction of austerity policies11. In this regard, open access to research data is not only a driver for scientific progress but also crucial for reproducibility and therefore trust in scientific results. Its meagre adoption among research professionals points to the need for new policies to motivate more academic data sharing.
Academia Is a Reputation Economy
Making data available to others is of little benefit for a researcher. Academia can be described as a reputation economy in which the individual researcher’s career depends on recognition among his or her peers. The commonly accepted metrics for academic performance (the journal citation index, the Hirsch index, and even altmetrics) are all based on research article publications. Data sharing, by contrast, receives almost no recognition. As a result, researchers are geared solely towards article publications as they invest their time and resources into activities that can increase their reputation.
80% of the respondents in our survey state that the main barrier to making data available is the concern that other researchers could publish with it. At the same time, 76% agree that researchers should generally share their data publicly. Few researchers (12%) are concerned about being criticized or falsified. These numbers show that researchers neither have negative attitudes towards making data available nor are afraid of being proven wrong. They largely recognize the potential of open access to research data. However, that does not motivate them enough to invest their time and resources into sharing their own data. This and the lack of journals that foster data sharing have led to a culture in which only a minority group, consisting of Open Access enthusiasts, publishes primary data5. Today’s low sharing culture reflects our academic reputation economy, in which most of one’s community standing comes from article publications. We therefore believe that data sharing and reuse will only become standard practice if they pay in the form of recognition. Policies addressing data sharing need to understand academia as a reputation economy in order to work.
Why Current Policies Fail
Current policies concerning data sharing mainly fall into two camps: they either try to motivate data sharing intrinsically by invoking the common good or they force researchers to share with mandatory sharing policies. Motivating researchers to share data for the common good fails as it is not in line with the incentives of the reputation economy12. Most researchers choose to invest their resources into activities that better contribute to their reputation. Consequently, debates around data sharing often focus on mandatory data sharing policies. These are embraced by funding agencies, such as the NIH in the U.S. and the Horizon 2020 program in the European Union, alongside journals like Nature or PLOS ONE. Without a doubt, mandatory data sharing policies increase the number of shared datasets. However, this does not happen because researchers are motivated to share but because sharing is a necessary evil to get to something else: research grants or journal publications.
And this comes with a major drawback: if data sharing is mandatory, researchers only invest the minimum time necessary to share. This in turn leads to badly labeled variables, poor documentation, and datasets that are hard to find. An empirical assessment of 18 published research papers of microarray studies showed that only 2 of them could be perfectly reproduced. In some cases it took months to reproduce a single figure13.
Mandatory data sharing policies thus lead to a situation that makes the reuse of datasets difficult, even though reuse is the core reason why data sharing is advocated in the first place. To develop a culture of prolific data sharing and reuse, policy makers, funding agencies, and research organizations need to value the publication of data: it needs to pay in the form of reputation.
What Appropriate Policies Could Look Like
We need a measure that indicates the importance of a dataset. Such a measure could be analogous to the citation count, which indicates the impact a research article had in the scientific community. A measure for shared data should count publications that used a dataset (e.g., by tracking DOIs). Researchers could thus gain reputation by publishing data that gets used. And researchers could indicate their importance to a field by the number of research articles they made possible based on their published datasets.
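As a rough illustration of such a measure, the following minimal Python sketch counts, for each dataset DOI, the distinct publications whose reference lists contain it, and derives a per-researcher figure from those counts. It is only one possible reading of the idea, not a proposed standard. The record layout (`doi`, `references`), the function names, and all DOIs shown are hypothetical and chosen for illustration; real citation data would have to come from a source that reliably links articles to the datasets they used.

```python
from collections import defaultdict


def dataset_reuse_counts(dataset_dois, publications):
    """Count the distinct publications that reference each dataset DOI.

    `publications` is assumed to be a list of dicts with a "doi" string and
    a "references" list of DOI strings (a hypothetical record layout).
    DOIs are compared case-insensitively.
    """
    known = {d.lower() for d in dataset_dois}
    citing = defaultdict(set)
    for pub in publications:
        for ref in pub["references"]:
            ref = ref.lower()
            if ref in known:
                # A set ensures each publication counts once per dataset.
                citing[ref].add(pub["doi"].lower())
    return {d: len(citing[d]) for d in sorted(known)}


def researcher_reuse_score(owned_dataset_dois, publications):
    """Number of distinct articles that used at least one of a researcher's datasets."""
    owned = {d.lower() for d in owned_dataset_dois}
    return sum(
        1
        for pub in publications
        if owned & {ref.lower() for ref in pub["references"]}
    )


# Made-up example records; none of these DOIs refer to real objects.
publications = [
    {"doi": "10.1234/article.1",
     "references": ["10.1234/dataset.a", "10.1234/article.9"]},
    {"doi": "10.1234/article.2",
     "references": ["10.1234/DATASET.A", "10.1234/dataset.b"]},
    {"doi": "10.1234/article.3",
     "references": ["10.1234/dataset.a", "10.1234/dataset.a"]},
]

counts = dataset_reuse_counts(
    ["10.1234/dataset.a", "10.1234/dataset.b", "10.1234/dataset.c"],
    publications,
)
print(counts)
print(researcher_reuse_score(["10.1234/dataset.a", "10.1234/dataset.b"], publications))
```

Counting each article only once per dataset mirrors how a citation count treats repeated mentions within one paper. Whether reuse by the dataset's own authors should count is a policy question this sketch leaves open.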
Funding agencies should take this measure into account and privilege scientists or research groups that have a track record of distinguished datasets. By switching their policies from mandatory sharing to rewarding good datasets, funding agencies could motivate researchers not only to share but to share in a more reusable fashion.
Research communities could do more for the recognition of good datasets. Best paper awards are commonplace at conferences, in journals, and in research fields. They are welcome signs of good work that researchers use to indicate their value. Good datasets need to receive similar forms of recognition to justify the work necessary to make them publicly available in a reusable form.
And lastly, journals need to take the issue more seriously. Data journals like Nature’s Scientific Data are a good first step, but need to gain impact in order to motivate the mainstream researcher to publish with them. Established journals could instead add a data section and publish descriptions of noteworthy datasets together with their scope of application. In doing so, journals could perform the magic trick of transforming datasets into a currency researchers are used to.
Given the constant increase in complexity of many research fields, more collaboration is desperately needed. Data sharing is a form of collaboration that is worthy of our support. It is currently a desirable practice that is having a tough time gaining traction. It is like the electric car that everyone knows is good for the environment but nobody wants to buy. In the current situation, it is important to set a course that promotes data sharing and rewards those who make their data easily reusable. Only when we do this will we be able to reap the benefits that are attributed to academic data sharing.
1 J. C. Venter, The Sequence of the Human Genome. Science. 291, 1304–1351 (2001).
2 NIH, Principles and Guidelines for Reporting Preclinical Research. National Institutes of Health. Turning Discovery Into Health. (2015), (available at www.nih.gov/about/reporting-preclinical-research.htm).
3 NSF, Chapter VI – Other Post Award Requirements and Considerations. National Science Foundation. Where Discoveries Begin (2013), (available at www.nsf.gov/pubs/policydocs/pappguide/nsf13001/aag_6.jsp#VID4).
4 C. Tenopir et al., Data Sharing by Scientists: Practices and Perceptions. PLoS ONE. 6, e21101 (2011).
5 P. Andreoli-Versbach, F. Mueller-Langer, Open access to data: An ideal professed but not practised. Research Policy. 43, 1621–1633 (2014).
6 B. Bozeman, C. Boardman, Research collaboration and team science: a state-of-the-art review and agenda (2014), Springer.
7 S. Wuchty, B. F. Jones, B. Uzzi, The Increasing Dominance of Teams in Production of Knowledge. Science. 316, 1036–1039 (2007).
8 N. Kroes, Opening Science Through e‑Infrastructures (2012), (available at europa.eu/rapid/press-release_SPEECH-12-258_en.htm).
9 S. Vlaeminck, Data management in scholarly journals and possible roles for libraries – Some insights from EDaWaX. Liber Quarterly. 23, 48–79 (2013).
10 A. A. Alsheikh-Ali, W. Qureshi, M. H. Al-Mallah, J. P. A. Ioannidis, Public Availability of Published Research Data in High-Impact Journals. PLOS ONE. 6, e24357 (2011).
11 T. Herndon, M. Ash, R. Pollin, Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics. 38, 257–279 (2014).
12 B. Nelson, Data sharing: Empty archives. Nature. 461, 160–163 (2009).
13 J. P. A. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics. 41, 149–155 (2009).