Misconceptions about academic data sharing
Written by Benedikt Fecher & Gert Wagner.
In a recent editorial in the New England Journal of Medicine, the authors Longo and Drazen critically assessed the concept of data sharing in medicine. Their main concern is that a “new class of research person will emerge” that uses data for their own original research questions. The authors later refer to this class of researcher, although indirectly, as “research parasites”. The label “research parasites” certainly does not reflect the zeitgeist of increasingly collaborative research and of initiatives towards openness and transparency. However, it reflects common misconceptions about academic data sharing.
Longo and Drazen make the (valid) point that data might be misinterpreted. On the other hand, misinterpretation might be a matter of insufficient data documentation by primary researchers. Moreover, potential misinterpretation cannot be an argument for not sharing research data.
Longo and Drazen miss the very point of scientific research when they write that researchers may “even use the data to try to disprove what the original investigators had posited”. It is at the core of the scientific paradigm that researchers take nothing as final truth. This is what Popper proposed in his critical rationalism and Merton in his conceptualization of skepticism.
Longo and Drazen’s requirement to “start with a novel idea, one that is not an obvious extension of the reported work” is simply misleading. Medical research in particular (which is the subject of Longo and Drazen’s editorial) can profit immensely from old ideas through meta-analyses and replication studies that use original datasets.
However, the authors touch upon a valid point: the issue of adequate credit for scientific data sharing. They indicate that the adequate form of recognition for data sharing is co-authorship. They suggest working “symbiotically, rather than parasitically, with the investigators holding the data, moving the field forward in a way that neither group could have done on its own.”
While that is certainly true in particular cases, we argue that co-authorship as the sole instrument for giving credit will unnecessarily restrict the potential of data sharing and can even be to the detriment of the original researcher, for instance if the resulting publications lack quality. And in the case of replication studies, co-authorship makes no scientific sense.
The best instrument for giving “credit where credit is due” would be a much higher appraisal of data sharing by research communities, via citations of data sets and the consideration of data “production” in career prospects, funding applications, and evaluations.
With this end in mind, this “new class of research person” is exactly the opposite of a “research parasite”. This person would be someone who is essential to the scientific enterprise in an increasingly data-intensive and collaborative environment. Longo and Drazen’s editorial, however, shows that there is still a long way to go before we reach Open Science.
This post represents the view of the author and does not necessarily represent the view of the institute itself. For more information about the topics of these articles and associated research projects, please contact firstname.lastname@example.org.