System error: Open research data and publication-driven research
The blog post summarizes results from the project Sharing Research Data in Academia. A detailed article can be found here.
Open Data: An integral part of Horizon 2020
With Horizon 2020, the European Commission recently launched its biggest ever research funding programme. 80 billion euros are meant to drive economic growth, create jobs and foster innovation. Open access to research data is a central means of achieving these aims. Between 2014 and 2015, topic areas participating in the Open Research Data Pilot will receive funding of around €3 billion. From the pilot, the Commission expects to gain insights for future Commission policy and EU research funding programmes. It does not necessarily need the pilot, however, to identify the main barrier to open data in research: publication-driven research.
Open research data: Ideal and reality
According to a study from 2011, researchers see the lack of access to research data as the main obstacle to scientific progress. In the same study, almost half of the respondents say they would not share their data. The actual number might be even higher, considering that 30 per cent did not answer the question. Only 6 per cent said they would make all their data available. It can be concluded that the widespread support for data sharing is not a sufficient motivator to change researchers’ behavior.
Research data and electric cars: An explanation for the low willingness to share
The handling of research data can be compared with the purchase of an electric car. Of course, electric cars are good for the environment and therefore for society. Yet, in addition to their limited range and long charging times, electric cars are expensive. This is perhaps why many would rather see the electric car in their neighbor’s garage than in their own.
A similar situation arises with research data. There is common agreement that access to research data is good for scientific progress and therefore for society. Yet what is best for society is not the most obvious behavior for an individual researcher. As with drivers and electric cars, the common-interest argument can convince only a fraction of researchers to share data. Only a few are altruistic.
Publish or perish: The system error
Would more people drive an electric car if it were cheaper than a petrol car? Possibly. In the academic system, however, monetary incentives play a minor role. The currency here is recognition. And recognition is something you receive for publishing articles, not for publishing data.
Publish or perish is the mantra of scientific practice. To have a career in science, to be promoted and to earn respect, one has to publish in highly ranked journals. For many researchers, data is the raw material for the next publication. This is why they do not share. And this is also why tenured professors and researchers are more likely to share. The situation is absurd but understandable.
It is absurd because in many disciplines, especially in the hard sciences, the actual value lies in the data and code rather than in their narrative processing. It is also absurd because publication-driven research tempts one to think that data and code are not creative products. Everyone who has studied research design, data documentation and curation knows that the whole procedure is more than a simple transfer from machine to spreadsheet.
Why it is understandable not to share
Not sharing data is nonetheless understandable, because most researchers associate sharing data with more disadvantages than advantages. If you share data too early, a colleague can use them and publish first. And of course it is annoying if you have counted the nesting population of the Galapagos tortoise for the last 5 years and someone else lands a Nature contribution with that data. Or even worse: if your own Nature contribution is falsified. Even when you share data after you have published with it, there are still annoying side effects.
On the other hand, there are many good reasons to share research data. One's own work can profit from feedback, and collaborations can evolve. One can contribute a little to scientific progress, which should not be underestimated as a motivation. A study from 2007 even shows that sharing data can lead to a higher citation rate. And after all: does research data not belong to everyone, at least in publicly financed research?
Nevertheless, the whole package of good reasons carries little weight, because the effect of sharing on recognition and career is negligible. Sharing data is good for the karma but not for the career. That is also absurd.
They just want to be loved
Would more researchers share data if they got more for it? Possibly. The currency does not even have to change. What is missing in the academic system is recognition for intermediate products, including data. Those who publish well get cited. The h-index increases, and with it the chances of professional advancement. Good articles are good for the career. Good data, however, are still not as important as they should be.
Preparing data for re-use is an effort that needs to be worthwhile. This is why data has to be citable and has to be cited when used, and why this must have a positive impact on academic standing. The first important step is indexing research data with persistent identifiers, similar to DOIs for journal publications. Data thereby becomes citable, reusable and easier to find. The next important step is appropriate formal recognition for sharing data. And this step is less technical than cultural.
What is good for everyone needs to be logical for the individual. This is what research data and electric cars have in common.
- Image: “Story of the guy who would not share”, Hammerin Man (flickr).
This post is part of a weekly series of articles by doctoral candidates of the Alexander von Humboldt Institute for Internet and Society. It does not necessarily represent the view of the Institute itself. For more information about the topics of these articles and associated research projects, please contact email@example.com.