Facade of a skyscraper with offices. Some of them have blue and purple lights on, some are dark, representing the data institute

10 January 2024 | doi: 10.5281/zenodo.13221900

How does the data institute become public-interest orientated?

A digital society needs not only suitable infrastructure but also responsible entities to coordinate it. The German government has recognized this need and established the first data institute. On November 20, 2023, Wikimedia Germany organized an interactive session at the Digital Summit in Jena. It addressed the question of how the data institute can work in the long term as well as in the public interest. Theresa Züger moderated the discussion and summarises the approaches and ideas here.

In the session, five discussion groups with different focus topics were formed, one of which I had the privilege of moderating. This post presents an excerpt of the discussed ideas, as described by Wikimedia, to provide guidance for the implementation of the data institute. My group focused on how the public interest can be sustainably secured, especially in the tension between public welfare and profit interests, and what mechanisms contribute to the continuous safeguarding of the public interest.

Various visions for the role of the data institute

The discussion began with various ideas about the actual work and role of the data institute, which matters because its operational methods are still to be clarified in many respects. While some see the institute clearly as a data custodian that manages and regulates access to data, others emphasized its role in fostering data-based projects and innovation. Whether and how the data institute will fulfill both roles is unclear. The announcements state that it “coordinates the data ecosystem, networks across sector boundaries, and enables innovations. The data institute is intended to serve as a central point of contact that bundles holistic and interdisciplinary expertise and provides practical methodological competence and solutions. It is expected to build on the numerous existing initiatives in the data field, connect them, and initiate new cross-sector projects” (BMI).

With the idea of the institute as a data custodian, which manages data as a non-governmental but public institution (similar to the media institutions), the idea arose that commercial digital platforms – in the sense of big tech companies – could be obliged to make the data they collect available to the data institute in anonymised form at the request of their users. This way, the data would be accessible to other actors, countering data monopolies and enabling broader competition in data-based applications.

Where does the data come from?

This question also left room for discussion. The consensus in the group was that a lot of data based on municipal structures already exists. The data institute now has the task of creating structures to make this data publicly usable. Some speakers proposed that the data should be as open as possible to various actors, allowing for competition and a variety of use cases based on public data. Both profit-oriented uses and non-profit uses serving the public interest should be possible, under various licenses that could set different conditions for public welfare-oriented and profit-oriented projects. One participant emphasized that some use cases have no business case, since certain public welfare-oriented data projects may not generate revenue. In such cases, the data institute should actively step in as a supporter to enable projects that benefit the public welfare. Furthermore, it was suggested that data collectors should be supported and subsidized in general, for instance by recognizing data collection, as well as open-source development, as a charitable purpose.

What role does the economy play for a public welfare-oriented data institute?

This question was more controversial. While some clearly see a mandate for fundamentally public welfare-oriented projects at the data institute, others believe that economic interests should play a role in every use case. However, data should not only flow into the economy through the data institute but also be “returned.” For example, a license model could specify that those who use public data must return to the institute 20 percent of the new data their product generates from it. This idea was formulated more concretely as a “public welfare return.” The return could take the form of data, as outlined, but also of a financial contribution or other forms of participation that contribute to the long-term financing of the data institute.

In general, participants criticized that data projects currently face an either-or situation: either they are non-profit and therefore not allowed to make profits, or they are profit-oriented and may therefore often fail to meet their public welfare goals because they are subject to investors and market logic. Participants called for solutions in between that combine economic activity with a public welfare interest.

Who is allowed to participate?

Another steering mechanism, according to one suggestion, could be requirements for a diverse composition of the consortia working on use cases. Framework conditions could, for example, require that a start-up or a medium-sized company cooperate with a larger company and a civil society actor. A larger number of partners and interests always increases complexity, but such requirements could succeed in involving actors who are otherwise underrepresented.

How to protect the data institute against abuse or harm?

One open question was how the quality of data and its maintenance by or for the data institute could be ensured, and what requirements should be imposed on datasets provided through the institute by private or non-profit actors. For example, from the perspective of one participant, requiring that all biases be removed from datasets would set the bar too high. Instead, good documentation of the data would be necessary, transparently outlining its quality and limitations, including possible biases.

Visions and reality – one must stay tuned

However, these ideas are just visions for now. The data institute is currently in the process of selecting the first so-called use cases. The first two use cases are expected to focus on post-corona and energy efficiency. According to the participants of our discussion, public welfare orientation should already be considered when selecting the cases, and those responsible for the cases should reflect it, for example, in the application process.

And who decides?

The crucial question, however, will be through which process and by which players the respective use cases will be selected. Public welfare always requires deliberation, which is why the question of the process and the possibility of participation is decisive. This workshop at the Digital Summit was an encouraging example of successful participation. Now it remains to be seen how the ideas of the participants will be incorporated and what processes and goals the data institute will actually develop in the coming months.

This post represents the view of the author and does not necessarily represent the view of the institute itself. For more information about the topics of these articles and associated research projects, please contact info@hiig.de.