Is checkworthiness generalizable? Evaluating task and domain generalization of datasets for claim detection.

The spread of misinformation has reached a level at which neither researchers nor fact-checkers can monitor it by manual means alone. Accordingly, there has been much research on models and datasets for detecting checkworthy claims. However, this research in NLP is mostly detached from findings in communication science on misinformation and fact-checking. Checkworthiness is a notoriously vague concept whose meaning is contested among different stakeholders. Against the background of news value theory, i.e., the study of factors that make an event relevant for journalistic reporting, this is not surprising. This paper argues that this vagueness leads to inconsistencies and poor generalization across datasets and domains. In the experiments, models are trained on one dataset and tested on the remaining datasets; the results are compared with the performance on the original dataset, with a random baseline, and with the scores obtained when the models are not trained at all. The study finds a drastic drop relative to the performance on the original dataset. Moreover, the models are often outperformed by the random baseline, and training on one dataset has no impact, or even a negative impact, on performance on the other datasets. This paper proposes that future research abandon this task design and instead take inspiration from research in communication science. In the style of news values, claim detection should focus on factors that are relevant for fact-checkers and misinformation.
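The evaluation protocol described in the abstract (train on one dataset, test on every dataset, and compare against a random baseline and an untrained model) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the keyword-based stand-in classifier, the toy sentences, and the choice of F1 as the metric are all hypothetical, since the abstract does not specify models or metrics.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Example = Tuple[str, int]  # (sentence, 1 if labeled checkworthy else 0)


def f1(gold: List[int], pred: List[int]) -> float:
    """F1 for the positive class (assumed metric; the abstract names none)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_dataset_eval(
    datasets: Dict[str, Dict[str, List[Example]]],
    train_fn: Callable[[List[Example]], Set[str]],
    predict_fn: Callable[[Set[str], List[str]], List[int]],
    untrained: Set[str],
    seed: int = 0,
) -> List[dict]:
    """Train on each source dataset and score on every target dataset."""
    rng = random.Random(seed)
    rows = []
    for source, source_data in datasets.items():
        model = train_fn(source_data["train"])
        for target, target_data in datasets.items():
            texts = [text for text, _ in target_data["test"]]
            gold = [label for _, label in target_data["test"]]
            rows.append({
                "source": source,
                "target": target,
                "in_domain": source == target,  # reference performance
                "trained": f1(gold, predict_fn(model, texts)),
                "untrained": f1(gold, predict_fn(untrained, texts)),
                "random": f1(gold, [rng.randint(0, 1) for _ in gold]),
            })
    return rows


# Hypothetical stand-in classifier: remembers tokens seen only in positives.
def train_keywords(examples: List[Example]) -> Set[str]:
    pos = {tok for text, y in examples if y == 1 for tok in text.split()}
    neg = {tok for text, y in examples if y == 0 for tok in text.split()}
    return pos - neg


def predict_keywords(cues: Set[str], texts: List[str]) -> List[int]:
    return [int(any(tok in cues for tok in text.split())) for text in texts]


# Toy data, invented purely to exercise the code.
toy = {
    "politics": {
        "train": [("taxes rose 5 percent", 1), ("thank you all", 0)],
        "test": [("taxes fell last year", 1), ("good evening", 0)],
    },
    "health": {
        "train": [("the vaccine causes fever", 1), ("stay safe everyone", 0)],
        "test": [("the vaccine is highly effective", 1), ("see you tomorrow", 0)],
    },
}

rows = cross_dataset_eval(toy, train_keywords, predict_keywords, untrained=set())
for row in rows:
    print(row)
```

Each row pairs a source dataset with a target dataset, so the cross-dataset score can be read against the three reference points the abstract mentions: the in-domain score, the random baseline, and the untrained model. In a real experiment, `train_fn` and `predict_fn` would wrap an actual classifier rather than the keyword stand-in.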
