We present an overview of the second edition of the CheckThat! Lab at CLEF 2019. The lab featured two tasks in two languages: English and Arabic. Task 1 (English) challenged participating systems to predict which claims in a political debate or speech should be prioritized for fact-checking. Task 2 (Arabic) asked systems to (A) rank a given set of Web pages with respect to a check-worthy claim based on their usefulness for fact-checking that claim, (B) classify these same Web pages according to their degree of usefulness for fact-checking the target claim, (C) identify useful passages from these pages, and (D) use the useful pages to predict the claim's factuality. CheckThat! provided a full evaluation framework, consisting of data in English (derived from fact-checking sources) and Arabic (gathered and annotated from scratch), with evaluation based on mean average precision (MAP) and normalized discounted cumulative gain (nDCG) for ranking, and F1 for classification. A total of 47 teams registered to participate in the lab, and 14 of them actually submitted runs (compared to 9 last year). The evaluation results show that the most successful approaches to Task 1 used various neural networks and logistic regression. For Task 2, the highest-scoring runs for subtask A used learning-to-rank, while the other subtasks were addressed with different classifiers. We release all datasets from the lab, as well as the evaluation scripts, to the research community, which should enable further research on the important tasks of check-worthiness estimation and automatic claim verification.
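The ranking measures named above can be illustrated with a minimal Python sketch. It is not the lab's released evaluation script: the function names are hypothetical, average precision is computed under the assumption that each ranked list contains every relevant item (so the number of relevant items found equals the total), and nDCG uses the common log2 discount, which may differ from the variant used in the official scorer.

```python
import math
from typing import Optional, Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the item at rank i + 1 is relevant, else 0.
    Assumes the list contains all relevant items (illustrative assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of per-list average precision (e.g. one list per debate or claim)."""
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg(gains: Sequence[float], k: Optional[int] = None) -> float:
    """nDCG of graded gains listed in ranked order, with an optional cutoff k."""
    cutoff = len(gains) if k is None else k

    def dcg(values: Sequence[float]) -> float:
        # Gain at 0-based position i is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:cutoff]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For example, `mean_average_precision([[1, 0, 1], [0, 1]])` averages the per-list scores of two ranked lists, and `ndcg([1, 3])` penalizes a ranking that places the more useful page second.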