Improving Wikipedia Verifiability with AI

Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this reason, finding relevant sources is a difficult task: many claims do not have any references that support them. Furthermore, even existing citations might not support a given claim or become obsolete once the original source is updated or deleted. Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. Here, we show that the process of improving references can be tackled with the help of artificial intelligence (AI). We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowd-sourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system's suggested alternatives compared to the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that Side's first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims according to Side. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia. More generally, we hope that our work can be used to assist fact checking efforts and increase the general trustworthiness of information online.
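The workflow described in the abstract (score each existing citation against its claim, flag the lowest-scoring fraction, and propose a better source for each flagged claim) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `CitedClaim`, `overlap_score`, and `flag_and_recommend` are hypothetical, the candidate passages are supplied by hand rather than retrieved from the web, and the token-overlap scorer is only a toy stand-in for the neural verification model that Side actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


def overlap_score(claim: str, passage: str) -> float:
    """Toy stand-in for a learned verification model (NOT Side's model):
    the fraction of claim tokens that also appear in the passage."""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & passage_tokens) / len(claim_tokens)


@dataclass
class CitedClaim:
    claim: str             # the sentence that needs support
    cited_passage: str     # text of the currently cited source
    candidates: List[str]  # hypothetical alternative passages (assumed already retrieved)


def flag_and_recommend(
    items: List[CitedClaim],
    fraction: float = 0.1,
    score: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[CitedClaim, Optional[str]]]:
    """Flag the lowest-scoring `fraction` of citations and pair each with its
    best-scoring candidate, or None if no candidate beats the current source."""
    if not items:
        return []
    # Rank citations from least to most supported by their current source.
    ranked = sorted(items, key=lambda it: score(it.claim, it.cited_passage))
    n_flagged = max(1, round(len(ranked) * fraction))
    results = []
    for item in ranked[:n_flagged]:
        current = score(item.claim, item.cited_passage)
        best = max(item.candidates, key=lambda c: score(item.claim, c), default=None)
        # Recommend a replacement only when it scores strictly higher.
        if best is not None and score(item.claim, best) > current:
            results.append((item, best))
        else:
            results.append((item, None))
    return results
```

In this sketch the `fraction` parameter mirrors the "top 10%" cut-off mentioned in the abstract, and any suggested replacement would still go to a human editor for the final decision, in line with the abstract's emphasis on working in tandem with humans.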
