We introduce and tackle the problem of automatically generating short descriptions of Wikipedia articles (e.g., the article on Belgium has the short description "Country in Western Europe"). We present Descartes, a model that generates descriptions on par with those written by human editors. In our human evaluation, descriptions generated by Descartes are preferred over editor-written descriptions about 50% of the time. Further manual analysis shows that Descartes generates descriptions considered "valid" for 91.3% of articles, the same rate as for editor-written descriptions. This performance is made possible by integrating other signals that exist naturally in Wikipedia: (i) articles about the same entity in different languages, (ii) existing short descriptions in other languages, and (iii) structural information from Wikidata. Our work has direct practical applications in helping Wikipedia editors provide short descriptions for the more than 9 million articles still missing one. Finally, our proposed architecture can easily be repurposed to address other information gaps in Wikipedia.