Every day, people ask short questions through smart devices or online forums, seeking answers on all kinds of subjects. As the number of collected questions grows, it becomes difficult to answer each of them individually, which is one of the reasons behind the growing interest in automated question answering. Some questions are similar to existing ones that have already been answered, while others could be answered from an external knowledge source such as Wikipedia. An important question is what can be revealed by analysing a large set of questions. In 2017, the "We the Curious" (WTC) science centre in Bristol started a project to capture the curiosity of Bristolians: the project collected more than 10,000 questions. As no rules were given during collection, the questions are truly open-domain and range across a wide variety of topics. One important aim for the science centre was to understand what concerns its visitors had beyond science, particularly on societal and cultural issues. We addressed this aim by developing an Artificial Intelligence tool that performs several processing tasks: detecting equivalence between questions, detecting the topic and type of a question, and answering the question. Because we focused on creating a "generalist" tool, we trained it with labelled data from different datasets. We called the resulting model QBERT. This paper describes the information we extracted from the automated analysis of the WTC corpus of open-domain questions.