While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has not adequately addressed this diversity of answers. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed in simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) and more structured ones such as collections. We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LMs' strong performance on GooAQ's short-answer questions benefits heavily from annotated data; however, (b) their quality in generating coherent and accurate responses for questions requiring long responses (such as 'how' and 'why' questions) is less reliant on observing annotated data and is mainly supported by their pre-training. We release GooAQ to facilitate further research on improving QA with diverse response types.