Large language models (LLMs) have been shown to perform well in answering questions and in producing long-form texts, both in few-shot closed-book settings. While the former can be validated using well-known evaluation metrics, the latter is difficult to evaluate. We resolve the difficulty of evaluating long-form output by doing both tasks at once -- performing question answering that requires long-form answers. Such questions tend to be multifaceted, i.e., they may have ambiguities and/or require information from multiple sources. To this end, we define query refinement prompts that encourage LLMs to explicitly express the multifacetedness of a question and to generate long-form answers covering its multiple facets. Our experiments on two long-form question answering datasets, ASQA and AQuAMuSe, show that using our prompts allows us to outperform fully finetuned models in the closed-book setting, as well as achieve results comparable to retrieve-then-generate open-book models.
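To make the idea of a query refinement prompt concrete, the sketch below shows one plausible way such a prompt could be structured: the model is first asked to enumerate the facets of an ambiguous question and then to write a single long-form answer that covers them. The prompt wording and the `generate` helper are illustrative assumptions, not the paper's actual prompts or code.

```python
# A minimal illustrative sketch of a query refinement prompt for closed-book
# long-form QA. The prompt text and the `generate` callable are assumptions
# made for illustration; they do not reproduce the paper's prompts.

QUERY_REFINEMENT_PROMPT = """Question: {question}

This question may be ambiguous or have multiple facets. First, list the
distinct interpretations or sub-questions it contains. Then write a single
long-form answer that covers every facet you listed.

Facets:"""


def answer_multifaceted_question(question: str, generate) -> str:
    """Build the refinement prompt and return the model's long-form answer.

    `generate` is any text-completion callable (e.g. a thin wrapper around
    an LLM API) that maps a prompt string to a completion string.
    """
    prompt = QUERY_REFINEMENT_PROMPT.format(question=question)
    return generate(prompt)


if __name__ == "__main__":
    # Toy stand-in for an LLM call, just to show the intended usage.
    dummy_generate = lambda prompt: "1. ...\n2. ...\n\nAnswer: ..."
    print(answer_multifaceted_question(
        "Who was the first person to walk on the moon and when?",
        dummy_generate,
    ))
```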