When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead. Classical information retrieval systems do not answer information needs directly, but instead provide references to (hopefully authoritative) answers. Successful question answering systems offer a limited corpus created on demand by human experts, which is neither timely nor scalable. Pre-trained language models, by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are dilettantes rather than domain experts: they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained on. This paper examines how ideas from classical information retrieval and pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of domain expert advice.