An open challenge in multimodal conversational AI is augmenting large language models with information from textual and non-textual sources for multi-turn dialogue. To address this problem, this paper introduces Conversational Tables (cTBL), a three-step encoder-decoder approach that retrieves tabular information and generates dialogue responses grounded in the retrieved information. cTBL uses Transformer encoder embeddings for Dense Table Retrieval and obtains up to 5% relative improvement in Top-1 and Top-3 accuracy over sparse retrieval on the HybriDialogue dataset. Additionally, cTBL performs tabular knowledge retrieval using both encoder and decoder models, resulting in up to 46% relative improvement in ROUGE scores and better human evaluation for response generation on HybriDialogue.
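To make the retrieval metric concrete, the following is a minimal sketch of dense retrieval scored by Top-k accuracy. It is not the cTBL implementation: the vectors are hand-written toy stand-ins for Transformer encoder embeddings, cosine similarity is assumed as the scoring function, and the names `cosine`, `rank_tables`, and `top_k_accuracy` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_tables(query_vec: Vector, table_vecs: Sequence[Vector]) -> List[int]:
    """Return table indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, t) for t in table_vecs]
    return sorted(range(len(table_vecs)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(
    query_vecs: Sequence[Vector],
    table_vecs: Sequence[Vector],
    gold: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose gold table appears among the k best-ranked tables."""
    hits = sum(
        1
        for q, g in zip(query_vecs, gold)
        if g in rank_tables(q, table_vecs)[:k]
    )
    return hits / len(gold)


# Toy data (assumed for illustration; real embeddings would come from an encoder).
TABLES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
QUERIES = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]
GOLD = [0, 1, 2]  # index of the correct table for each query

top1 = top_k_accuracy(QUERIES, TABLES, GOLD, k=1)
top3 = top_k_accuracy(QUERIES, TABLES, GOLD, k=3)
```

With this toy data the third query ranks its gold table second, so it counts as a miss at Top-1 and a hit at Top-3. A relative improvement such as the one reported above would be computed as (new accuracy - baseline accuracy) / baseline accuracy.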