Document Collection Visual Question Answering

Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (e.g., historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task in which questions are posed over a whole collection of document images. The goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset, we propose a new evaluation metric and baselines that provide further insight into the new dataset and task.
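To make the task's input and output concrete, here is a minimal sketch, assuming a simple record layout: a question is posed against the ids of every document in a collection, and a system returns both an answer and the set of evidence documents. The class names, the `evidence_precision_recall` helper, the example question and the document ids are all hypothetical. The set-based precision/recall shown is an illustrative stand-in for scoring the retrieved evidence, not the evaluation metric proposed for DocCVQA.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CollectionQuestion:
    """Hypothetical record for one question posed over a whole collection."""
    question: str
    collection_ids: FrozenSet[str]  # ids of every document image in the collection


@dataclass(frozen=True)
class CollectionPrediction:
    """Hypothetical system output: the answer plus its supporting documents."""
    answers: Tuple[str, ...]      # an answer may consist of several values
    evidence_ids: FrozenSet[str]  # documents claimed to hold the needed information


def evidence_precision_recall(
    predicted: FrozenSet[str], relevant: FrozenSet[str]
) -> Tuple[float, float]:
    """Set-based precision and recall of the retrieved evidence documents.

    Illustrative stand-in only; NOT the evaluation metric proposed for DocCVQA.
    """
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = CollectionQuestion(
        question="Which invoices were issued in March?",  # hypothetical question
        collection_ids=frozenset({"doc_01", "doc_02", "doc_03", "doc_04"}),
    )
    prediction = CollectionPrediction(
        answers=("INV-0012", "INV-0031"),  # hypothetical answers
        evidence_ids=frozenset({"doc_01", "doc_03"}),
    )
    relevant = frozenset({"doc_01", "doc_02", "doc_03"})  # hypothetical ground truth

    # Retrieved evidence must come from the collection being queried.
    assert prediction.evidence_ids <= query.collection_ids

    p, r = evidence_precision_recall(prediction.evidence_ids, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

Keeping the evidence as a set rather than a ranked list mirrors the abstract's framing of retrieving "the set of documents" needed to infer the answer; a real evaluation may weigh answers and evidence differently.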

翻译：《文件理解》中目前的任务和方法旨在将文件作为单一要素处理,然而,文件通常按收集(历史记录、购买发票)来编排,为文件的解释提供有用的背景。为解决这一问题,我们引入了一个新的数据集和相关任务,即文件收集视觉问题解答(DocCVQA)为整个文件图像集提出问题,目的不仅是为特定问题提供答案,而且还要检索包含推断答案所需信息的整套文件。除了数据集之外,我们还提出了新的评估指标和基线,为新的数据集和任务提供进一步的洞察力。