Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question, and answer). However, data curation for document QA is uniquely challenging because the context (i.e., the answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from the extracted texts to form well-posed contexts; (3) QA over the retrieved contexts to produce high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
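To make the three-stage structure concrete, the following is a minimal sketch of a detect-retrieve-comprehend pipeline. All function names, the term-overlap retriever, and the placeholder reader are illustrative assumptions, not the authors' released implementation, which would use a learned retriever and a trained extractive/abstractive reader.

```python
# Minimal sketch of a detect-retrieve-comprehend pipeline.
# Names and components are illustrative assumptions, not the paper's system.

from dataclasses import dataclass
from typing import List

@dataclass
class Answer:
    text: str        # extracted or generated answer
    evidence: str    # retrieved context paragraph the answer is grounded in
    score: float     # retrieval/QA confidence

def detect(document_path: str) -> List[str]:
    """Stage 1: text extraction -- split the document into candidate paragraphs.
    A real system would parse the PDF; here we assume a plain-text dump."""
    with open(document_path, encoding="utf-8") as f:
        return [p.strip() for p in f.read().split("\n\n") if p.strip()]

def retrieve(question: str, paragraphs: List[str], top_k: int = 1) -> List[str]:
    """Stage 2: evidence retrieval -- rank paragraphs by naive term overlap
    (a stand-in for the learned retriever described in the paper)."""
    q_terms = set(question.lower().split())
    ranked = sorted(paragraphs,
                    key=lambda p: len(q_terms & set(p.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def comprehend(question: str, context: str) -> Answer:
    """Stage 3: QA -- return an answer grounded in the retrieved context.
    A real system would run an extractive/abstractive/Boolean reader here."""
    return Answer(text=context[:200], evidence=context, score=1.0)

def answer_question(document_path: str, question: str) -> Answer:
    paragraphs = detect(document_path)
    context = retrieve(question, paragraphs)[0]
    return comprehend(question, context)
```

Keeping the stages decoupled in this way lets each component (parser, retriever, reader) be swapped or finetuned independently, which is the flexibility the abstract attributes to the DRC framework.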