Across the financial domain, researchers answer complex questions by extensively "searching" for relevant information to generate long-form reports. This workshop paper discusses automating the construction of query-specific document and entity knowledge graphs (KGs) for complex research topics. We focus on the CODEC dataset, where domain experts (1) create challenging questions, (2) construct long natural language narratives, and (3) iteratively search and assess the relevance of documents and entities. For the construction of query-specific KGs, we show that state-of-the-art ranking systems have headroom for improvement, with specific failings due to a lack of context or explicit knowledge representation. We demonstrate that entity and document relevance are positively correlated, and that entity-based query feedback improves document ranking effectiveness. Furthermore, we construct query-specific KGs using retrieval and evaluate them against CODEC's "ground-truth graphs", showing the precision and recall trade-offs. Lastly, we point to future work, including adaptive KG retrieval algorithms and GNN-based weighting methods, while highlighting key challenges such as high-quality data, information extraction recall, and the size and sparsity of complex topic graphs.