Content completeness of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a considerable amount of time carefully checking every financial document against the relevant content requirements, which prescribe the information types that must be included in financial documents (e.g., a description of the conditions under which shares are issued). Several techniques have been proposed to automatically detect certain types of information in documents across various application domains. However, they offer limited support for helping regulators automatically identify the text chunks related to financial information types, owing to the complexity of financial documents and the diversity of the sentences that characterize an information type. In this paper, we propose FITI, an artificial intelligence (AI)-based method for tracing content requirements in financial documents. Given a new financial document, FITI first selects a set of candidate sentences so that information types can be identified efficiently. Then, FITI ranks the candidate sentences using a combination of rule-based and data-centric approaches, leveraging information retrieval (IR) and machine learning (ML) techniques that analyze the words, sentences, and contexts related to an information type. Finally, a heuristic-based selector, which considers both the sentence ranking and a list of domain-specific indicator phrases for each information type, determines the list of sentences corresponding to each information type. We evaluated FITI by assessing its effectiveness in tracing financial content requirements in 100 financial documents. Experimental results show that FITI identifies information types accurately, with average precision and recall of 0.824 and 0.646, respectively. Furthermore, FITI can detect about 80% of the information types missing from financial documents.
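The three-step flow described above (candidate selection, ranking, heuristic selection) can be illustrated with a minimal sketch. It is not FITI's implementation: the vocabulary filter, the TF-IDF cosine ranking, and the "top-k and contains an indicator phrase" rule are hypothetical stand-ins chosen for illustration, and the ML part of FITI's ranking is omitted entirely. The function names (`select_candidates`, `rank`, `heuristic_select`) and the example sentences are likewise assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def select_candidates(sentences, vocabulary):
    """Step 1 (hypothetical filter): keep sentences sharing at least one word
    with the information type's vocabulary, so later steps score fewer sentences."""
    return [s for s in sentences if set(tokenize(s)) & vocabulary]


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors using a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(candidates, type_description):
    """Step 2 (IR part only, assumed): rank candidates by cosine similarity
    to a short textual description of the information type."""
    vectors = tfidf_vectors([type_description] + candidates)
    query, sentence_vectors = vectors[0], vectors[1:]
    scored = [(cosine(query, v), s) for v, s in zip(sentence_vectors, candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def heuristic_select(ranked, indicator_phrases, top_k=2):
    """Step 3 (hypothetical rule): keep a sentence only if it is among the
    top_k ranked sentences AND contains an indicator phrase. An empty result
    flags the information type as potentially missing from the document."""
    return [
        sentence
        for _score, sentence in ranked[:top_k]
        if any(phrase in sentence.lower() for phrase in indicator_phrases)
    ]


if __name__ == "__main__":
    document = [
        "The shares are issued at the net asset value per share.",
        "The fund is managed by the management company.",
        "Issue conditions: shares may be subscribed on each valuation day.",
        "The auditor is appointed by the board.",
    ]
    candidates = select_candidates(document, {"shares", "issue", "issued", "subscribed"})
    ranked = rank(candidates, "conditions for the issue of shares")
    print(heuristic_select(ranked, ["issue conditions", "subscribed"]))
```

On this toy document, the two share-related sentences survive the filter, the "Issue conditions" sentence ranks first, and only it passes the indicator-phrase check. Requiring both a high rank and an indicator phrase trades recall for precision, which is one plausible reading of a selector that "considers both" signals.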