Form 10-Q, the quarterly financial report, is one of the most important filings through which US public companies disclose financial and other relevant information about their business operations. Because a very large number of 10-Q filings are submitted each quarter, and because companies vary widely in how they format them, a general method for segmenting these filings and retrieving itemized information has been a long-standing problem in the field. In this paper, we build a tool that itemizes 10-Q filings through a multi-stage process that combines a rule-based algorithm with a convolutional neural network (CNN) deep learning model. The implementation is an integrated pipeline that supports item retrieval at large scale, which enables cross-sectional and longitudinal textual analysis across a large number of companies.
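To make the idea of itemization concrete, the following is a minimal sketch of what a rule-based splitting step could look like. It is not the algorithm of this paper: the regular expression, the function name `split_items`, and the sample text are all hypothetical, and the sketch assumes a plain-text filing in which every Part and Item header starts its own line. Filings that break this assumption are presumably where a learned model such as the CNN becomes useful; that stage is not shown here.

```python
import re

# Hypothetical header pattern (an assumption, not taken from the paper):
# matches "PART I" / "PART II" or "Item 1.", "ITEM 1A:", "Item 2 -" at line start.
HEADER = re.compile(
    r"^[ \t]*(?:part[ \t]+(?P<part>i{1,2})\b"
    r"|item[ \t]+(?P<item>\d{1,2}[ab]?)[ \t]*[.:\-])",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text):
    """Return (part, item, body) tuples in order of appearance.

    A list is used rather than a dict because item numbers repeat
    across Part I and Part II of a 10-Q.
    """
    matches = list(HEADER.finditer(text))
    sections = []
    part = None
    for i, m in enumerate(matches):
        if m.group("part"):
            # A Part header only updates the current context.
            part = m.group("part").upper()
            continue
        # The body runs until the next header of either kind, or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        sections.append((part, m.group("item").upper(), body))
    return sections


# Made-up sample text for illustration only.
sample = """PART I
Item 1. Financial Statements
Balance sheet text.
Item 2. Management's Discussion and Analysis
Discussion text.
PART II
Item 1A. Risk Factors
Risk text.
"""

for part, item, body in split_items(sample):
    print(part, item, body.splitlines()[0])
```

Each tuple carries the part label together with the item number, so Item 1 of Part I stays distinct from Item 1 of Part II.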