In large technology companies, the need to manage and organize the technical documents that engineers and managers create to support decision making has grown dramatically in recent years, raising demand for scalable, accurate, and automated document classification. Prior studies have focused only on text, whereas technical documents often contain multimodal information. This paper presents TechDoc, a novel multimodal deep learning architecture for technical document classification that uses three types of information: the natural language text and descriptive images within documents, and the associations among documents. The architecture combines a convolutional neural network, a recurrent neural network, and a graph neural network through an integrated multimodal training process. We applied the architecture to a large multimodal technical document database and trained the model to classify documents according to the hierarchical International Patent Classification system. Our results show that TechDoc achieves higher classification accuracy than unimodal methods and other state-of-the-art methods.
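To make the hierarchical label structure concrete, the following minimal sketch splits an International Patent Classification symbol into its three coarsest levels (section, class, subclass). It illustrates only the label hierarchy and is not part of TechDoc; the abstract does not state at which level the model predicts, and the names `IPCLabel` and `ipc_levels` are hypothetical.

```python
import re
from typing import NamedTuple


class IPCLabel(NamedTuple):
    """Three coarsest IPC levels (hypothetical helper type, not from the paper)."""
    section: str    # one letter, A to H
    ipc_class: str  # section plus two digits, e.g. "H04"
    subclass: str   # class plus one letter, e.g. "H04L"


# Section letter, two-digit class number, subclass letter at the start of the symbol.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])")


def ipc_levels(symbol: str) -> IPCLabel:
    """Return the section, class, and subclass prefixes of an IPC symbol."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable IPC symbol: {symbol!r}")
    section, digits, letter = match.groups()
    return IPCLabel(section, section + digits, section + digits + letter)


if __name__ == "__main__":
    # Each level is a prefix of the next, so one document carries a label at every level.
    print(ipc_levels("H04L 9/32"))
```

Because each level is a prefix of the next, a classifier trained at the subclass level implicitly yields class and section predictions as well; whether TechDoc exploits this is not stated in the abstract.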