In this paper we propose a new Deep Learning (DL) approach for message classification. Our method builds on state-of-the-art Natural Language Processing (NLP) building blocks, combined with a novel technique for infusing the meta-data that typically accompanies messages, such as sender information, timestamps, attached images, audio, affiliations, and more. As we demonstrate throughout the paper, going beyond the text alone and leveraging all available channels in a message can yield an improved representation and higher classification accuracy. To build the message representation, each input type is processed by a dedicated block of the neural network architecture that suits its data type. This design allows all blocks to be trained jointly and cross-channel features to form within the network. We show in the Experiments section that, in some cases, a message's meta-data holds information that cannot be extracted from the text alone, and using it yields better performance. Furthermore, we demonstrate that our multi-modal block approach outperforms other approaches for injecting the meta-data into the text classifier.
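To make the per-channel block design concrete, below is a minimal PyTorch-style sketch of such an architecture; it is not the paper's implementation, and the specific blocks (pooled token embeddings for text, a sender embedding plus timestamp features for meta-data), dimensions, and field names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiModalMessageClassifier(nn.Module):
    """Illustrative multi-block classifier: one block per input channel,
    fused by concatenation and trained end to end."""

    def __init__(self, vocab_size, num_senders, num_classes,
                 text_dim=128, meta_dim=32):
        super().__init__()
        # Text block: token embeddings pooled into a fixed-size vector.
        self.text_block = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
        # Meta-data block: sender embedding plus dense timestamp features.
        self.sender_emb = nn.Embedding(num_senders, meta_dim)
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim + 2, meta_dim),  # +2 for hour-of-day, day-of-week
            nn.ReLU(),
        )
        # Fusion head: cross-channel features can form here, since gradients
        # from the shared loss flow back into every block jointly.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + meta_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, token_ids, offsets, sender_ids, time_feats):
        text_vec = self.text_block(token_ids, offsets)
        meta_vec = self.meta_mlp(
            torch.cat([self.sender_emb(sender_ids), time_feats], dim=-1))
        return self.classifier(torch.cat([text_vec, meta_vec], dim=-1))


# Toy usage: two messages, bag-of-token text plus sender and time meta-data.
model = MultiModalMessageClassifier(vocab_size=1000, num_senders=50, num_classes=4)
tokens = torch.tensor([3, 17, 42, 8, 99])       # flattened token ids of both messages
offsets = torch.tensor([0, 3])                  # message boundaries for EmbeddingBag
senders = torch.tensor([7, 12])
times = torch.tensor([[0.5, 0.1], [0.9, 0.6]])  # normalized hour / weekday
logits = model(tokens, offsets, senders, times)
print(logits.shape)  # torch.Size([2, 4])
```

Because the channel blocks feed a single classification head, a single loss trains them simultaneously, which is what enables cross-channel features rather than late fusion of independently trained models.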