Text classification plays an important role in many practical applications. In the real world, datasets are often extremely small. Most existing methods adopt pre-trained neural network models to handle such datasets; however, these methods are either difficult to deploy on mobile devices because of their large model size or unable to fully extract the deep semantic information between phrases and clauses. This paper proposes a multimodel-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small dataset. Our framework consists of five layers: the encoder layer uses DistilBERT to obtain context-sensitive dynamic word vectors that are difficult to represent with traditional feature-engineering methods. Because the transformer part of this layer is distilled, the framework is compressed. The next two layers extract deep semantic information: the output of the encoder layer is fed into a bidirectional LSTM network, and the feature matrix is extracted hierarchically through word-level and sentence-level LSTMs to obtain a fine-grained semantic representation. The max-pooling layer then converts the feature matrix into a lower-dimensional matrix, preserving only the most salient features. Finally, the feature matrix is passed to a fully connected softmax layer, which converts the predicted linear vector into the probability of the text belonging to each class. Extensive experiments on two public benchmarks demonstrate the effectiveness of the proposed approach on an extremely small dataset: it matches state-of-the-art baselines in precision, recall, accuracy, and F1 score, while its model size, training time, and convergence epoch show that it can be deployed on mobile devices faster and with a lighter footprint.
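A minimal PyTorch sketch of the five-layer pipeline described above (DistilBERT encoder, bidirectional LSTM, max-pooling, fully connected softmax) is given below. The checkpoint name, hidden size, and number of classes are illustrative assumptions rather than values from the paper, and a single BiLSTM stands in for the hierarchical word- and sentence-level extraction.

```python
# Sketch of the abstract's architecture; hyperparameters are assumptions, not the paper's.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class ShortTextClassifier(nn.Module):
    def __init__(self, num_classes: int, lstm_hidden: int = 128,
                 checkpoint: str = "distilbert-base-uncased"):
        super().__init__()
        # Encoder layer: distilled transformer producing context-sensitive word vectors.
        self.encoder = DistilBertModel.from_pretrained(checkpoint)
        # Bidirectional LSTM layer: extracts sequential semantic features.
        self.bilstm = nn.LSTM(input_size=self.encoder.config.dim,
                              hidden_size=lstm_hidden,
                              batch_first=True,
                              bidirectional=True)
        # Fully connected layer mapping pooled features to class scores.
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, dim) contextual token embeddings from DistilBERT.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # (batch, seq_len, 2 * lstm_hidden) bidirectional sequence features.
        seq_features, _ = self.bilstm(hidden)
        # Max-pooling over the time dimension keeps only the strongest features.
        pooled, _ = seq_features.max(dim=1)
        # Softmax converts the linear scores into per-class probabilities.
        return torch.softmax(self.classifier(pooled), dim=-1)


# Example usage on a toy batch.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
batch = tokenizer(["short text to classify"], return_tensors="pt",
                  padding=True, truncation=True)
model = ShortTextClassifier(num_classes=5)
probs = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 5)
```

In practice the final softmax is often folded into the loss (e.g., `nn.CrossEntropyLoss` over raw logits) during training; it is kept explicit here to mirror the description in the abstract.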