This paper proposes a way to improve the performance of existing text classification algorithms in domains with strong language semantics. We propose a domain adaptation layer that learns weights to combine a generic and a domain-specific (DS) word embedding into a domain-adapted (DA) embedding. The DA word embeddings are then used as inputs to a generic encoder-plus-classifier framework to perform a downstream task such as classification. This adaptation layer is particularly suited to datasets that are modest in size and therefore not ideal candidates for (re)training a deep neural network architecture. Results on binary and multi-class classification tasks using popular encoder architectures, including current state-of-the-art methods (with and without the shallow adaptation layer), show the effectiveness of the proposed approach.
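As a minimal sketch of the idea described above, the adaptation layer can be viewed as a learned elementwise gate that mixes the generic and DS embeddings of each word. The parameterization below (sigmoid-gated convex combination, names `domain_adapted_embedding`, `gate_logits`) is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def domain_adapted_embedding(generic, ds, gate_logits):
    """Combine a generic and a domain-specific (DS) embedding into a
    domain-adapted (DA) embedding via learned elementwise gates.

    gate_logits are trainable parameters; sigmoid maps them to (0, 1),
    so each dimension interpolates between the two source embeddings.
    (Hypothetical parameterization for illustration.)
    """
    alpha = sigmoid(gate_logits)
    return alpha * generic + (1.0 - alpha) * ds

# Toy example with 4-dimensional embeddings for one word.
generic = np.array([1.0, 0.0, 0.5, 0.2])
ds      = np.array([0.0, 1.0, 0.5, 0.8])
logits  = np.zeros(4)  # alpha = 0.5 everywhere: equal mixing
da = domain_adapted_embedding(generic, ds, logits)
print(da)  # midpoint of the two embeddings
```

The DA vectors produced this way would then feed into the encoder-plus-classifier stack unchanged; because only the gate parameters are new, the layer adds very few weights, which is why it suits modest-sized datasets.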