Recent years have witnessed increasing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making them suitable for low-resource settings. When prompt-based learning is used for text classification, the goal is to use a pre-trained language model (PLM) to predict a missing token in a pre-defined template given an input text; the predicted token can then be mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated in classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., the fact that different tokens share a large proportion of similar information after passing through the stacked self-attention layers of a transformer, by proposing a calibration method built on feature transformations through rotation and scaling, which maps a PLM-encoded embedding into a new metric space so as to guarantee the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings via a coarse-to-fine metric learning strategy, further enhancing the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach. Our code can be found at https://github.com/donttal/TARA.
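To make the calibration idea concrete, the sketch below illustrates one way a learnable rotation-and-scaling transformation could be applied to PLM output embeddings. It is a minimal, hypothetical PyTorch implementation, not the released code: the class name `FeatureCalibrator`, the QR-based orthogonalization, and the per-dimension log-scale parametrization are assumptions made for illustration.

```python
# Illustrative sketch (assumed names and parametrization) of calibrating
# PLM-encoded embeddings with a learnable rotation followed by scaling.
import torch
import torch.nn as nn


class FeatureCalibrator(nn.Module):
    """Maps a PLM-encoded embedding into a new space via rotation and scaling."""

    def __init__(self, dim: int):
        super().__init__()
        # Unconstrained matrix whose orthogonalized form acts as the rotation.
        self.rotation_param = nn.Parameter(torch.empty(dim, dim))
        nn.init.orthogonal_(self.rotation_param)
        # Per-dimension positive scaling factors (log-parametrized).
        self.log_scale = nn.Parameter(torch.zeros(dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Project the parameter onto the orthogonal group with a QR decomposition
        # so the transformation preserves angles before rescaling each dimension.
        q, _ = torch.linalg.qr(self.rotation_param)
        rotated = h @ q
        return rotated * self.log_scale.exp()


# Usage: calibrate the [MASK]-position embeddings produced by a masked LM.
calibrator = FeatureCalibrator(dim=768)
mask_embeddings = torch.randn(16, 768)   # e.g., hidden states at the [MASK] token
calibrated = calibrator(mask_embeddings)  # shape: (16, 768)
```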
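The hyperbolic component can likewise be sketched with standard Poincaré-ball formulas (curvature c = 1), which are the usual choice for embedding hierarchies such as coarse-to-fine label structures. The function names below are illustrative, not the paper's API, and the curvature choice is an assumption.

```python
# Standard Poincare-ball operations used for hierarchy-aware (coarse-to-fine)
# metric learning; function names are illustrative, not the paper's API.
import torch


def expmap0(v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Exponential map at the origin: projects Euclidean vectors into the Poincare ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm


def poincare_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Geodesic distance between points inside the unit Poincare ball."""
    diff2 = (x - y).pow(2).sum(dim=-1)
    denom = (1 - x.pow(2).sum(dim=-1)).clamp_min(eps) * (1 - y.pow(2).sum(dim=-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * diff2 / denom)


# Usage: compare a calibrated token embedding with a class-associated label embedding.
token_emb = expmap0(torch.randn(768) * 0.1)
label_emb = expmap0(torch.randn(768) * 0.1)
print(poincare_distance(token_emb, label_emb))
```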