Contrastive pretraining techniques for text classification have largely been studied in an unsupervised setting. However, labeled data from related tasks that share label semantics with the current task is often available. We hypothesize that using this labeled data effectively can lead to better generalization on the current task. In this paper, we propose a novel way to effectively utilize labeled data from related tasks with a graph-based supervised contrastive learning approach. We formulate a token-graph by extrapolating the supervised information from examples to tokens. Our formulation results in an embedding space where tokens with a high/low probability of belonging to the same class are near/far from one another. We also develop detailed theoretical insights that serve as a motivation for our method. In experiments with $13$ datasets, we show that our method outperforms pretraining schemes by $2.5\%$ and an example-level contrastive-learning formulation by $1.8\%$ on average. In addition, we show the cross-domain effectiveness of our method in a zero-shot setting, with an average improvement of $3.91\%$. Lastly, we demonstrate that our method can be used as a noisy teacher in a knowledge distillation setting to significantly improve the performance of transformer-based models in the low labeled-data regime, by $4.57\%$ on average.
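To make the token-level objective concrete, the following is a minimal sketch of a supervised contrastive loss computed over token embeddings rather than example embeddings. This is an illustrative assumption, not the paper's exact formulation: the function name `token_supcon_loss`, the idea of labeling each token by the class of the examples it appears in, and the temperature value are all hypothetical choices for the sketch.

```python
# A minimal sketch (assumed, not the paper's exact method) of a
# token-level supervised contrastive loss: tokens associated with the
# same class are pulled together, tokens from different classes pushed apart.
import torch
import torch.nn.functional as F

def token_supcon_loss(token_emb, token_labels, temperature=0.1):
    """token_emb: (N, d) token embeddings pooled across examples.
    token_labels: (N,) class index each token is most associated with,
    e.g. by majority vote over the labels of the examples containing it
    (a hypothetical way to extrapolate example-level supervision to tokens).
    """
    z = F.normalize(token_emb, dim=1)              # unit-norm embeddings
    sim = z @ z.t() / temperature                  # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
    # positives: token pairs sharing a class label (excluding self)
    pos = (token_labels.unsqueeze(0) == token_labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-probability over positives for each anchor
    pos_count = pos.sum(1).clamp(min=1)
    loss_per_anchor = -(log_prob * pos).sum(1) / pos_count
    # only anchors with at least one positive contribute to the loss
    return loss_per_anchor[pos.sum(1) > 0].mean()
```

Under this sketch, minimizing the loss places same-class tokens near one another and different-class tokens farther apart in the embedding space, matching the geometry the abstract describes.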