While contrastive learning has proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) has only recently adopted it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, we generate augmented, altered views for every representation (anchor). A supervised contrastive loss is then used to maximize the system's ability to pull together similar samples (e.g., anchors and their altered views) and push apart samples belonging to other classes. Despite its simplicity, SupCL-Seq leads to large gains on many sequence classification tasks of the GLUE benchmark compared to a standard BERT-base, including a 6% absolute improvement on CoLA, 5.4% on MRPC, 4.7% on RTE, and 2.6% on STS-B. We also show consistent gains over self-supervised contrastively learned representations, especially on non-semantic tasks. Finally, we show that these gains are not solely due to augmentation, but rather to sequence representations optimized for the downstream task. Code: https://github.com/hooman650/SupCL-Seq
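As a rough illustration of the two ingredients the abstract describes, the sketch below (not the authors' released implementation; see the repository linked above) generates views of each sequence embedding by re-encoding the same batch with dropout active, and scores them with a supervised contrastive loss in the style of Khosla et al. The encoder is assumed to be a HuggingFace-style BERT model exposing `last_hidden_state`; `encode_with_dropout`, `sup_con_loss`, `n_views`, and `temperature` are illustrative names, and for brevity each view reuses the model's default dropout rate rather than varying the dropout probability as the paper does.

```python
import torch
import torch.nn.functional as F


def encode_with_dropout(encoder, input_ids, attention_mask, n_views=2):
    """Encode the same batch several times with dropout active, so each pass
    applies a different dropout mask and yields a different view of each anchor."""
    encoder.train()  # keep dropout layers active during encoding
    views = []
    for _ in range(n_views):
        out = encoder(input_ids=input_ids, attention_mask=attention_mask)
        views.append(out.last_hidden_state[:, 0])  # [CLS] embedding per sequence
    return torch.stack(views, dim=1)  # (batch, n_views, hidden)


def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over (batch, n_views, hidden) features:
    positives are every other view that shares the anchor's label."""
    b, v, _ = features.shape
    z = F.normalize(features.reshape(b * v, -1), dim=-1)   # unit-norm embeddings
    labels = labels.repeat_interleave(v)                    # one label per view
    logits = z @ z.T / temperature                          # scaled cosine similarities
    self_mask = torch.eye(b * v, dtype=torch.bool, device=z.device)
    # denominator sums over all other samples in the batch (never the anchor itself)
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # average log-probability over each anchor's positive pairs, then negate
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```

A typical training step under these assumptions would call `encode_with_dropout` on a labeled batch and backpropagate `sup_con_loss(features, labels)`, before (or jointly with) fine-tuning a classification head on the downstream GLUE task.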