Pre-trained masked language models successfully perform few-shot learning by formulating downstream tasks as text infilling. However, discriminative pre-trained models such as ELECTRA, a strong alternative in full-shot settings, do not fit into this paradigm. In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models on a wide range of tasks. ELECTRA is pre-trained to distinguish whether a token is generated or original. We naturally extend this objective to prompt-based few-shot learning by training the model to score the originality of the target options without introducing new parameters. Our method can easily be adapted to tasks involving multi-token predictions without extra computational overhead. Analysis shows that ELECTRA learns distributions that align better with downstream tasks.
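The following is a minimal sketch (not the authors' released code) of the scoring idea: fill a prompt template with each verbalizer option and let ELECTRA's replaced-token-detection head rate how "original" the option tokens look, averaging over tokens so that multi-token options are handled with a single forward pass per option. The template, verbalizer words, and model checkpoint are illustrative assumptions; it uses the Hugging Face `transformers` library.

```python
# Sketch: zero/few-shot classification by scoring option "originality"
# with ELECTRA's discriminator head. Template and options are illustrative.
import torch
from transformers import ElectraTokenizerFast, ElectraForPreTraining

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-large-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator")
model.eval()

def originality_score(text: str, option: str) -> float:
    """Average log-probability that the option tokens are 'original'
    (not replaced) when inserted into the prompt template."""
    prompt = f"{text} It was {option} ."            # illustrative template
    enc = tokenizer(prompt, return_tensors="pt")
    option_ids = tokenizer(option, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the option's token span inside the encoded prompt.
    start = next(i for i in range(len(ids) - len(option_ids) + 1)
                 if ids[i:i + len(option_ids)] == option_ids)
    with torch.no_grad():
        logits = model(**enc).logits[0]             # per-token RTD logits
    # In the RTD head, sigmoid(logit) is P(replaced); originality = sigmoid(-logit).
    p_original = torch.sigmoid(-logits[start:start + len(option_ids)])
    return p_original.log().mean().item()

# Pick the verbalizer option the discriminator finds most "original".
review = "The movie was a complete waste of two hours."
options = ["great", "terrible"]
print(max(options, key=lambda o: originality_score(review, o)))
```

Few-shot training would then fine-tune the same head on the labeled examples, pushing the correct option toward "original" and the incorrect ones toward "replaced", which is why no new parameters are needed.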