In this paper, we study trade-offs between efficiency, cost, and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new, effective pre-training approaches. Specifically, we design light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.
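As a rough illustration of the idea (not the authors' implementation), a "statistical" generator for ELECTRA-style replaced-token detection can be as simple as sampling replacement tokens from a corpus unigram distribution instead of running a small masked-LM generator. The sketch below assumes integer token ids, a hypothetical `build_unigram_sampler` helper, and a 15% corruption rate.

```python
# Minimal sketch, assuming a unigram-frequency generator in place of ELECTRA's
# learned generator for replaced-token detection (RTD). Not the authors' code.
import random
from collections import Counter


def build_unigram_sampler(token_id_corpus):
    """Estimate a unigram distribution over token ids from a tokenized corpus."""
    counts = Counter(tid for sent in token_id_corpus for tid in sent)
    ids, weights = zip(*counts.items())
    return lambda k: random.choices(ids, weights=weights, k=k)


def corrupt_for_rtd(token_ids, sampler, replace_prob=0.15):
    """Replace a fraction of tokens with statistically sampled ones and return
    (corrupted_ids, labels), where labels mark positions that were replaced."""
    corrupted, labels = list(token_ids), [0] * len(token_ids)
    positions = [i for i in range(len(token_ids)) if random.random() < replace_prob]
    for i, new_id in zip(positions, sampler(len(positions))):
        if new_id != token_ids[i]:
            corrupted[i], labels[i] = new_id, 1
    return corrupted, labels


# Toy usage: a tiny "corpus" of token-id sequences.
corpus = [[5, 7, 7, 9], [5, 9, 11, 7], [7, 5, 9, 9]]
sampler = build_unigram_sampler(corpus)
print(corrupt_for_rtd([5, 7, 9, 11, 7, 5], sampler, replace_prob=0.3))
```

The discriminator is then trained to predict the binary labels, exactly as in ELECTRA, but without the cost of running a generator network at every step.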