Pre-training produces representations that are effective for a wide range of downstream tasks, but it remains unclear which properties of pre-training are necessary for these gains. Notably, recent work shows that even pre-training on synthetic tasks can yield significant gains on downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its benefit. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find that the best synthetic pre-training method, LIME, attains an average of $67\%$ of the benefits of natural pre-training. Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME. Third, we find that $39\%$ of the benefits can be attained by using merely the parameter statistics of synthetic pre-training. We release the source code at https://github.com/felixzli/synthetic_pretraining.