Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. Obtaining large quantities of supervised data for these tasks is difficult given limited resources and time. In light of this, a significant amount of research has focused on adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks, and have been used successfully in a wide variety of applications. Many normalization techniques have been proposed, but their success on low-resource downstream NLP and speech tasks has been limited. One reason is that the rescaling parameters of normalization fail to capture sufficient expressiveness. We propose Kullback-Leibler (KL) Regularized Normalization (KL-Norm), which makes the normalized data well behaved and improves generalization: it reduces overfitting, generalizes well to out-of-domain distributions, and removes irrelevant biases and features, with a negligible increase in model parameters and memory overhead. Detailed experimental evaluation on multiple low-resource NLP and speech tasks demonstrates the superior performance of KL-Norm compared to other popular normalization and regularization techniques.
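To make the idea concrete, the following is a minimal, hypothetical sketch of a KL-regularized normalization layer in PyTorch. It is not the paper's exact formulation: it assumes KL-Norm normalizes activations as in layer normalization, treats the rescaled output as the mean of a Gaussian posterior, and adds the KL divergence from a standard-normal prior as a regularizer. The class name KLNorm, the learned log-variance, and the returned kl_loss term are all illustrative assumptions.

    # Hypothetical sketch of a KL-regularized normalization layer (not the
    # paper's exact formulation). The layer normalizes its input, rescales it
    # with learnable gamma/beta, and penalizes the KL divergence between an
    # assumed Gaussian posterior over the output and a standard-normal prior.
    import torch
    import torch.nn as nn

    class KLNorm(nn.Module):
        def __init__(self, hidden_size: int, eps: float = 1e-5):
            super().__init__()
            self.gamma = nn.Parameter(torch.ones(hidden_size))   # rescaling weight
            self.beta = nn.Parameter(torch.zeros(hidden_size))   # rescaling bias
            # log-variance of the assumed Gaussian posterior over activations
            self.log_var = nn.Parameter(torch.zeros(hidden_size))
            self.eps = eps

        def forward(self, x: torch.Tensor):
            # layer-norm style normalization over the last dimension
            mean = x.mean(dim=-1, keepdim=True)
            var = x.var(dim=-1, unbiased=False, keepdim=True)
            x_hat = (x - mean) / torch.sqrt(var + self.eps)
            mu = self.gamma * x_hat + self.beta  # posterior mean

            # KL( N(mu, sigma^2) || N(0, 1) ), averaged over batch and features
            kl = 0.5 * (self.log_var.exp() + mu.pow(2) - 1.0 - self.log_var)
            kl_loss = kl.mean()

            # sample with the reparameterization trick during training
            if self.training:
                z = mu + self.log_var.mul(0.5).exp() * torch.randn_like(mu)
            else:
                z = mu
            return z, kl_loss

In such a setup, kl_loss would be added to the downstream task loss with a small weight; this is one way a KL penalty of this kind could curb overfitting during low-resource fine-tuning.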