Effectively scaling large Transformer models has been a main driver of recent advances in natural language processing. Dynamic neural networks, an emerging research direction, can scale up neural networks with a sub-linear increase in computation and time by dynamically adjusting their computational path based on the input. Dynamic neural networks could therefore be a promising solution to the growing parameter counts of pretrained language models, enabling both the pretraining of models with trillions of parameters and faster inference on mobile devices. In this survey, we summarize the progress of three types of dynamic neural networks in NLP: skimming, mixture of experts, and early exit. We also highlight the current challenges facing dynamic neural networks and directions for future research.