Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and on training in low rank space with additional objectives, offering various ad hoc explanations for the chosen practices. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, pointing in the process to exciting research questions that remain open.
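To make the notion of "factorised layers" concrete, the following is a minimal sketch (not the paper's exact parameterisation) of a low rank linear layer in PyTorch: the dense weight matrix W is replaced by a product of two narrow matrices U and V of rank r, so both the parameter count and the per-step compute shrink when r is small. The class name, initialisation scheme, and rank argument are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Illustrative linear layer with a rank-r factorisation W ~ U @ V."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # U: (out_features, rank), V: (rank, in_features).
        # Parameters drop from in*out to rank*(in + out) when rank << min(in, out).
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, in_features) / in_features ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multiply by V first so the intermediate activation has width r,
        # which is where the memory and time savings come from.
        return x @ self.V.t() @ self.U.t() + self.bias
```

Such a layer can be dropped in place of `nn.Linear(in_features, out_features)`; both factors are trained end to end rather than obtained by approximating a pre-trained weight matrix.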