This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods. A solution to a numerical issue in CG allows effective parameter updates to be generated with far fewer CG iterations than are usually required (e.g. 5-8 instead of 200). This work also presents a novel preconditioning approach that improves the progress made by individual CG iterations for models with shared parameters. Although applicable to other training losses and model structures, NGHF is investigated in this paper for lattice-based discriminative sequence training of hybrid hidden Markov model acoustic models, using standard recurrent neural network, long short-term memory, and time delay neural network models for output probability calculation. Automatic speech recognition experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types. These experiments show that NGHF achieves larger word error rate reductions than standard stochastic gradient descent or Adam, while requiring orders of magnitude fewer parameter updates.
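To make the role of the linear CG solver concrete, the following is a minimal sketch of a truncated, preconditioned CG loop that computes an update direction from a gradient and a curvature-matrix-vector product (e.g. a damped Gauss-Newton or Fisher-matrix product). It is not the paper's distributed implementation; the function names, the toy curvature matrix, and the fixed iteration budget are illustrative assumptions only, chosen to mirror the small number of CG iterations (e.g. 5-8) mentioned above.

```python
import numpy as np

def truncated_pcg(grad, Avp, precond=None, max_iters=8, tol=1e-10):
    """Approximately solve A d = -grad with preconditioned linear CG.

    A is accessed only through the matrix-vector product Avp(v), which in
    NG/HF-style training would be a Fisher or Gauss-Newton product computed
    over a mini-batch.  The loop is truncated after a few iterations, as in
    the abstract's description of short CG runs (a sketch, not the paper's
    actual algorithm)."""
    d = np.zeros_like(grad)            # current update direction
    r = -grad                          # residual -grad - A d, with d = 0
    z = precond(r) if precond else r   # apply preconditioner if supplied
    p = z.copy()                       # initial search direction
    rz = r @ z
    for _ in range(max_iters):
        Ap = Avp(p)
        alpha = rz / (p @ Ap)          # step length along p
        d += alpha * p
        r -= alpha * Ap
        if r @ r < tol:                # residual small enough: stop early
            break
        z = precond(r) if precond else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # conjugate search-direction update
        rz = rz_new
    return d

# Toy usage with a synthetic positive-definite stand-in for the curvature.
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10.0 * np.eye(10)        # damped, positive-definite matrix
g = rng.standard_normal(10)            # stand-in gradient vector
update = truncated_pcg(g, lambda v: A @ v, max_iters=8)
```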