Stochastic gradient descent samples the training set uniformly at random to build an unbiased estimate of the full gradient from a limited number of samples. However, at a given step of the training process, some examples are more helpful than others for continuing to learn. Importance sampling for training deep neural networks has therefore been widely studied, with the aim of designing sampling schemes that outperform uniform sampling. After recalling the theory of importance sampling for deep learning, this paper reviews the challenges inherent to this research area. In particular, we propose a metric for assessing the quality of a given sampling scheme, and we study the interplay between the sampling scheme and the optimizer used.
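To make the setup concrete, the following is a minimal sketch (not the paper's method) of importance-sampled SGD on a toy least-squares problem. All names and the gradient-norm sampling heuristic are illustrative assumptions; the key point is the reweighting of each sampled gradient by 1/(N p_i), which keeps the estimate unbiased for the full-batch gradient, with uniform sampling recovered at p_i = 1/N.

```python
import numpy as np

# Hypothetical toy problem: least-squares regression on N examples.
rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

w = np.zeros(d)
lr = 0.01

for step in range(2000):
    # One common heuristic (an assumption here, not the paper's scheme):
    # sample each example with probability proportional to its current
    # per-example gradient norm |x_i^T w - y_i| * ||x_i||.
    residuals = X @ w - y
    grad_norms = np.abs(residuals) * np.linalg.norm(X, axis=1)
    total = grad_norms.sum()
    p = grad_norms / total if total > 0 else np.full(N, 1.0 / N)

    i = rng.choice(N, p=p)
    g_i = (X[i] @ w - y[i]) * X[i]   # gradient of example i
    g_hat = g_i / (N * p[i])         # reweighting: E[g_hat] = (1/N) * sum_i g_i
    w -= lr * g_hat

print("distance to w_true:", np.linalg.norm(w - w_true))
```

The reweighting step is what makes the estimator unbiased regardless of the sampling scheme: taking the expectation over i drawn with probabilities p gives sum_i p_i * g_i / (N p_i) = (1/N) sum_i g_i, the full-batch gradient.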