Current adaptive optimizers for deep learning adjust the magnitude of parameter updates by altering the effective learning rate of each parameter. Motivated by the known inverse relation between batch size and learning rate in determining update step magnitudes, we introduce a novel training procedure that dynamically decides the size and composition of each update step. Our procedure, Dynamic Batch Adaptation (DBA), analyzes the gradient of every sample and selects the subset that best improves a chosen metric, such as gradient variance, for each layer of the network. We present results showing that DBA significantly improves the speed of model convergence. We further find that DBA yields a larger improvement over standard optimizers in data-scarce conditions, where, beyond convergence speed, it also significantly improves model generalization: a network with a single fully connected hidden layer trained on only 1% of the MNIST dataset reaches 97.79% test accuracy. In an even more extreme scenario, it reaches 97.44% test accuracy using only 10 samples per class. These results represent relative error rate reductions of 81.78% and 88.07%, respectively, compared to the standard optimizers Stochastic Gradient Descent (SGD) and Adam.
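To make the selection step concrete, the following is a minimal PyTorch sketch of per-sample gradient subset selection, not the authors' implementation: it assumes a simple greedy criterion that keeps the samples whose gradients lie closest to the batch mean gradient as a proxy for reducing gradient variance, and the function name `select_subset_by_variance` and the `keep_ratio` parameter are hypothetical.

```python
# Hypothetical sketch of per-sample gradient subset selection (not the paper's code).
import torch
import torch.nn as nn

def select_subset_by_variance(model, loss_fn, xs, ys, keep_ratio=0.5):
    """Compute each sample's gradient w.r.t. the model parameters, then keep
    the samples whose gradients are closest to the batch mean gradient
    (a simple proxy for lowering gradient variance)."""
    per_sample_grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        flat = torch.cat([p.grad.detach().flatten() for p in model.parameters()])
        per_sample_grads.append(flat)
    grads = torch.stack(per_sample_grads)      # shape: (batch, n_params)
    mean_grad = grads.mean(dim=0)
    dists = (grads - mean_grad).norm(dim=1)    # distance from the mean gradient
    k = max(1, int(keep_ratio * len(xs)))
    keep = torch.argsort(dists)[:k]            # indices of the selected subset
    return keep
```

The selected indices would then form the effective batch for the current update step; a per-layer variant, as described in the abstract, would repeat the selection using only the gradients of that layer's parameters.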