We study a fixed mini-batch gradient descent (FMGD) algorithm for solving optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once formed, the partitions are fixed throughout the rest of the algorithm; for convenience, we refer to them as fixed mini-batches. In each iteration, the gradients are then computed sequentially on the fixed mini-batches. Because the size of each mini-batch is typically much smaller than the whole sample size, its gradient can be computed cheaply, which substantially reduces the cost of each iteration and makes FMGD computationally efficient and practically feasible. To demonstrate the theoretical properties of FMGD, we start with a linear regression model and a constant learning rate, and study the algorithm's numerical convergence and statistical efficiency. We find that a sufficiently small learning rate is required for both numerical convergence and statistical efficiency. Nevertheless, an extremely small learning rate might lead to painfully slow numerical convergence. To address this problem, a diminishing learning rate scheduling strategy can be used, which yields an FMGD estimator with faster numerical convergence and better statistical efficiency. Finally, FMGD algorithms with random shuffling and with a general loss function are also studied.
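As a rough illustration of the procedure described above, the following minimal sketch implements FMGD for a linear regression model with squared-error loss and a constant learning rate. The function and parameter names (e.g., `fmgd_linear_regression`, `n_batches`, `alpha`) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def fmgd_linear_regression(X, y, n_batches=10, alpha=0.01, n_epochs=100):
    """Illustrative fixed mini-batch gradient descent for linear regression.

    The sample is split once into `n_batches` non-overlapping partitions,
    which stay fixed for all subsequent epochs. Within each epoch, the
    gradient of the squared-error loss is computed sequentially on each
    fixed mini-batch, and the coefficients are updated with a constant
    learning rate `alpha`.
    """
    n, p = X.shape

    # Form the fixed (non-overlapping) mini-batches once, then reuse them.
    idx = np.random.permutation(n)
    batches = np.array_split(idx, n_batches)

    beta = np.zeros(p)
    for epoch in range(n_epochs):
        for b in batches:  # sequentially visit each fixed mini-batch
            Xb, yb = X[b], y[b]
            grad = Xb.T @ (Xb @ beta - yb) / len(b)  # mini-batch gradient
            beta -= alpha * grad                     # constant-learning-rate update
    return beta
```

The diminishing-learning-rate variant mentioned above could, for instance, be sketched by replacing the constant `alpha` with an epoch-dependent schedule such as `alpha / (epoch + 1)` inside the epoch loop.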