We study a fixed mini-batch gradient descent (FMGD) algorithm for solving optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once formed, the partitions are fixed throughout the rest of the algorithm; for convenience, we refer to them as fixed mini-batches. In each iteration, the gradients are then computed sequentially on the fixed mini-batches. Because the size of each fixed mini-batch is typically much smaller than the whole sample size, the corresponding gradient can be computed easily. This substantially reduces the computation cost per iteration and makes FMGD computationally efficient and practically feasible. To study the theoretical properties of FMGD, we start with a linear regression model and a constant learning rate, and examine the numerical convergence and statistical efficiency of the resulting estimator. We find that a sufficiently small learning rate is required for both numerical convergence and statistical efficiency. However, an extremely small learning rate can lead to painfully slow numerical convergence. To address this problem, a diminishing learning rate scheduling strategy can be used, which yields an FMGD estimator with faster numerical convergence and better statistical efficiency. Finally, we also study FMGD with random shuffling and with a general loss function.
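To make the procedure concrete, the following is a minimal sketch of FMGD for least-squares linear regression, written under our own assumptions: the function name fmgd_linear_regression and its parameters (M mini-batches, initial learning rate alpha, shrinkage factor gamma) are illustrative and not taken from the paper. Setting gamma = 1 corresponds to a constant learning rate; gamma < 1 gives a simple diminishing schedule.

```python
import numpy as np

def fmgd_linear_regression(X, Y, M=10, alpha=0.1, gamma=1.0, epochs=50):
    """Illustrative FMGD sketch for least-squares linear regression.

    The sample is split once into M non-overlapping mini-batches, which are
    then fixed for all subsequent epochs. In each epoch the gradient is
    computed sequentially on each fixed mini-batch. The learning rate is
    multiplied by gamma after every epoch (gamma = 1: constant learning rate;
    gamma < 1: diminishing learning rate schedule).
    """
    N, p = X.shape
    # Form the fixed, non-overlapping partitions once; they are not reshuffled later.
    idx = np.random.permutation(N)
    batches = np.array_split(idx, M)

    theta = np.zeros(p)
    lr = alpha
    for _ in range(epochs):
        for b in batches:  # sequential pass over the fixed mini-batches
            Xb, Yb = X[b], Y[b]
            grad = Xb.T @ (Xb @ theta - Yb) / len(b)  # mini-batch least-squares gradient
            theta -= lr * grad
        lr *= gamma  # optional diminishing learning rate schedule
    return theta


# Toy usage: recover coefficients of a simulated linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
theta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
Y = X @ theta_true + rng.normal(size=10_000)
theta_hat = fmgd_linear_regression(X, Y, M=20, alpha=0.1, gamma=0.9, epochs=100)
print(theta_hat)
```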