To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
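To make the "noisy surrogate" tactic concrete, here is a minimal sketch (not taken from the paper) of perturbing the final gradient-descent iterate with Gaussian noise; the least-squares objective, the function names, and the noise scale `sigma` are illustrative assumptions.

```python
import numpy as np

def gd_final_iterate(X, y, eta=0.1, steps=100):
    """Plain gradient descent on the empirical squared loss; returns the final iterate w_T."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n   # gradient of (1/2n) * ||Xw - y||^2
        w -= eta * grad
    return w

def noisy_surrogate(X, y, sigma=0.01, eta=0.1, steps=100, rng=None):
    """Gaussian-perturbed surrogate: output w_T + N(0, sigma^2 I) instead of w_T itself."""
    rng = np.random.default_rng() if rng is None else rng
    w_T = gd_final_iterate(X, y, eta=eta, steps=steps)
    return w_T + sigma * rng.standard_normal(w_T.shape)

# Example usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)
w_noisy = noisy_surrogate(X, y, sigma=0.01, rng=rng)
```

The injected noise gives the surrogate's output finite mutual information with the training sample, which is what makes information-theoretic generalization bounds applicable to it in the first place.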