This work shows that a diverse collection of linear optimization methods, when run on general data, fail to overfit despite lacking any explicit constraints or regularization: with high probability, their trajectories stay near the curve of optimal constrained solutions over the population distribution. The analysis is powered by an elementary but flexible proof scheme which can handle many settings, summarized as follows. Firstly, the data can be general: unlike other implicit bias works, it need not satisfy large-margin or other structural conditions, and moreover it can arrive sequentially IID, sequentially along a Markov chain, or as a batch, and lastly it can have heavy tails. Secondly, while the main analysis is for mirror descent, rates are also provided for the Temporal-Difference fixed-point method from reinforcement learning; all prior high-probability analyses in these settings required bounded iterates, bounded updates, bounded noise, or some equivalent. Thirdly, the losses are general, and for instance the logistic and squared losses can be handled simultaneously, unlike in other implicit bias works. In all of these settings, not only is low population error guaranteed with high probability, but low sample complexity is guaranteed as well, so long as there exists any low-complexity near-optimal solution, even if the global problem structure, and in particular the global optima, have high complexity.
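To make the unconstrained mirror descent setting concrete, here is a minimal sketch of stochastic mirror descent on a linear predictor with streaming IID data and the logistic loss. This is illustrative only, not the paper's algorithm or step-size schedule: the mirror map (here the Euclidean potential, which recovers plain SGD), the synthetic data generator, and all names are assumptions. Note that no projection, clipping, or regularizer appears anywhere in the update.

```python
import numpy as np

def stochastic_mirror_descent(sample, loss_grad, grad_psi, grad_psi_inv,
                              w0, eta, steps):
    """Unconstrained stochastic mirror descent on a linear predictor.

    Mirror update: grad_psi(w_{t+1}) = grad_psi(w_t) - eta * g_t, where
    g_t is the stochastic loss gradient at a fresh example (x_t, y_t).
    No projection or explicit regularization is applied.
    """
    w = w0
    iterates = [w0]
    for _ in range(steps):
        x, y = sample()                 # one fresh example (e.g., IID stream)
        g = loss_grad(w, x, y)          # stochastic gradient at current iterate
        theta = grad_psi(w) - eta * g   # step in the dual (mirror) space
        w = grad_psi_inv(theta)         # map back to the primal space
        iterates.append(w)
    return iterates

# Euclidean potential psi(w) = ||w||^2 / 2: both maps are the identity,
# so the scheme reduces to ordinary SGD.
identity = lambda w: w

def logistic_grad(w, x, y):
    # Gradient of log(1 + exp(-y <w, x>)) with respect to w.
    m = y * x.dot(w)
    return -y * x / (1.0 + np.exp(m))

# Hypothetical IID data stream: Gaussian features, labels from a
# logistic model with a planted low-norm parameter w_star.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])

def sample():
    x = rng.standard_normal(3)
    y = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-x.dot(w_star))) else -1.0
    return x, y

iterates = stochastic_mirror_descent(sample, logistic_grad, identity, identity,
                                     np.zeros(3), eta=0.1, steps=2000)
print(iterates[-1])
```

Swapping in a non-Euclidean potential (e.g., a p-norm or entropic mirror map) only changes the `grad_psi` / `grad_psi_inv` pair; the same driver handles a Markov-chain or batch data source by changing `sample`.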
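For the reinforcement learning setting, the following is a minimal sketch of the classical TD(0) update with linear function approximation, run on a single Markov-chain trajectory. Again this is an assumption-laden illustration, not the paper's method: the toy chain, features, rewards, and step size are all invented, and crucially the iterates are never projected or bounded, matching the regime the abstract describes.

```python
import numpy as np

def linear_td0(transitions, dim, eta, gamma):
    """TD(0) fixed-point iteration with linear function approximation.

    Each transition is (phi_s, reward, phi_s_next), with feature vectors
    observed along a single Markov-chain trajectory. No projection or
    iterate bound is enforced.
    """
    w = np.zeros(dim)
    for phi, r, phi_next in transitions:
        td_error = r + gamma * phi_next.dot(w) - phi.dot(w)
        w = w + eta * td_error * phi
    return w

# Hypothetical 3-state chain with one-hot features and fixed rewards.
rng = np.random.default_rng(1)
P = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.9, 0.0, 0.1]])
features = np.eye(3)
rewards = np.array([0.0, 0.0, 1.0])

def trajectory(steps):
    s = 0
    for _ in range(steps):
        s_next = rng.choice(3, p=P[s])
        yield features[s], rewards[s], features[s_next]
        s = s_next

w = linear_td0(trajectory(5000), dim=3, eta=0.05, gamma=0.9)
print(w)  # estimated value function, learned without any iterate bound
```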