Much of modern learning theory has been split between two regimes: the classical \emph{offline} setting, where data arrive independently, and the \emph{online} setting, where data arrive adversarially. While the former model is often both computationally and statistically tractable, the latter requires no distributional assumptions. In an attempt to achieve the best of both worlds, previous work proposed the smoothed online setting, where each sample is drawn from an adversarially chosen distribution that is smooth, i.e., has bounded density with respect to a fixed dominating measure. We provide tight bounds on the minimax regret of learning a nonparametric function class, with nearly optimal dependence on both the horizon and smoothness parameters. Furthermore, we provide the first oracle-efficient, no-regret algorithms in this setting. In particular, we propose an oracle-efficient improper algorithm whose regret achieves optimal dependence on the horizon, as well as a proper algorithm requiring only a single oracle call per round, whose regret has the optimal horizon dependence in the classification setting and is sublinear in general. Both algorithms have exponentially worse dependence on the smoothness parameter of the adversary than the minimax rate. We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smoothed online learning. Finally, we apply our results to the contextual bandit setting to show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits when contexts arrive in a smooth manner.
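For concreteness, the smoothness condition referenced above can be stated formally; the symbol $\sigma$ for the smoothness parameter is our illustrative notation, as the abstract leaves it unnamed. A distribution $p$ on an instance space $\mathcal{X}$ is said to be $\sigma$-smooth with respect to a fixed dominating measure $\mu$ if
\[
\left\lVert \frac{\mathrm{d}p}{\mathrm{d}\mu} \right\rVert_{\infty} \le \frac{1}{\sigma},
\]
so that smaller values of $\sigma$ permit distributions that concentrate more mass on small sets and thus correspond to a more powerful adversary.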