Modern algorithms for binary classification rely on an intermediate regression problem for computational tractability. In this paper, we establish a geometric distinction between classification and regression that allows the risks in these two settings to be related more precisely. In particular, we note that classification risk depends only on the direction of the regressor, and we exploit this scale invariance to improve existing guarantees on how classification risk is bounded by the risk of the intermediate regression problem. Building on these guarantees, our analysis makes it possible to compare algorithms against each other more accurately, and suggests viewing classification as distinct from regression rather than a byproduct of it. While regression aims to converge toward the conditional expectation function in location, we propose that classification should instead aim to recover its direction.
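The scale invariance claimed above can be illustrated with a minimal numerical sketch (hypothetical data and a hypothetical linear regressor, not the paper's construction): the 0-1 risk of the plug-in classifier sign(f(x)) is unchanged when the regressor f is rescaled by any positive constant, since only its direction enters the decision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: labels in {-1, +1}, a linear regressor f(x) = w @ x.
X = rng.normal(size=(200, 3))
w = np.array([1.0, -2.0, 0.5])
y = np.sign(X @ rng.normal(size=3))  # arbitrary labels, for illustration only

def zero_one_risk(w, X, y):
    """Empirical 0-1 risk of the plug-in classifier x -> sign(w @ x)."""
    preds = np.sign(X @ w)
    return np.mean(preds != y)

# Positive rescaling of the regressor leaves classification risk unchanged,
# because sign(c * v) == sign(v) for every c > 0.
for c in (0.01, 1.0, 100.0):
    assert zero_one_risk(c * w, X, y) == zero_one_risk(w, X, y)
```

The regression risk (e.g. squared error against the conditional expectation) does change under such rescaling, which is exactly the gap between the two objectives that the abstract highlights.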