One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well. Although the low-dimensional structure of typical datasets is key to this behavior, most theoretical studies of overparametrization focus on isotropic inputs. In this work, we instead consider an analytically tractable model of structured data, where the input covariance is built from independent blocks, allowing us to tune the saliency of low-dimensional structures and their alignment with the target function. Using methods from statistical physics, we derive a precise asymptotic expression, valid for any convex loss function, for the training and test errors achieved by random feature models trained to classify such data. We study in detail how the data structure affects the double descent curve, and show that in the overparametrized regime its impact is greater for the logistic loss than for the mean-squared loss: the easier the task, the wider the performance gap in favor of the logistic loss. Our insights are confirmed by numerical experiments on MNIST and CIFAR10.
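The following is a minimal illustrative sketch, not the paper's exact setup or derivation: it generates data whose block-diagonal covariance has a salient low-dimensional block aligned with the teacher, maps inputs through a random feature layer, and compares test errors of classifiers trained with logistic versus squared loss. All names, dimensions, and regularization values here are assumptions chosen for illustration.

```python
# Sketch: block-covariance data + random feature classification,
# comparing logistic loss against squared (ridge) loss.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

d_salient, d_bulk = 20, 180          # low-dimensional salient block + bulk block
d = d_salient + d_bulk
var_salient, var_bulk = 4.0, 1.0     # "saliency": variance ratio between the blocks
n_train, n_test, p = 400, 4000, 800  # p > n_train random features: overparametrized regime

# Teacher acts only on the salient block: target aligned with the salient directions.
teacher = rng.standard_normal(d_salient) / np.sqrt(d_salient)

def sample_data(n):
    """Inputs with block-diagonal covariance; labels depend only on the salient block."""
    x_s = np.sqrt(var_salient) * rng.standard_normal((n, d_salient))
    x_b = np.sqrt(var_bulk) * rng.standard_normal((n, d_bulk))
    x = np.hstack([x_s, x_b])
    y = np.where(x_s @ teacher >= 0, 1, -1)
    return x, y

# Frozen random first-layer weights followed by a nonlinearity (random feature map).
F = rng.standard_normal((d, p)) / np.sqrt(d)
def features(x):
    return np.tanh(x @ F)

X_tr, y_tr = sample_data(n_train)
X_te, y_te = sample_data(n_test)
Z_tr, Z_te = features(X_tr), features(X_te)

# Train the second layer under two convex losses: logistic and squared.
logit = LogisticRegression(C=1e2, max_iter=5000).fit(Z_tr, y_tr)
ridge = Ridge(alpha=1e-2).fit(Z_tr, y_tr)

err_logit = np.mean(logit.predict(Z_te) != y_te)
err_ridge = np.mean(np.sign(ridge.predict(Z_te)) != y_te)
print(f"test error  logistic: {err_logit:.3f}   squared loss: {err_ridge:.3f}")
```

Varying var_salient (the saliency of the informative block) or the ratio p / n_train in this sketch mimics, at a purely numerical level, the knobs studied analytically in the paper: task difficulty and the degree of overparametrization.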