Test Set Sizing Via Random Matrix Theory

This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with m data points, each an independent n-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise, and thus fairly reflects the value, or lack thereof, of the model. This paper is the first to solve for the training and test set sizes for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial, derived in Theorem 1, that depends only on m and n; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were realizing that the problems herein had been discussed in the context of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. Mathematical results are supported with thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.
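The setup described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's method: it does not use the quartic polynomial of Theorem 1, which is not reproduced here. It assumes that closeness to the "integrity" condition can be approximated by the mean squared gap between the test-set mean squared error and the true noise variance; that criterion, the function names `integrity_gap` and `best_split`, and the choice of identity covariance are all illustrative assumptions.

```python
import numpy as np

def integrity_gap(m, n, p, sigma=1.0, trials=200, seed=0):
    """Estimate E[(test MSE - sigma^2)^2] when p of m points are used for training.

    Assumed proxy criterion, not the paper's exact integrity metric.
    """
    # Reusing the same seed for every p gives common random numbers across splits.
    rng = np.random.default_rng(seed)
    gaps = np.empty(trials)
    for t in range(trials):
        # n-dimensional Gaussian inputs; identity covariance is an assumption
        # (the abstract states the covariance drops out of the result).
        X = rng.standard_normal((m, n))
        beta = rng.standard_normal(n)                  # hypothetical true parameters
        y = X @ beta + sigma * rng.standard_normal(m)  # noisy linear measurements
        # Ordinary least squares fit on the first p points.
        beta_hat, *_ = np.linalg.lstsq(X[:p], y[:p], rcond=None)
        # Empirical error on the remaining m - p points.
        resid = y[p:] - X[p:] @ beta_hat
        gaps[t] = (np.mean(resid ** 2) - sigma ** 2) ** 2
    return float(gaps.mean())

def best_split(m, n, **kwargs):
    """Return the training size p in [n + 2, m - 1] with the smallest estimated gap."""
    candidates = range(n + 2, m)  # keeps at least one test point
    return min(candidates, key=lambda p: integrity_gap(m, n, p, **kwargs))

if __name__ == "__main__":
    m, n = 60, 3
    print("estimated training size:", best_split(m, n, trials=100))
```

A very small training set inflates the test error through a poorly estimated model, while a very small test set makes the error estimate itself noisy, so the simulated gap is expected to be smallest somewhere in between. The abstract's closed-form result replaces this brute-force search with a root of a polynomial in m and n.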

