Test Set Sizing Via Random Matrix Theory

This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with m data points, each an independent n-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error equals the actual measurement noise and thus fairly reflects the value of the model, or its lack of value. This paper is the first to solve for the training and test set sizes of any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial, derived in Theorem 1, that depends only on m and n; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were recognizing that the problems herein had been discussed in the context of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. The mathematical results are supported by thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.
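The setting described above can be explored numerically. The following is a minimal sketch, not the paper's method: it does not reproduce the quartic polynomial of Theorem 1 (which is not stated in this abstract) or the paper's integrity metric. It only simulates the regression setup and reports the average test-set error for a chosen training size k. The function name `mean_test_mse`, the covariance construction, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def mean_test_mse(m, n, k, sigma, trials=200, seed=0):
    """Average test-set mean squared error over repeated draws.

    m     : total number of data points
    n     : dimension of each Gaussian data point
    k     : number of points used for training (the rest are the test set)
    sigma : standard deviation of the measurement noise
    """
    rng = np.random.default_rng(seed)

    # Hypothetical covariance matrix and true parameters (illustrative only).
    a = rng.standard_normal((n, n))
    cov = a @ a.T + n * np.eye(n)          # symmetric positive definite
    beta = rng.standard_normal(n)

    errors = np.empty(trials)
    for t in range(trials):
        # m independent n-dimensional Gaussian data points with noisy targets.
        x = rng.multivariate_normal(np.zeros(n), cov, size=m)
        y = x @ beta + sigma * rng.standard_normal(m)

        # Ordinary least squares fit on the first k points.
        beta_hat, *_ = np.linalg.lstsq(x[:k], y[:k], rcond=None)

        # Empirical error on the remaining m - k points.
        resid = y[k:] - x[k:] @ beta_hat
        errors[t] = np.mean(resid ** 2)
    return errors.mean()


if __name__ == "__main__":
    sigma = 0.5
    for k in (20, 50, 100, 150):
        mse = mean_test_mse(m=200, n=5, k=k, sigma=sigma)
        print(f"k={k:3d}  mean test MSE={mse:.4f}  noise variance={sigma**2:.4f}")
```

Changing the covariance matrix or the true parameters in this sketch should leave the averages essentially unchanged, in line with the abstract's remark that these quantities drop out. For a zero-mean Gaussian design without an intercept, a standard result (not taken from this source) gives the expected out-of-sample error as sigma^2 * (1 + n / (k - n - 1)) for k > n + 1, which offers a sanity check on the simulated values.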

