使用交叉U-统计量进行无维度推理 (Dimension-agnostic inference using cross U-statistics) - 专知论文

会员服务 ·

0

统计量 · 样本 · 退化 · 高维 · 渐近理论 ·

2023 年 3 月 24 日

Dimension-agnostic inference using cross U-statistics

翻译：使用交叉U-统计量进行无维度推理

Ilmun Kim,Aaditya Ramdas

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

翻译：经典渐近理论用于统计推理通常包括通过固定维度d，而让样本大小n趋近于无穷大来构建统计量。最近，人们大量研究了这些方法在高维环境中的行为，其中d和n共同趋近于无穷大。这通常会导致不同的推理过程，具体取决于维度的假设，从而使从业者陷入困境：在具有100个样本和20个维度的数据集中，他们应该假设n≫d还是d/n≈0.2。本文考虑无维度推理的目标；开发方法，使其有效性不依赖于对d与n的任何假设。我们引入了一种方法，该方法使用现有测试统计量的变分表示，以及样本分裂和自标准化，以产生精确的测试统计量，其高斯极限分布不受d与n的任何影响。生成的统计量可以视为仔细修改退化的U-统计量，删除对角块并保留非对角块。我们为一些经典问题提供了实例，包括单样本均值和协方差测试，并展示我们的测试具有适当的局部替代方案下的最小最优功率。在大多数情况下，我们的交叉U-统计量匹配相应的（退化的）U-统计量的高维功率，最多相差一个√2因子。

0

相关内容

统计量

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

130+阅读 · 2023年1月29日

【干货书】统计基础、推理与推断，361页pdf

【干货书】统计基础、推理与推断，361页pdf

专知会员服务

86+阅读 · 2022年1月25日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

专知会员服务

17+阅读 · 2020年4月2日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

时序数据异常检测工具/数据集大列表

时序数据异常检测工具/数据集大列表

极市平台

65+阅读 · 2019年2月23日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

测量误差数据下部分线性模型有约束统计推断理论

国家自然科学基金

2+阅读 · 2015年12月31日

复杂数据下含指标项半参数模型结构的统计推断及应用

国家自然科学基金

0+阅读 · 2014年12月31日

基于似然函数的统计推断

国家自然科学基金

5+阅读 · 2014年12月31日

一类广义相依序列的概率不等式及统计推断研究

国家自然科学基金

0+阅读 · 2014年12月31日

高维数据下多因变量回归模型的统计推断

国家自然科学基金

5+阅读 · 2013年12月31日

一类连续型随机过程的非参数统计推断研究

国家自然科学基金

0+阅读 · 2013年12月31日

带跳扩散模型的非参数统计推断研究

国家自然科学基金

0+阅读 · 2013年12月31日

非参数变换模型的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

一类半参数时间序列模型的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

概率并发理论

国家自然科学基金

1+阅读 · 2011年12月31日

Mixed Laplace approximation for marginal posterior and Bayesian inference in error-in-operator model

Arxiv

0+阅读 · 2023年5月16日

High-dimensional Inference for Dynamic Treatment Effects

Arxiv

0+阅读 · 2023年5月16日

Refining Amortized Posterior Approximations using Gradient-Based Summary Statistics

Arxiv

0+阅读 · 2023年5月15日

Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

Arxiv

0+阅读 · 2023年5月14日

Smoothed empirical likelihood estimation and automatic variable selection for an expectile high-dimensional model with possibly missing response variable

Smoothed empirical likelihood estimation and automatic variable selection for an expectile high-dimensional model with possibly missing response variable

Arxiv

0+阅读 · 2023年5月12日

Bayesian high-dimensional covariate selection in non-linear mixed-effects models using the SAEM algorithm

Arxiv

0+阅读 · 2023年5月12日

Assessing Omitted Variable Bias when the Controls are Endogenous

Arxiv

0+阅读 · 2023年5月11日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

VIP会员

文章信息

相关主题

相关VIP内容

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

130+阅读 · 2023年1月29日

【干货书】统计基础、推理与推断，361页pdf

【干货书】统计基础、推理与推断，361页pdf

专知会员服务

86+阅读 · 2022年1月25日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

专知会员服务

17+阅读 · 2020年4月2日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

时序数据异常检测工具/数据集大列表

时序数据异常检测工具/数据集大列表

极市平台

65+阅读 · 2019年2月23日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Mixed Laplace approximation for marginal posterior and Bayesian inference in error-in-operator model

Arxiv

0+阅读 · 2023年5月16日

High-dimensional Inference for Dynamic Treatment Effects

Arxiv

0+阅读 · 2023年5月16日

Refining Amortized Posterior Approximations using Gradient-Based Summary Statistics

Arxiv

0+阅读 · 2023年5月15日

Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

Arxiv

0+阅读 · 2023年5月14日

Smoothed empirical likelihood estimation and automatic variable selection for an expectile high-dimensional model with possibly missing response variable

Smoothed empirical likelihood estimation and automatic variable selection for an expectile high-dimensional model with possibly missing response variable

Arxiv

0+阅读 · 2023年5月12日

Bayesian high-dimensional covariate selection in non-linear mixed-effects models using the SAEM algorithm

Arxiv

0+阅读 · 2023年5月12日

Assessing Omitted Variable Bias when the Controls are Endogenous

Arxiv

0+阅读 · 2023年5月11日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

相关基金

测量误差数据下部分线性模型有约束统计推断理论

国家自然科学基金

2+阅读 · 2015年12月31日

复杂数据下含指标项半参数模型结构的统计推断及应用

国家自然科学基金

0+阅读 · 2014年12月31日

基于似然函数的统计推断

国家自然科学基金

5+阅读 · 2014年12月31日

一类广义相依序列的概率不等式及统计推断研究

国家自然科学基金

0+阅读 · 2014年12月31日

高维数据下多因变量回归模型的统计推断

国家自然科学基金

5+阅读 · 2013年12月31日

一类连续型随机过程的非参数统计推断研究

国家自然科学基金

0+阅读 · 2013年12月31日

带跳扩散模型的非参数统计推断研究

国家自然科学基金

0+阅读 · 2013年12月31日

非参数变换模型的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

一类半参数时间序列模型的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

概率并发理论

国家自然科学基金

1+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员