Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values - Zhuanzhi (专知) Papers

Data splitting · Exchangeable · Statistic · Inference · Sample

6 January 2023

Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values

F. Richard Guo, Rajen D. Shah

From arXiv, 56 pages

Many testing problems are readily amenable to randomised tests such as those employing data splitting, which divide the data into disjoint parts for separate purposes. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations, such as those obtained through random data splits. We introduce rank-transformed subsampling as a general method for delivering large-sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial and calibrating cross-fit double machine learning confidence intervals. For the latter, our method improves coverage in finite samples; for the testing problems, our method is able to derandomise the test and improve power. Moreover, in contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level.
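To make the setting concrete, here is a minimal sketch of a data-splitting test repeated over many random splits. It is not the rank-transformed subsampling procedure of the paper, whose details the abstract does not give. It only illustrates the two points the abstract raises: a single split gives a p-value that depends on the random split, and a standard aggregation rule (twice the median of the p-values, which is valid under arbitrary dependence) is the kind of conservative existing scheme the abstract contrasts against. The toy problem, the function names `split_p_value` and `twice_median`, and all parameter values are assumptions made for illustration.

```python
import math
import random
import statistics


def split_p_value(data, rng):
    """One random split: half A picks a direction, half B tests along it."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    part_a = [data[i] for i in idx[:half]]
    part_b = [data[i] for i in idx[half:]]
    dim = len(data[0])
    # Direction estimated on part A only (mean vector, normalised).
    mean_a = [sum(row[j] for row in part_a) / len(part_a) for j in range(dim)]
    norm = math.sqrt(sum(m * m for m in mean_a)) or 1.0
    w = [m / norm for m in mean_a]
    # One-sided z-test on part B, assuming unit-variance Gaussian noise.
    proj = [sum(wj * xj for wj, xj in zip(w, row)) for row in part_b]
    z = math.sqrt(len(proj)) * statistics.fmean(proj)
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)


def twice_median(p_values):
    """Conservative aggregation rule, valid under arbitrary dependence."""
    return min(1.0, 2.0 * statistics.median(p_values))


rng = random.Random(0)
# Hypothetical toy data: 200 points in 5 dimensions with a small mean shift.
data = [[rng.gauss(0.15, 1.0) for _ in range(5)] for _ in range(200)]
# The same dataset analysed with 50 different random splits.
p_values = [split_p_value(data, rng) for _ in range(50)]
p_combined = twice_median(p_values)
print(min(p_values), max(p_values), p_combined)
```

The spread between the smallest and largest single-split p-value shows the first drawback (two analyses of the same data can disagree), while the factor of two in `twice_median` shows the price paid by an aggregation rule that makes no use of the dependence between splits.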
