高维量数据无分配独立测试 (A Distribution-Free Independence Test for High Dimension Data) - 专知论文

会员服务 ·

0

相互独立的 · Extensibility · 复合数据 · 特化 · 图模型 ·

2021 年 10 月 14 日

A Distribution-Free Independence Test for High Dimension Data

翻译：高维量数据无分配独立测试

Zhanrui Cai,Jing Lei,Kathryn Roeder

Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed from the modern machine learning community, making it applicable to high dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a single cell data set to test the independence between two types of single cell sequencing measurements, whose high dimensionality and sparsity make existing methods hard to apply.

翻译：独立测试在现代数据分析中具有根本重要性,在变量选择、图形模型和因果推断中具有广泛的应用。当数据是高维的,潜在依赖信号稀少时,独立测试就变得非常具有挑战性,没有分布或结构假设。在本文件中,我们提出独立测试的一般框架,先安装一个区分联合和产品分布的分类器,然后测试装配的分类器的意义。这个框架允许我们借用现代机器学习界所开发的最先进的分类算法的强度,使之适用于高维、复杂的数据。通过将样本分离和固定的调整结合起来,我们的测试统计有一个独立于基本数据分布的通用、固定的高斯无效分布。广泛的模拟显示了新提议的测试与现有方法相比的优势。我们进一步将新的测试应用于一个单一细胞数据集,测试两种类型的单细胞测序测量方法的独立性,其高度的维度和敏度使得现有方法难以应用。

0

相关内容

相互独立的

相互独立的

分布外泛化(Out-Of-Distribution Generalization) 综述论文，22页pdf240篇文献

专知会员服务

64+阅读 · 2021年9月2日

因果推断，Causal Inference：The Mixtape

因果推断，Causal Inference：The Mixtape

专知会员服务

109+阅读 · 2021年8月27日

应用机器学习书稿，361页pdf

应用机器学习书稿，361页pdf

专知会员服务

59+阅读 · 2020年11月24日

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知会员服务

46+阅读 · 2020年9月28日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

124+阅读 · 2020年5月30日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知

21+阅读 · 2020年5月30日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

学好机器学习，这里有你想要的一切

学好机器学习，这里有你想要的一切

算法与数据结构

5+阅读 · 2018年6月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

4+阅读 · 2017年12月5日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Estimating Divergences in High Dimensions

Arxiv

0+阅读 · 2021年12月8日

OODformer: Out-Of-Distribution Detection Transformer

Arxiv

0+阅读 · 2021年12月6日

Topology and Geometry of the Third-Party Domains Ecosystem

Arxiv

0+阅读 · 2021年12月6日

A high order explicit time finite element method for the acoustic wave equation with discontinuous coefficients

Arxiv

0+阅读 · 2021年12月6日

Quasi-universality of Reeb graph distances

Arxiv

0+阅读 · 2021年12月5日

Bayesian Optimal Two-sample Tests in High-dimension

Arxiv

0+阅读 · 2021年12月5日

Mixed Membership Distribution-Free model

Arxiv

0+阅读 · 2021年12月4日

On Disentangled Representations Learned From Correlated Data

Arxiv

8+阅读 · 2021年7月16日

Simplicial Closure and Higher-order Link Prediction

Arxiv

3+阅读 · 2018年2月20日

Being Robust (in High Dimensions) Can Be Practical

Arxiv

3+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

分布外泛化(Out-Of-Distribution Generalization) 综述论文，22页pdf240篇文献

专知会员服务

64+阅读 · 2021年9月2日

因果推断，Causal Inference：The Mixtape

因果推断，Causal Inference：The Mixtape

专知会员服务

109+阅读 · 2021年8月27日

应用机器学习书稿，361页pdf

应用机器学习书稿，361页pdf

专知会员服务

59+阅读 · 2020年11月24日

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知会员服务

46+阅读 · 2020年9月28日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

124+阅读 · 2020年5月30日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

相关资讯

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知

21+阅读 · 2020年5月30日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

学好机器学习，这里有你想要的一切

学好机器学习，这里有你想要的一切

算法与数据结构

5+阅读 · 2018年6月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

4+阅读 · 2017年12月5日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Estimating Divergences in High Dimensions

Arxiv

0+阅读 · 2021年12月8日

OODformer: Out-Of-Distribution Detection Transformer

Arxiv

0+阅读 · 2021年12月6日

Topology and Geometry of the Third-Party Domains Ecosystem

Arxiv

0+阅读 · 2021年12月6日

A high order explicit time finite element method for the acoustic wave equation with discontinuous coefficients

Arxiv

0+阅读 · 2021年12月6日

Quasi-universality of Reeb graph distances

Arxiv

0+阅读 · 2021年12月5日

Bayesian Optimal Two-sample Tests in High-dimension

Arxiv

0+阅读 · 2021年12月5日

Mixed Membership Distribution-Free model

Arxiv

0+阅读 · 2021年12月4日

On Disentangled Representations Learned From Correlated Data

Arxiv

8+阅读 · 2021年7月16日

Simplicial Closure and Higher-order Link Prediction

Arxiv

3+阅读 · 2018年2月20日

Being Robust (in High Dimensions) Can Be Practical

Arxiv

3+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员