Generalized Data Thinning Using Sufficient Statistics - Zhuanzhi Papers


March 22, 2023

Generalized Data Thinning Using Sufficient Statistics


Ameer Dharamshi, Anna Neufeld, Keshav Motwani, Lucy L. Gao, Daniela Witten, Jacob Bien

Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^K X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
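As a concrete illustration of the additive case that the abstract takes as its starting point, the sketch below uses classical Poisson thinning: if $X \sim \mathrm{Poisson}(\lambda)$ and $X^{(1)} \mid X \sim \mathrm{Binomial}(X, \epsilon)$, then $X^{(1)}$ and $X^{(2)} = X - X^{(1)}$ are independent Poisson variables with means $\epsilon\lambda$ and $(1-\epsilon)\lambda$, and they sum back to $X$. This is a minimal sketch, not the authors' code: the function name `thin_poisson`, the split fraction `eps`, and the simulation settings are assumptions chosen for the example.

```python
import numpy as np


def thin_poisson(x, eps, rng):
    """Split Poisson counts x into two parts that sum back to x.

    Hypothetical helper for illustration: draws X1 | X = x ~ Binomial(x, eps)
    and sets X2 = x - X1.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # elementwise binomial draw, one per count
    x2 = x - x1                # remainder, so x1 + x2 == x exactly
    return x1, x2


# Simulated data (assumed settings, not taken from the paper).
rng = np.random.default_rng(0)
lam, eps = 10.0, 0.3
x = rng.poisson(lam, size=200_000)

x1, x2 = thin_poisson(x, eps, rng)

# Empirical checks: the parts reconstruct x, their means are close to
# eps * lam and (1 - eps) * lam, and their sample correlation is near zero.
corr = np.corrcoef(x1, x2)[0, 1]
print("reconstructs x:", np.array_equal(x1 + x2, x))
print("mean of x1:", x1.mean(), "mean of x2:", x2.mean())
print("sample correlation:", corr)
```

Here the reconstruction function is simply the sum. The generalization described in the abstract allows that function to be any known function of the independent parts, which is what lets sample splitting be viewed under the same principle.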

