Compressing Tabular Data via Latent Variable Estimation - Zhuanzhi (专知) Papers


February 20, 2023

Compressing Tabular Data via Latent Variable Estimation


Andrea Montanari, Eric Weiner

From arXiv; 45 pages, 6 PDF figures

Data used for analytics and machine learning often take the form of tables with categorical entries. We introduce a family of lossless compression algorithms for such data that proceed in four steps: (i) estimate latent variables associated with rows and columns; (ii) partition the table into blocks according to the row and column latents; (iii) apply a sequential (e.g. Lempel-Ziv) coder to each of the blocks; (iv) append a compressed encoding of the latents. We evaluate these algorithms on several benchmark datasets, and study optimal compression in a probabilistic model for tabular data, whereby latent values are independent and table entries are conditionally independent given the latent values. We prove that the model has a well-defined entropy rate and satisfies an asymptotic equipartition property. We also prove that classical compression schemes such as Lempel-Ziv and finite-state encoders do not achieve this rate. On the other hand, the latent estimation strategy outlined above achieves the optimal rate.
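The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`estimate_latents`, `compress`, `decompress`, `compressed_size`) are hypothetical. The latent estimator is a simple k-modes-style clustering by Hamming distance, chosen here only as a stand-in. `zlib` (DEFLATE, an LZ77-based coder) stands in for the "sequential (e.g. Lempel-Ziv) coder". Table entries are assumed to be integer category codes in 0..255, and the number of latent classes is assumed to be given.

```python
import zlib
import numpy as np

def estimate_latents(table, k, n_iter=10, seed=0):
    """Step (i), stand-in estimator (an assumption, not the paper's method):
    cluster the rows of `table` into k groups by Hamming distance (k-modes)."""
    rng = np.random.default_rng(seed)
    centers = table[rng.choice(len(table), size=k, replace=False)].copy()
    labels = np.zeros(len(table), dtype=np.uint8)
    for _ in range(n_iter):
        # Assign each row to the center it disagrees with in the fewest cells.
        dist = (table[:, None, :] != centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1).astype(np.uint8)
        # Move each center to the per-column most frequent symbol of its members.
        for c in range(k):
            members = table[labels == c]
            if len(members):
                centers[c] = [np.bincount(col).argmax() for col in members.T]
    return labels

def compress(table, k_rows=2, k_cols=2):
    table = np.asarray(table, dtype=np.uint8)
    u = estimate_latents(table, k_rows)      # row latents
    v = estimate_latents(table.T, k_cols)    # column latents
    blocks = []
    for a in range(k_rows):                  # step (ii): partition into blocks
        for b in range(k_cols):
            block = table[np.ix_(u == a, v == b)]
            # Step (iii): code each block separately with an LZ-family coder.
            blocks.append(zlib.compress(block.tobytes(), 9))
    # Step (iv): append a compressed encoding of the latents.
    latents = zlib.compress(u.tobytes() + v.tobytes(), 9)
    return table.shape, (k_rows, k_cols), blocks, latents

def decompress(shape, ks, blocks, latents):
    n, _ = shape
    raw = np.frombuffer(zlib.decompress(latents), dtype=np.uint8)
    u, v = raw[:n], raw[n:]
    table = np.empty(shape, dtype=np.uint8)
    for a in range(ks[0]):
        for b in range(ks[1]):
            rows, cols = np.flatnonzero(u == a), np.flatnonzero(v == b)
            if len(rows) and len(cols):      # empty blocks carry no cells
                data = zlib.decompress(blocks[a * ks[1] + b])
                cells = np.frombuffer(data, dtype=np.uint8)
                table[np.ix_(rows, cols)] = cells.reshape(len(rows), len(cols))
    return table

def compressed_size(encoded):
    """Total payload in bytes: all block codes plus the latent code."""
    _, _, blocks, latents = encoded
    return sum(len(b) for b in blocks) + len(latents)
```

Calling `enc = compress(X)` on a 2-D integer array `X` and then `decompress(*enc)` returns `X` exactly, because the latents are stored alongside the blocks and every cell belongs to exactly one block. The value of `compressed_size(enc)` can be compared with `len(zlib.compress(X.tobytes(), 9))` on a given table. Whether the block-wise code is smaller depends on how well the estimated latents capture the table's structure.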

