以二维MDL为基础的直方图进行不受监督的分解 (Unsupervised Discretization by Two-dimensional MDL-based Histogram) - 专知论文

会员服务 ·

0

离散化 · 划分 · MoDELS · state-of-the-art · Principle ·

2022 年 12 月 9 日

Unsupervised Discretization by Two-dimensional MDL-based Histogram

翻译：以二维MDL为基础的直方图进行不受监督的分解

Lincen Yang,Mitra Baratchi,Matthijs van Leeuwen

from arxiv, Accepted version at Machine Learning

Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which result in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalized maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which Partitions each dimension ALternately and then Merges neighboring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to the state-of-the-art multivariate discretization method IPD. Finally, we apply our algorithm to three spatial datasets, and we demonstrate that, compared to kernel density estimation (KDE), our algorithm not only reveals more detailed density changes, but also fits unseen data better, as measured by the log-likelihood.

翻译：不受监督的离散是许多知识发现任务中的关键一步。一维数据最先进的方法用最小描述长度( MDL) 原则推断出本地适应性直方图,但多维情况研究得少得多:目前的方法是一次考虑维度(如果不是独立的话),这导致基于适应大小的矩形单元格的离散。不幸的是, 这种方法无法充分描述不同维度和/ 或离散结果之间的依赖性, 由更多单元格( 或 bins) 构成的离异性。为了解决这个问题, 我们提议了一个直观模型类, 允许使用更灵活的二维数据分布。我们扩展了一维情况, 以基于标准化最大可能性( 一种精细化的 MDL 形式) 获得模型选择问题。由于我们模型等级的灵活性以巨大的搜索空间为代价, 我们引入了一种叫做 PALM 的模型, 将每个维度都按不同维度调整, 然后将邻近区域合并, 使用 MDL 的多维值对数据分布进行更灵活的分区。我们的模型将显示一个更精确的。

0

相关内容

离散化

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Progerin/PrelaminA诱发早老症的蛋白质组学研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于Phase-type分布的多状态系统可靠性模型研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

神经元凋亡时GSK-3/Egr-1上调PUMA的作用及其机制

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

中国碎米蕨类（cheilanthoid ferns）的系统学研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于有限字符特性的物理层多播预编码技术

国家自然科学基金

0+阅读 · 2012年12月31日

N-TiO2基Zn-卟啉染料敏化太阳能电池光电转化效率研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于动力学分析的Internet网络拥塞控制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于多目标进化算法的内建自测试（BIST）优化设计技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

On the lower bound for the length of minimal codes

Arxiv

0+阅读 · 2023年2月10日

Time-varying correlation network analysis of non-stationary multivariate time series with complex trends

Arxiv

0+阅读 · 2023年2月10日

Hessian Based Smoothing Splines for Manifold Learning

Arxiv

0+阅读 · 2023年2月10日

High order geometric methods with splines: fast solution with explicit time-stepping for Maxwell equations

Arxiv

0+阅读 · 2023年2月9日

Assessing the validity of Bayesian inference using loss functions

Arxiv

0+阅读 · 2023年2月9日

Consistent Group selection using Global-local prior in High dimensional setup

Arxiv

0+阅读 · 2023年2月9日

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking

Arxiv

0+阅读 · 2023年2月9日

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

Arxiv

0+阅读 · 2023年2月8日

Robust Digital Watermarking Method Based on Adaptive Feature Area Extraction and Local Histogram Shifting

Arxiv

0+阅读 · 2023年2月8日

Class-Balanced Loss Based on Effective Number of Samples

Arxiv

12+阅读 · 2019年1月16日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【COLT 2025最新教程】语言生成

【阿姆斯特丹博士论文】表示学习中的信息理论

【ICML2025】通过在线世界模型规划的持续强化学习

ACL 2025 | 事件检索增强大语言模型生成

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

On the lower bound for the length of minimal codes

Arxiv

0+阅读 · 2023年2月10日

Time-varying correlation network analysis of non-stationary multivariate time series with complex trends

Arxiv

0+阅读 · 2023年2月10日

Hessian Based Smoothing Splines for Manifold Learning

Arxiv

0+阅读 · 2023年2月10日

High order geometric methods with splines: fast solution with explicit time-stepping for Maxwell equations

Arxiv

0+阅读 · 2023年2月9日

Assessing the validity of Bayesian inference using loss functions

Arxiv

0+阅读 · 2023年2月9日

Consistent Group selection using Global-local prior in High dimensional setup

Arxiv

0+阅读 · 2023年2月9日

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking

Arxiv

0+阅读 · 2023年2月9日

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

Arxiv

0+阅读 · 2023年2月8日

Robust Digital Watermarking Method Based on Adaptive Feature Area Extraction and Local Histogram Shifting

Arxiv

0+阅读 · 2023年2月8日

Class-Balanced Loss Based on Effective Number of Samples

Arxiv

12+阅读 · 2019年1月16日

相关基金

Progerin/PrelaminA诱发早老症的蛋白质组学研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于Phase-type分布的多状态系统可靠性模型研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

神经元凋亡时GSK-3/Egr-1上调PUMA的作用及其机制

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

中国碎米蕨类（cheilanthoid ferns）的系统学研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于有限字符特性的物理层多播预编码技术

国家自然科学基金

0+阅读 · 2012年12月31日

N-TiO2基Zn-卟啉染料敏化太阳能电池光电转化效率研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于动力学分析的Internet网络拥塞控制研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于多目标进化算法的内建自测试（BIST）优化设计技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员