STaSy: 计分表数据综述 (STaSy: Score-based Tabular data Synthesis) - 专知论文

会员服务 ·

0

多样性 · MoDELS · Learning · 去躁分数匹配 · 分数匹配 ·

2022 年 10 月 8 日

STaSy: Score-based Tabular data Synthesis

翻译：STaSy: 计分表数据综述

Jayoung Kim,Chaejeong Lee,Noseong Park

from arxiv, 27 pages

Tabular data synthesis is a long-standing research topic in machine learning. Many different methods have been proposed over the past decades, ranging from statistical methods to deep generative methods. However, it has not always been successful due to the complicated nature of real-world tabular data. In this paper, we present a new model named Score-based Tabular data Synthesis (STaSy) and its training strategy based on the paradigm of score-based generative modeling. Despite the fact that score-based generative models have resolved many issues in generative models, there still exists room for improvement in tabular data synthesis. Our proposed training strategy includes a self-paced learning technique and a fine-tuning strategy, which further increases the sampling quality and diversity by stabilizing the denoising score matching training. Furthermore, we also conduct rigorous experimental studies in terms of the generative task trilemma: sampling quality, diversity, and time. In our experiments with 15 benchmark tabular datasets and 7 baselines, our method outperforms existing methods in terms of task-dependant evaluations and diversity.

翻译：图表数据合成是机器学习的长期研究课题,在过去几十年中提出了许多不同的方法,从统计方法到深层基因化方法,但是,由于真实世界的表层数据性质复杂,它并不总是成功。在本文中,我们提出了一个新的模型,名为基于分数的表层数据合成(STaSy)及其基于以分数为基础的基因化模型模式模式模式的培训战略。尽管基于分数的基因化模型解决了基因化模型中的许多问题,但在表格数据合成方面仍有改进的余地。我们拟议的培训战略包括一种自我节奏学习技术和微调战略,通过稳定分数匹配培训,进一步提高抽样质量和多样性。此外,我们还在基因化任务三角任务:抽样质量、多样性和时间方面进行了严格的实验研究。在我们以15个基准表数据集和7个基线进行的实验中,我们的方法在任务依赖性评估和多样性方面超越了现有的方法。

0

相关内容

多样性

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算模拟的方法研究嵌段共聚物在选择性溶剂中的胶束形成过程

国家自然科学基金

0+阅读 · 2015年12月31日

溶剂热法FeSe基超导材料制备和物性研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于云计算的普适人体传感网关键技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

镉低积累甘蓝型油菜的筛选及表达谱分析

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

基于四氢噻唑-2-硫酮-4-羧酸的新型手性有机小分子和离子液体催化剂的合成及应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

稀土离子强化乙二醇类多元醇—乙二胺水体系捕集燃煤烟气中CO2的研究

国家自然科学基金

0+阅读 · 2011年12月31日

桉叶油的热力学基础数据及精馏提纯过程模拟

国家自然科学基金

0+阅读 · 2009年12月31日

Mather理论与Hamilton系统的不稳定性

国家自然科学基金

0+阅读 · 2008年12月31日

Normalizing Flow with Variational Latent Representation

Arxiv

0+阅读 · 2022年11月21日

CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation

Arxiv

0+阅读 · 2022年11月21日

Weighted Ensemble Self-Supervised Learning

Arxiv

0+阅读 · 2022年11月18日

Validation Diagnostics for SBI algorithms based on Normalizing Flows

Arxiv

0+阅读 · 2022年11月17日

Machine Learning for Software Engineering: A Tertiary Study

Arxiv

0+阅读 · 2022年11月17日

Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models

Arxiv

0+阅读 · 2022年11月17日

Execution-based Evaluation for Data Science Code Generation Models

Arxiv

0+阅读 · 2022年11月17日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Sequential Scenario-Specific Meta Learner for Online Recommendation

Sequential Scenario-Specific Meta Learner for Online Recommendation

Arxiv

16+阅读 · 2019年6月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

VIP会员

文章信息

相关主题

去躁分数匹配

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《多域空战指挥体系：驾驭复杂性的艺术》

构建军事人工智能信任体系始于破除黑盒机制

《生态建模密码破译：建模与编程实践》美陆军最新报告

《战争形态演变：合成兵种防御主导模式探析》48页slides

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Normalizing Flow with Variational Latent Representation

Arxiv

0+阅读 · 2022年11月21日

CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation

Arxiv

0+阅读 · 2022年11月21日

Weighted Ensemble Self-Supervised Learning

Arxiv

0+阅读 · 2022年11月18日

Validation Diagnostics for SBI algorithms based on Normalizing Flows

Arxiv

0+阅读 · 2022年11月17日

Machine Learning for Software Engineering: A Tertiary Study

Arxiv

0+阅读 · 2022年11月17日

Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models

Arxiv

0+阅读 · 2022年11月17日

Execution-based Evaluation for Data Science Code Generation Models

Arxiv

0+阅读 · 2022年11月17日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Sequential Scenario-Specific Meta Learner for Online Recommendation

Sequential Scenario-Specific Meta Learner for Online Recommendation

Arxiv

16+阅读 · 2019年6月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

相关基金

计算模拟的方法研究嵌段共聚物在选择性溶剂中的胶束形成过程

国家自然科学基金

0+阅读 · 2015年12月31日

溶剂热法FeSe基超导材料制备和物性研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于云计算的普适人体传感网关键技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

镉低积累甘蓝型油菜的筛选及表达谱分析

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

基于四氢噻唑-2-硫酮-4-羧酸的新型手性有机小分子和离子液体催化剂的合成及应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

稀土离子强化乙二醇类多元醇—乙二胺水体系捕集燃煤烟气中CO2的研究

国家自然科学基金

0+阅读 · 2011年12月31日

桉叶油的热力学基础数据及精馏提纯过程模拟

国家自然科学基金

0+阅读 · 2009年12月31日

Mather理论与Hamilton系统的不稳定性

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员