26 September 2021

Assessing, visualizing and improving the utility of synthetic data

Gillian M Raab, Beata Nowok, Chris Dibben

From arXiv: main text and references, 13 pages; four appendices on pages 14-19; four figures.

A number of measures have been proposed to assess the utility of synthetic data. These include measures based on distances between the original and synthetic distributions, and others based on combining the original and synthetic data and predicting the origin of each record with a propensity score. The methods are reviewed and compared, and the relations between them are illustrated. These measures are incorporated into utility modules in the synthpop package, which include methods to visualize the results. We illustrate how to compare different syntheses and how to diagnose which aspects of the synthetic data differ from the original. The utility functions were originally designed for synthetic data objects of class synds, created by synthpop, but they can now be used to compare synthetic data created by other methods with the original records. The utility measures can be standardized by their expected null distributions under a correct synthesis model. If they are used to evaluate other types of altered data, not generated from a model, then this standardization can be interpreted as the ratio of the difference from the original to the expected stochastic error.
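
The propensity-score idea in the abstract can be made concrete with a small sketch. It is not the synthpop implementation (synthpop is an R package); it is a pure-Python illustration for a single categorical variable, under two assumptions that are not stated in the abstract. First, the propensity model is saturated, so each record's propensity score is simply the share of synthetic records in its category. Second, the expected value under a correct synthesis is taken to be the large-sample approximation (k - 1)(1 - c)^2 c / N, where k is the number of model parameters, c the proportion of synthetic records and N the combined sample size. The function names `pmse_from_table` and `standardized_pmse` are hypothetical.

```python
from collections import Counter


def pmse_from_table(original, synthetic):
    """Propensity-score mean squared error (pMSE) for one categorical variable.

    Assumes a saturated propensity model: the score of every record in a
    category is the fraction of synthetic records in that category.
    Returns (pmse, c, k), where c is the overall synthetic share and k is
    the number of categories (parameters of the saturated model).
    """
    n_total = len(original) + len(synthetic)
    c = len(synthetic) / n_total
    orig_counts = Counter(original)
    syn_counts = Counter(synthetic)
    cells = set(orig_counts) | set(syn_counts)

    squared_error = 0.0
    for cell in cells:
        n_cell = orig_counts[cell] + syn_counts[cell]
        p_cell = syn_counts[cell] / n_cell  # propensity score in this cell
        squared_error += n_cell * (p_cell - c) ** 2
    return squared_error / n_total, c, len(cells)


def standardized_pmse(original, synthetic):
    """Ratio of the observed pMSE to its assumed null expectation.

    The null expectation (k - 1) * (1 - c)**2 * c / N is an assumption of
    this sketch, not a formula given in the abstract.
    """
    pmse, c, k = pmse_from_table(original, synthetic)
    if k < 2:
        raise ValueError("need at least two categories to standardize")
    n_total = len(original) + len(synthetic)
    expected_null = (k - 1) * (1 - c) ** 2 * c / n_total
    return pmse / expected_null


if __name__ == "__main__":
    original = ["a"] * 50 + ["b"] * 50
    faithful = ["a"] * 50 + ["b"] * 50   # same distribution as the original
    distorted = ["a"] * 80 + ["b"] * 20  # category shares shifted

    print(pmse_from_table(original, faithful)[0])
    print(standardized_pmse(original, distorted))
```

A ratio near 1 would suggest that the synthetic data differ from the original by about as much as stochastic error alone would produce, while a much larger ratio points to a real difference in the distribution of that variable.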
