生成模型的基于匹配的数据估价 (Matching-based Data Valuation for Generative Model) - 专知论文

会员服务 ·

0

生成模型 · 判别模型 · 深度生成模型 · 性能指标 · 后处理 ·

2023 年 4 月 21 日

Matching-based Data Valuation for Generative Model

翻译：生成模型的基于匹配的数据估价

Jiaxi Yang,Wenglong Deng,Benlin Liu,Yangsibo Huang,Xiaoxiao Li

Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative models, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of their knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.

翻译：数据估价对机器学习至关重要，它有助于增强模型的透明度并保护数据的特性。现有的数据估价方法主要集中在判别模型上，忽略了近来备受关注的深度生成模型。与判别模型类似，迫切需要评估深度生成模型中的数据贡献。然而，以前的数据估价方法主要依赖于判别模型的性能指标，并需要模型重训练。因此，它们不能直接和高效地应用于最近的深度生成模型，如生成对抗网络和扩散模型。为了填补这一空白，我们从一个相似-匹配的视角来形式化生成模型中的数据估价问题。具体而言，我们介绍了 GMValuator，这是第一个针对任何生成模型的模型无关方法，旨在为生成任务提供数据估价。我们进行了大量实验来证明所提出方法的有效性。据我们所知，GMValuator 是第一个提供仅依赖生成模型输出即可进行后处理的数据估价策略。

0

相关内容

生成模型

在机器学习中，生成模型可以用来直接对数据建模（例如根据某个变量的概率密度函数进行数据采样），也可以用来建立变量间的条件概率分布。条件概率分布可以由生成模型根据贝叶斯定理形成。

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

「基于联邦学习的推荐系统」最新2022研究综述

「基于联邦学习的推荐系统」最新2022研究综述

专知会员服务

75+阅读 · 2022年5月21日

【ICLR 2022】MIT论文解读：谈到人工智能，我们可以抛弃数据集吗？基于ML创建合成数据，Generative Models As A Data Source For Multiview Representation Learning

【ICLR 2022】MIT论文解读：谈到人工智能，我们可以抛弃数据集吗？基于ML创建合成数据，Generative Models As A Data Source For Multiview Representation Learning

专知会员服务

41+阅读 · 2022年3月15日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【MIT-ICLR2022】在机器学习模型中注入公平性, Injecting fairness into machine-learning models

【MIT-ICLR2022】在机器学习模型中注入公平性, Injecting fairness into machine-learning models

专知会员服务

22+阅读 · 2022年3月7日

[ICLR2022]PU learning（Positive and Unlabeled learning）任务的mixup方法

[ICLR2022]PU learning（Positive and Unlabeled learning）任务的mixup方法

专知会员服务

19+阅读 · 2022年2月2日

【ICLR2021】基于动态正则化的联邦学习

专知会员服务

42+阅读 · 2021年1月18日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

专知

10+阅读 · 2018年4月12日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于结构性保护的城市历史地段社区公共空间模式及适应性研究——以苏州为例

国家自然科学基金

0+阅读 · 2015年12月31日

云存储中隐私保护密文数据检索算法的研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向隐私保护的云数据访问模型与方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于模型的安全关键的信息物理融合系统的设计方法中的软件综合

国家自然科学基金

1+阅读 · 2014年12月31日

高维数据的图模型学习与统计推断

国家自然科学基金

8+阅读 · 2012年12月31日

云存储的数据自主访问控制与数据完整性盲审计方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

情感信息抽取的资源建设及关键技术研究

国家自然科学基金

2+阅读 · 2012年12月31日

基于图数据的在线社交网络隐私无泄露信息发布研究

国家自然科学基金

2+阅读 · 2011年12月31日

骨干网环境下基于访问行为及其无指导学习模型的网络失窃密异常检测

国家自然科学基金

0+阅读 · 2010年12月31日

面向个性化搜索中的隐私保护关键技术研究

国家自然科学基金

2+阅读 · 2009年12月31日

GPT Self-Supervision for a Better Data Annotator

Arxiv

0+阅读 · 2023年6月7日

Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis

Arxiv

0+阅读 · 2023年6月6日

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Arxiv

0+阅读 · 2023年6月5日

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Arxiv

0+阅读 · 2023年6月4日

DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

Arxiv

0+阅读 · 2023年6月3日

Matching-based Data Valuation for Generative Model

Arxiv

0+阅读 · 2023年6月2日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

Self-supervised Learning: Generative or Contrastive

Arxiv

25+阅读 · 2021年3月20日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

VIP会员

文章信息

相关主题

深度生成模型

相关VIP内容

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

「基于联邦学习的推荐系统」最新2022研究综述

「基于联邦学习的推荐系统」最新2022研究综述

专知会员服务

75+阅读 · 2022年5月21日

【ICLR 2022】MIT论文解读：谈到人工智能，我们可以抛弃数据集吗？基于ML创建合成数据，Generative Models As A Data Source For Multiview Representation Learning

【ICLR 2022】MIT论文解读：谈到人工智能，我们可以抛弃数据集吗？基于ML创建合成数据，Generative Models As A Data Source For Multiview Representation Learning

专知会员服务

41+阅读 · 2022年3月15日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【MIT-ICLR2022】在机器学习模型中注入公平性, Injecting fairness into machine-learning models

【MIT-ICLR2022】在机器学习模型中注入公平性, Injecting fairness into machine-learning models

专知会员服务

22+阅读 · 2022年3月7日

[ICLR2022]PU learning（Positive and Unlabeled learning）任务的mixup方法

[ICLR2022]PU learning（Positive and Unlabeled learning）任务的mixup方法

专知会员服务

19+阅读 · 2022年2月2日

【ICLR2021】基于动态正则化的联邦学习

专知会员服务

42+阅读 · 2021年1月18日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

专知

10+阅读 · 2018年4月12日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

GPT Self-Supervision for a Better Data Annotator

Arxiv

0+阅读 · 2023年6月7日

Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis

Arxiv

0+阅读 · 2023年6月6日

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Arxiv

0+阅读 · 2023年6月5日

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Arxiv

0+阅读 · 2023年6月4日

DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

Arxiv

0+阅读 · 2023年6月3日

Matching-based Data Valuation for Generative Model

Arxiv

0+阅读 · 2023年6月2日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

Self-supervised Learning: Generative or Contrastive

Arxiv

25+阅读 · 2021年3月20日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

相关基金

基于结构性保护的城市历史地段社区公共空间模式及适应性研究——以苏州为例

国家自然科学基金

0+阅读 · 2015年12月31日

云存储中隐私保护密文数据检索算法的研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向隐私保护的云数据访问模型与方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于模型的安全关键的信息物理融合系统的设计方法中的软件综合

国家自然科学基金

1+阅读 · 2014年12月31日

高维数据的图模型学习与统计推断

国家自然科学基金

8+阅读 · 2012年12月31日

云存储的数据自主访问控制与数据完整性盲审计方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

情感信息抽取的资源建设及关键技术研究

国家自然科学基金

2+阅读 · 2012年12月31日

基于图数据的在线社交网络隐私无泄露信息发布研究

国家自然科学基金

2+阅读 · 2011年12月31日

骨干网环境下基于访问行为及其无指导学习模型的网络失窃密异常检测

国家自然科学基金

0+阅读 · 2010年12月31日

面向个性化搜索中的隐私保护关键技术研究

国家自然科学基金

2+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员