Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets and document their provenance. They must also respect intellectual property rights, preserve individual privacy, and use data ethically. Over the past few years, ML models have grown significantly in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and significant effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required by these models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement: the elimination not only of improperly used data, but also of its effects on any component of the trained model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
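To make the notion of "removing the effects" of data concrete, the following is a minimal sketch of one class of disgorgement methods: exact unlearning via sharded training, in the spirit of SISA (Bourtoule et al., 2021). Everything here (the shard count, the toy dataset, and helper names such as `disgorge`) is illustrative and not the paper's implementation; the point is that when training is partitioned over disjoint shards, disgorging a datum requires retraining only the single shard model that ever saw it, not the whole ensemble.

```python
# Illustrative sketch of exact unlearning via sharded training
# (SISA-style). All names and the toy dataset are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

N_SHARDS = 4

def train_shard(X, y):
    """Train one constituent model on a single disjoint data shard."""
    return LogisticRegression(max_iter=1000).fit(X, y)

def train_sharded(X, y, n_shards=N_SHARDS):
    """Partition the corpus into disjoint shards, one model per shard."""
    idx = [np.arange(i, len(X), n_shards) for i in range(n_shards)]
    models = [train_shard(X[ix], y[ix]) for ix in idx]
    return models, idx

def predict(models, X):
    """Ensemble prediction by majority vote over the shard models."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def disgorge(models, idx, X, y, bad):
    """Remove datum `bad` and its influence on the model: only the
    shard that contained it is retrained, since no other component
    of the ensemble ever depended on it."""
    for s, ix in enumerate(idx):
        if bad in ix:
            keep = ix[ix != bad]
            idx[s] = keep
            models[s] = train_shard(X[keep], y[keep])
    return models, idx

# Toy usage: train on linearly separable data, then disgorge sample 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
models, idx = train_sharded(X, y)
models, idx = disgorge(models, idx, X, y, bad=7)
print(predict(models, X[:10]))
```

The design trade-off this sketch exposes is typical of the space: sharding buys cheap, provably exact removal at the cost of ensembling over smaller training sets, which may reduce accuracy relative to a single model trained on the full corpus.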