The dual use of machine learning applications, where models can be used for both beneficial and malicious purposes, presents a significant challenge. This has recently become a particular concern in chemistry, where chemical datasets containing sensitive labels (e.g., toxicological information) could be used to develop predictive models that identify novel toxins or chemical warfare agents. To mitigate dual-use risks, we propose a model-agnostic method of selectively noising datasets while preserving the utility of the data for training deep neural networks in a beneficial region. We evaluate the effectiveness of the proposed method across least squares regression, a multilayer perceptron, and a graph neural network. Our findings show that selectively noised datasets can induce controlled model variance and bias in predictions for sensitive labels, suggesting that the safe sharing of datasets containing sensitive information is feasible. We also find that omitting sensitive data often increases model variance sufficiently to mitigate dual use. This work is proposed as a foundation for future research on enabling more secure and collaborative data-sharing practices and safer machine learning applications in chemistry.
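To make the idea concrete, below is a minimal sketch of label-side selective noising in a regression setting. It assumes a single threshold separating the beneficial region from the sensitive region and an additive Gaussian noise model; the function name, threshold, and noise distribution are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
import numpy as np

def selectively_noise_labels(y, sensitive_mask, noise_scale, rng=None):
    """Add Gaussian noise to labels flagged as sensitive, leaving
    labels in the beneficial region untouched.

    y              : 1-D array of regression labels (e.g. toxicity values)
    sensitive_mask : boolean array, True where a label falls in the
                     sensitive (dual-use) region
    noise_scale    : standard deviation of the injected noise; larger
                     values induce more variance/bias in a downstream
                     model's predictions for sensitive labels
    """
    rng = np.random.default_rng() if rng is None else rng
    y_noised = y.astype(float)
    y_noised[sensitive_mask] += rng.normal(0.0, noise_scale,
                                           sensitive_mask.sum())
    return y_noised

# Hypothetical example: labels above a toxicity threshold are sensitive.
y = np.array([0.1, 0.4, 2.5, 3.0, 0.2])
threshold = 1.0  # assumed boundary between beneficial and sensitive regions
y_shared = selectively_noise_labels(y, y > threshold, noise_scale=0.5)
```

Because the noise is applied to the dataset before release, the approach is model-agnostic: the same noised labels can be used to train least squares regression, a multilayer perceptron, or a graph neural network, with `noise_scale` acting as the control knob over how degraded the sensitive-region predictions become.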