NeuraCrypt: Hiding Private Health Data via Random Neural Networks for Public Training

Balancing the needs of data privacy and predictive utility is a central challenge for machine learning in healthcare. In particular, privacy concerns have led to a dearth of public datasets, complicated the construction of multi-hospital cohorts, and limited the use of external machine learning resources. To remedy this, new methods are required that enable data owners, such as hospitals, to share their datasets publicly while preserving both patient privacy and modeling utility. We propose NeuraCrypt, a private encoding scheme based on random deep neural networks. NeuraCrypt encodes raw patient data using a randomly constructed neural network known only to the data owner, and publishes both the encoded data and the associated labels. From a theoretical perspective, we demonstrate that sampling from a sufficiently rich family of encoding functions offers a well-defined and meaningful notion of privacy against a computationally unbounded adversary with full knowledge of the underlying data distribution. We propose to approximate this family of encoding functions with random deep neural networks. Empirically, we demonstrate the robustness of our encoding to a suite of adversarial attacks and show that NeuraCrypt achieves accuracy competitive with non-private baselines on a variety of x-ray tasks. Moreover, we demonstrate that multiple hospitals, using independent private encoders, can collaborate to train improved x-ray models. Finally, we release a challenge dataset to encourage the development of new attacks on NeuraCrypt.
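To make the encode-then-publish workflow concrete, here is a minimal sketch in plain Python. The architecture is an assumption for illustration and is not the authors' implementation: each image is treated as a list of flattened patches, every patch passes through the same randomly initialized, never-trained layers (linear map plus ReLU), a secret positional embedding is added, and the patch order is shuffled per sample. All names (`RandomEncoder`, `publish`, `seed` as the "secret key") are hypothetical. The point it shows is the one stated above: the network is randomly constructed, known only to the data owner, and only encoded data plus labels leave the hospital. Two hospitals using different seeds get independent private encoders.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    # Gaussian weights scaled by 1/sqrt(cols) so activations stay roughly unit-scale.
    scale = 1.0 / cols ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


class RandomEncoder:
    """Hypothetical private encoder; `seed` plays the role of the owner's secret key."""

    def __init__(self, seed: int, patch_dim: int, hidden_dim: int,
                 depth: int, num_patches: int) -> None:
        rng = random.Random(seed)
        dims = [patch_dim] + [hidden_dim] * depth
        # Random, never-trained weights: these stay with the data owner.
        self.layers = [random_matrix(rng, dims[i + 1], dims[i]) for i in range(depth)]
        # Assumed secret positional embedding, one vector per patch position.
        self.pos = [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
                    for _ in range(num_patches)]
        # Separate random stream used to shuffle patch order for each encoded sample.
        self._shuffle_rng = random.Random(rng.getrandbits(64))

    def encode(self, patches: Sequence[Sequence[float]]) -> List[Vector]:
        if len(patches) != len(self.pos):
            raise ValueError("unexpected number of patches")
        out = []
        for i, patch in enumerate(patches):
            h = list(patch)
            for w in self.layers:
                h = relu(matvec(w, h))  # same random layers applied to every patch
            out.append([a + b for a, b in zip(h, self.pos[i])])
        # Randomly permute patch order so the spatial layout is not exposed directly.
        self._shuffle_rng.shuffle(out)
        return out


def publish(encoder: RandomEncoder,
            dataset: Sequence[Tuple[Sequence[Sequence[float]], int]]
            ) -> List[Tuple[List[Vector], int]]:
    # Only encoded features and labels are released; raw data and weights stay private.
    return [(encoder.encode(x), y) for x, y in dataset]


if __name__ == "__main__":
    image = [[0.1 * (i + j) for j in range(4)] for i in range(9)]  # 9 patches, 4 values each
    hospital_a = RandomEncoder(seed=1, patch_dim=4, hidden_dim=8, depth=2, num_patches=9)
    hospital_b = RandomEncoder(seed=2, patch_dim=4, hidden_dim=8, depth=2, num_patches=9)
    public_a = publish(hospital_a, [(image, 1)])
    public_b = publish(hospital_b, [(image, 1)])
    print(len(public_a[0][0]), len(public_a[0][0][0]))
```

Because the per-patch computation is deterministic and only the ordering is randomized, re-encoding the same image with the same key yields the same set of patch embeddings in a different order, while a different key yields unrelated embeddings. A practical encoder would use a tensor library and convolutional patch embeddings; the standard-library version here only keeps the sketch self-contained.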


