Random forests are a popular method for classification and regression due to their versatility. However, this flexibility can come at the cost of user privacy, since training a random forest requires many data queries, often on small, identifiable subsets of the training data. Privatizing these queries typically incurs a high utility cost, in large part because queries on small subsets of the data are easily corrupted by added noise. In this paper, we propose DiPriMe forests, a novel tree-based ensemble method for differentially private regression and classification that accommodates real-valued or categorical covariates. We generate splits using a differentially private version of the median, which encourages balanced leaf nodes. By avoiding low-occupancy leaf nodes, we avoid low signal-to-noise ratios when privatizing the leaf-node sufficient statistics. We show theoretically and empirically that the resulting algorithm achieves high utility while ensuring differential privacy.
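To illustrate the kind of primitive the abstract refers to, here is a minimal sketch of a differentially private median computed with the exponential mechanism, a standard construction in the DP literature. This is an assumption for illustration only, not necessarily the exact mechanism used in DiPriMe forests: we score each data point by how close its rank is to the middle of the sorted sample (a utility with sensitivity 1 under add/remove of one record) and sample a candidate with probability proportional to `exp(epsilon * utility / 2)`.

```python
import numpy as np

def dp_median(x, epsilon, rng=None):
    """Differentially private median via the exponential mechanism.

    Utility of the i-th order statistic is the negative distance of its
    rank from the median rank (n - 1) / 2; this utility has sensitivity 1
    when one record is added or removed.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(n)
    utility = -np.abs(ranks - (n - 1) / 2)
    # Exponential mechanism: P(i) proportional to exp(eps * u(i) / (2 * sensitivity))
    logits = epsilon * utility / 2.0
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    i = rng.choice(n, p=probs)
    return x[i]
```

With a large privacy budget the output concentrates on the true sample median; with a small budget it spreads over nearby order statistics, which is exactly the noise/utility trade-off the abstract discusses for split selection.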