Privately Customizing Prefinetuning to Better Match User Data in Federated Learning - 专知 (Zhuanzhi) Papers



February 17, 2023

Privately Customizing Prefinetuning to Better Match User Data in Federated Learning


Charlie Hou, Hongyuan Zhan, Akshat Shrivastava, Sid Wang, Sasha Livshits, Giulia Fanti, Daniel Lazar

In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset held by the central server, and then FL-finetune the model on a private, federated dataset held by clients. Evaluating the quality of a prefinetuning dataset reliably and privately is therefore important. To this end, we propose FreD (Federated Private Fréchet Distance), a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, it privately computes and compares a Fréchet distance between embeddings generated by a large language model on both the central (public) dataset and the federated private client data. To make this computation privacy-preserving, we use distributed, differentially private mean and covariance estimators. We show empirically that FreD accurately predicts the best prefinetuning dataset at minimal privacy cost. Altogether, using FreD we demonstrate a proof of concept for a new approach to private FL training: (1) customize a prefinetuning dataset to better match user data; (2) prefinetune; (3) perform FL-finetuning.
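The abstract describes FreD only at a high level, so what follows is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It uses the standard Fréchet distance between two Gaussians, d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). On the private side, it estimates the mean and covariance with a plain Gaussian mechanism. Each client clips its embeddings to L2 norm `max_norm` and contributes a count, a sum and a sum of outer products. Noise is then added to the aggregated sums. All function names (`clip_rows`, `client_stats`, `dp_mean_cov`, `frechet_distance`, `gaussian_stats`), the clipping bound and `noise_std` are hypothetical. The noise is not calibrated to any particular (ε, δ) budget, and client counts are assumed to be released without noise. Synthetic NumPy arrays stand in for LLM embeddings.

```python
import numpy as np


def clip_rows(x: np.ndarray, max_norm: float) -> np.ndarray:
    """Rescale each embedding so its L2 norm is at most max_norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))


def client_stats(emb: np.ndarray, max_norm: float):
    """Per-client statistics: count, sum of embeddings, sum of outer products."""
    e = clip_rows(emb, max_norm)
    return len(e), e.sum(axis=0), e.T @ e


def psd_project(a: np.ndarray, power: float = 1.0) -> np.ndarray:
    """Clip negative eigenvalues to zero and raise the spectrum to `power`."""
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    return (v * np.clip(w, 0.0, None) ** power) @ v.T


def dp_mean_cov(clients, max_norm: float, noise_std: float, rng):
    """Noisy mean/covariance from summed client statistics (Gaussian mechanism).

    noise_std is a hypothetical knob; it is NOT calibrated to an (epsilon, delta).
    """
    d = clients[0].shape[1]
    n, s, ss = 0, np.zeros(d), np.zeros((d, d))
    for emb in clients:  # each client contributes only these sums (count assumed public)
        ni, si, ssi = client_stats(emb, max_norm)
        n, s, ss = n + ni, s + si, ss + ssi
    s = s + rng.normal(0.0, noise_std, size=d)
    g = rng.normal(0.0, noise_std, size=(d, d))
    ss = ss + (g + g.T) / np.sqrt(2.0)  # symmetric noise for a symmetric matrix
    mu = s / n
    return mu, psd_project(ss / n - np.outer(mu, mu))


def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    r1 = psd_project(cov1, 0.5)
    # Tr((cov1 cov2)^{1/2}) = sum of sqrt eigenvalues of r1 @ cov2 @ r1 (symmetric PSD).
    cross = np.linalg.eigvalsh(r1 @ cov2 @ r1)
    tr_sqrt = np.sqrt(np.clip(cross, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def gaussian_stats(x: np.ndarray, max_norm: float):
    """Non-private mean/covariance for a server-held (public) candidate dataset."""
    x = clip_rows(x, max_norm)
    return x.mean(axis=0), np.cov(x, rowvar=False, bias=True)


# Usage with synthetic stand-ins for LLM embeddings (dimension 8).
rng = np.random.default_rng(0)
clients = [rng.normal(0.5, 1.0, size=(50, 8)) for _ in range(20)]
candidates = {
    "close": rng.normal(0.5, 1.0, size=(1000, 8)),
    "far": rng.normal(-1.0, 1.0, size=(1000, 8)),
}
mu_p, cov_p = dp_mean_cov(clients, max_norm=10.0, noise_std=1.0, rng=rng)
scores = {name: frechet_distance(*gaussian_stats(x, 10.0), mu_p, cov_p)
          for name, x in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Picking the candidate with the smallest distance mirrors the dataset-selection step the abstract describes. The server-held candidates need no noise because they never leave the server. The trace term is computed through the symmetric matrix Σ₁^{1/2}Σ₂Σ₁^{1/2}, which has the same eigenvalues as Σ₁Σ₂, so `np.linalg.eigvalsh` can be used instead of a general (possibly complex) matrix square root.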


