Federated learning has attracted growing interest in the machine learning community owing to its decentralized, privacy-preserving design. Training is distributed across multiple clients, giving the model access to large amounts of client data while that data never leaves the client. A server then aggregates the updates trained on these clients without accessing their data, which in our setting consists of emojis, widely used across social media and instant messaging platforms to express users' sentiments. This paper proposes federated learning-based multilingual emoji prediction in both clean and attack scenarios. Emoji prediction data were crawled from Twitter and drawn from the SemEval emoji datasets. These data are used to train and evaluate transformer models of different sizes, including a sparsely activated transformer, under either the assumption of clean data on all clients or of data poisoned via a label-flipping attack on some clients. Experimental results on these models show that federated learning, in either the clean or the attacked scenario, performs comparably to centralized training for multilingual emoji prediction on seen and unseen languages, across different data sources and distributions. Our trained transformers outperform other techniques on the SemEval emoji dataset, while additionally offering the privacy and distributed-training benefits of federated learning.
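To make the two mechanisms named above concrete, the following is a minimal sketch (not the paper's actual training code) of server-side weighted aggregation of client parameters, FedAvg-style, together with a label-flipping poisoning step applied on a malicious client. The function names `fed_avg` and `flip_labels` are illustrative assumptions introduced here for exposition.

```python
import numpy as np

def flip_labels(labels, num_classes, rng):
    """Label-flipping attack (illustrative): remap each label to a
    different, randomly chosen class so every example is mislabeled."""
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return (labels + offsets) % num_classes

def fed_avg(client_weights, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors,
    weighted by each client's local dataset size. The server sees
    only parameters, never the clients' raw data."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)              # (num_clients, dim)
    fractions = sizes / sizes.sum()                 # per-client weight
    return (stacked * fractions[:, None]).sum(axis=0)

rng = np.random.default_rng(0)

# A poisoned client flips its local emoji labels before training.
labels = np.array([0, 1, 2, 3])
poisoned = flip_labels(labels, num_classes=4, rng=rng)
assert np.all(poisoned != labels)                   # every label changed

# The server aggregates parameter vectors from three clients.
weights = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [10, 10, 20]
global_w = fed_avg(weights, sizes)
print(global_w)  # [0.75 0.75]
```

In a real deployment each `client_weights[i]` would be the flattened parameters (or parameter deltas) of a locally fine-tuned transformer rather than a toy 2-vector, but the weighted-average aggregation step is the same.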