Large-scale contextual representation models have significantly advanced NLP in recent years, capturing the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality results. Gathering and accessing all this data from multiple sources can be extremely challenging due to privacy and regulatory reasons. Federated Learning can overcome these limitations by training models in a distributed fashion, taking advantage of the hardware of the devices that generate the data. We show the viability of training NLP models, specifically Word2Vec, with the Federated Learning protocol. In particular, we focus on a scenario in which a small number of organizations each hold a relatively large corpus. The results show that neither the quality of the results nor the convergence time of Federated Word2Vec deteriorates compared to centralised Word2Vec.
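To illustrate the setting described above, the following is a minimal sketch of federated averaging applied to a Word2Vec-style embedding matrix, where a handful of organisations train locally on their private corpora and only the resulting weights are shared with a coordinator. The function names (`local_train_step`, `federated_round`) and the placeholder update step are hypothetical and not taken from the paper's implementation.

```python
# Minimal federated-averaging sketch for a Word2Vec-style embedding matrix.
# `local_train_step` is a placeholder for the organisation's local training;
# only model weights, never raw text, leave the organisation.
import numpy as np

def local_train_step(weights, corpus_shard, lr=0.025):
    """Run one local epoch of (placeholder) Word2Vec updates on an
    organisation's private corpus and return the updated weights."""
    updated = weights.copy()
    # ... skip-gram / negative-sampling updates on `corpus_shard` would go here ...
    return updated

def federated_round(global_weights, corpus_shards):
    """One round: each organisation trains locally, the coordinator averages
    the returned weight matrices into the next global model."""
    local_models = [local_train_step(global_weights, shard) for shard in corpus_shards]
    return np.mean(local_models, axis=0)

# Usage: a small number of organisations, each holding a relatively large corpus.
vocab_size, dim = 10_000, 100
global_weights = np.random.normal(0.0, 0.1, size=(vocab_size, dim))
corpus_shards = [None, None, None]  # placeholders for the private corpora
for _ in range(5):
    global_weights = federated_round(global_weights, corpus_shards)
```

Averaging full embedding matrices after each round mirrors the standard FedAvg recipe; the point of the sketch is only that the data never needs to be pooled centrally for the shared model to improve.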