Over the last few years, topic modeling has emerged as a powerful technique for organizing and summarizing large collections of documents and for searching them for particular patterns. However, privacy concerns arise when data from different sources must be cross-analyzed. Federated topic modeling addresses this issue by allowing multiple parties to jointly train a topic model without sharing their data. While several federated approximations of classical topic models exist, no research has yet been carried out on federated neural topic models. To fill this gap, we propose and analyze a federated implementation based on state-of-the-art neural topic models, and show its benefits when topics are diverse across the nodes' documents and a joint model must be built. By construction, our approach is equivalent to a centralized approach, both theoretically and in practice, while preserving the privacy of the nodes.
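The equivalence claim can be illustrated with a toy case. The sketch below assumes, purely for illustration, that each node computes gradients of a shared model on its own data and a server sums them before every update. A linear model with squared loss stands in for the neural topic model, and all names (`local_gradient`, `federated_step`, `centralized_step`, `run_demo`) are hypothetical rather than taken from the paper. Because the loss is a sum over training examples, the summed per-node gradients equal the gradient over the pooled data, so the federated and centralized runs follow the same trajectory.

```python
# Hypothetical illustration: a linear model with squared loss stands in for a
# neural topic model; only gradients (never raw data) leave each node.
from typing import List, Tuple

Dataset = List[Tuple[List[float], float]]  # (features, target) pairs


def local_gradient(w: List[float], data: Dataset) -> List[float]:
    """Gradient of 0.5 * sum((w . x - y)^2) over one node's local data."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def federated_step(w: List[float], nodes: List[Dataset], lr: float) -> List[float]:
    """Server sums the gradients reported by each node and applies one update."""
    total = [0.0] * len(w)
    for data in nodes:
        g = local_gradient(w, data)  # computed on the node; data is never shared
        total = [t + gi for t, gi in zip(total, g)]
    return [wi - lr * gi for wi, gi in zip(w, total)]


def centralized_step(w: List[float], nodes: List[Dataset], lr: float) -> List[float]:
    """Reference: pool all data in one place and take the same gradient step."""
    pooled = [pair for data in nodes for pair in data]
    g = local_gradient(w, pooled)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def run_demo(steps: int = 20, lr: float = 0.05):
    """Train both variants from the same start and return their final weights."""
    nodes = [
        [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0)],  # node A's private data
        [([2.0, 0.0], 3.0)],                      # node B's private data
    ]
    w_fed = [0.0, 0.0]
    w_cen = [0.0, 0.0]
    for _ in range(steps):
        w_fed = federated_step(w_fed, nodes, lr)
        w_cen = centralized_step(w_cen, nodes, lr)
    return w_fed, w_cen


if __name__ == "__main__":
    w_fed, w_cen = run_demo()
    print("federated:  ", w_fed)
    print("centralized:", w_cen)
```

The toy model only makes the additivity argument concrete; a real federated neural topic model would also need to agree on a shared vocabulary and handle communication between nodes and server, which this sketch does not attempt.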