Fast Privacy-Preserving Text Classification based on Secure Multiparty Computation

We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, one party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice learns only the result of the classifier applied to her text input, and Bob learns nothing. Our solution is based on Secure Multiparty Computation (SMC). Our Rust implementation provides a fast and secure solution for the classification of unstructured text. The solution is generic and can be used in any scenario in which a Naive Bayes classifier can be employed. Applying it to spam detection, we can classify an SMS as spam or ham in less than 340 ms when the dictionary of Bob's model includes all words (n = 5200) and Alice's SMS has at most m = 160 unigrams. In the case with n = 369 and m = 8 (the average for a spam SMS in the database), our solution takes only 21 ms.
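
The abstract does not say which SMC primitives the protocol uses. As a hypothetical illustration of how two parties can compute on values that neither sees in the clear, the following minimal sketch shows two-party additive secret sharing, a common SMC building block. The modulus `Q` and the helper names `share`, `reconstruct`, and `add_shares` are assumptions made for this example and are not taken from the paper.

```python
import secrets

# Illustrative modulus for share arithmetic (an assumption, not from the paper).
Q = 2**61 - 1


def share(value: int) -> tuple[int, int]:
    """Split `value` into two additive shares modulo Q.

    Each share on its own is uniformly random and reveals nothing about `value`.
    """
    r = secrets.randbelow(Q)
    return r, (value - r) % Q


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two shares into the original value."""
    return (s0 + s1) % Q


def add_shares(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two shared values: each party adds the shares it holds locally."""
    return (a[0] + b[0]) % Q, (a[1] + b[1]) % Q


# Two secret inputs are shared, added share-wise, and only the sum is opened.
x = share(25)
y = share(17)
total = reconstruct(*add_shares(x, y))
```

Addition needs no communication between the parties in this scheme, which is convenient for sums of scores; multiplication and comparison require additional protocols that this sketch does not cover.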

Related content

Naive Bayes classifier

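In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive Bayes has been studied extensively since the 1950s. It was introduced into the text retrieval community under a different name in the early 1960s, and it remains a popular (baseline) method for text classification, the problem of deciding which category a document belongs to (such as spam or legitimate, sports or politics) using word frequencies as features. With appropriate preprocessing, it is competitive in this domain with more advanced methods, including support vector machines. It also finds application in automatic medical diagnosis.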
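
To make the description above concrete, here is a minimal plaintext sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, scored in log space. It involves no privacy protection, and the toy messages, the smoothing choice, and the function names `train` and `classify` are assumptions made for illustration rather than details of the paper's model.

```python
import math
from collections import Counter


def train(docs):
    """Fit log-priors and per-class word log-likelihoods.

    `docs` is a list of (tokens, label) pairs.
    """
    labels = Counter(label for _, label in docs)
    counts = {c: Counter() for c in labels}
    for tokens, label in docs:
        counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}

    log_prior = {c: math.log(n / len(docs)) for c, n in labels.items()}
    log_like = {}
    for c in labels:
        # Add-one smoothing: every vocabulary word gets one extra count.
        total = sum(counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like


def classify(tokens, log_prior, log_like):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c in log_prior:
        # Independence assumption: per-word log-likelihoods simply add up.
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in tokens if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy training set (not from the paper's SMS database).
toy = [
    ("win free prize now".split(), "spam"),
    ("free entry win cash".split(), "spam"),
    ("see you at lunch".split(), "ham"),
    ("are you free for lunch".split(), "ham"),
]
log_prior, log_like = train(toy)
label = classify("win cash prize".split(), log_prior, log_like)
```

On this toy data, `label` is `"spam"`. Because the score is a sum of per-word log-probabilities plus a prior, classification reduces to table lookups, additions, and one final comparison.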
