We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, one party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice learns only the result of applying the classifier to her text input, and Bob learns nothing. Our solution is based on Secure Multiparty Computation (SMC), and our Rust implementation provides fast and secure classification of unstructured text. The solution is generic and can be used in any scenario in which a Naive Bayes classifier can be employed. Applied to spam detection, it classifies an SMS as spam or ham in less than 340 ms when the dictionary of Bob's model includes all words (n = 5200) and Alice's SMS has at most m = 160 unigrams. With n = 369 and m = 8 (the average for a spam SMS in the database), our solution takes only 21 ms.
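To make the computed function concrete, the following is a minimal plaintext sketch, under stated assumptions, of what a Naive Bayes spam classifier evaluates. It is not the SMC protocol itself: in the private setting the same arithmetic would be carried out without either party seeing the other's input. Alice's message is reduced to a set of unigrams, Bob's model holds a dictionary of n words with per-class log-probabilities, and each class score is the log prior plus the log-likelihoods of the dictionary words present in the message. The names (`NaiveBayesModel`, `unigrams`, `SCALE`), the fixed-point scaling, the presence-only (binary) features, and the toy probabilities are all hypothetical and are not taken from the authors' implementation.

```rust
use std::collections::HashSet;

/// Hypothetical fixed-point scale factor: log-probabilities are multiplied by
/// this value and rounded, because SMC protocols typically compute on integers
/// in a ring rather than on floating-point numbers.
const SCALE: f64 = 1_000.0;

/// Bob's private model: a dictionary of n words plus, for each class
/// (index 0 = ham, 1 = spam), a log prior and per-word log-likelihoods.
struct NaiveBayesModel {
    dictionary: Vec<String>,
    log_prior: [i64; 2],
    log_likelihood: Vec<[i64; 2]>,
}

/// Converts a floating-point log-probability to a scaled integer.
fn to_fixed(x: f64) -> i64 {
    (x * SCALE).round() as i64
}

/// Alice's private input: the set of (at most m) unigrams in her message.
fn unigrams(text: &str) -> HashSet<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

impl NaiveBayesModel {
    /// Builds a fixed-point model from floating-point log-probabilities.
    fn from_log_probs(dictionary: &[&str], prior: [f64; 2], ll: &[[f64; 2]]) -> Self {
        assert_eq!(dictionary.len(), ll.len());
        NaiveBayesModel {
            dictionary: dictionary.iter().map(|w| w.to_string()).collect(),
            log_prior: [to_fixed(prior[0]), to_fixed(prior[1])],
            log_likelihood: ll.iter().map(|p| [to_fixed(p[0]), to_fixed(p[1])]).collect(),
        }
    }

    /// Per-class score: log prior plus the log-likelihood of every dictionary
    /// word that occurs in the message. x_i in {0, 1} marks whether dictionary
    /// word i appears; in a secure protocol, x and the model would stay hidden.
    fn scores(&self, message: &HashSet<String>) -> [i64; 2] {
        let mut s = self.log_prior;
        for (word, ll) in self.dictionary.iter().zip(&self.log_likelihood) {
            let x_i = message.contains(word) as i64;
            s[0] += x_i * ll[0];
            s[1] += x_i * ll[1];
        }
        s
    }

    /// Final comparison: the only value Alice should learn is this label.
    fn classify(&self, message: &HashSet<String>) -> &'static str {
        let s = self.scores(message);
        if s[1] > s[0] { "spam" } else { "ham" }
    }
}

fn main() {
    // Toy values, invented for illustration only.
    let model = NaiveBayesModel::from_log_probs(
        &["free", "win", "hello"],
        [-0.2, -1.6],
        &[[-3.0, -1.0], [-3.5, -1.2], [-1.5, -2.5]],
    );
    println!("{}", model.classify(&unigrams("WIN free prize")));
    println!("{}", model.classify(&unigrams("hello there")));
}
```

Running `main` prints `spam` and then `ham` for the two toy messages. Fixed-point integers are used here because secret-sharing-based protocols generally operate over integer rings. In a secure version, the products x_i · log P(w_i | c) and the final comparison would each need an interactive SMC sub-protocol, since the indicator vector x belongs to Alice while the log-probabilities belong to Bob.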
翻译:我们提议了一个保护隐私的Naive Bayes分类器, 并将其应用于私人文本分类问题。 在此设置中, 政党( Alice) 持有文本信息, 而另一政党( Bob) 持有分类器。 在协议结束时, 爱丽丝将只学习分类器应用到其文本输入和 Bob 学习不到任何东西的结果。 我们的解决方案基于安全多党计算( SMC ) 。 我们的执行为非结构文本的分类提供了快速和安全的解决方案。 应用我们的方法处理垃圾邮件检测( 解决方案是通用的, 可以在使用 Naive Bayes 分类器的任何其他情况下使用 ), 我们可以将 SMS 分类为垃圾邮件或火腿, 在Bob 模型的字典大小包含所有单词( n= 5200) 和 Alice 的 SMS SMS 最多为 m = 160 ungram 。 在 n = 369 和 m = 8 ( 数据库中垃圾邮件SMS 的平均值) 的情况下, 我们的解决方案仅需要21米 。