Social media and other platforms rely on automated detection of abusive content to help combat disinformation, harassment, and abuse. One common approach is to check user content for similarity against a server-side database of problematic items. However, this method fundamentally endangers user privacy. Instead, we target client-side detection, notifying only the user when such a match occurs, in order to warn them about abusive content. Our solution is based on privacy-preserving similarity testing. Existing approaches rely on expensive cryptographic protocols that do not scale well to large databases and may sacrifice matching correctness. To contend with this challenge, we propose and formalize the concept of similarity-based bucketization~(SBB). With SBB, a client reveals a small amount of information to a database-holding server so that the server can generate a bucket of potentially similar items. The bucket is small enough for efficient application of privacy-preserving similarity protocols. To analyze the privacy risk of the revealed information, we introduce a framework for measuring an adversary's confidence in correctly inferring a predicate about the client's input. We develop a practical SBB protocol for image content, and evaluate its client privacy guarantee with real-world social media data. We then combine SBB with various similarity protocols, showing that SBB provides a speedup of at least 29x on large-scale databases compared to running the protocols without it, while retaining correctness of over 95%.
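To make the SBB workflow concrete, the following is a minimal Python sketch of the bucketization step only, under illustrative assumptions that are not the paper's actual construction: items are represented by 256-bit perceptual hashes, the client reveals only a fixed-length prefix of its hash, and the server indexes its database by that same prefix. In the paper, the revealed information is designed so that an adversary's confidence in inferring predicates about the client's input stays bounded; the fixed-prefix rule here is purely a placeholder showing where the candidate bucket comes from before a privacy-preserving similarity protocol is run over it.

```python
# Illustrative sketch of similarity-based bucketization (SBB).
# Assumptions (hypothetical, not the paper's construction): 256-bit perceptual
# hashes, and the client reveals the top COARSE_BITS bits as its coarse key.
from collections import defaultdict
from typing import Dict, List

HASH_BITS = 256      # length of the perceptual hash (illustrative)
COARSE_BITS = 16     # bits the client reveals to the server (illustrative)

def coarse_key(phash: int) -> int:
    """Keep only the high-order COARSE_BITS of a perceptual hash."""
    return phash >> (HASH_BITS - COARSE_BITS)

class Server:
    """Holds the database of known-abusive-content hashes, indexed by coarse key."""
    def __init__(self, db_hashes: List[int]):
        self.buckets: Dict[int, List[int]] = defaultdict(list)
        for h in db_hashes:
            self.buckets[coarse_key(h)].append(h)

    def bucket_for(self, revealed_key: int) -> List[int]:
        # The server sees only the coarse key, never the full client hash.
        return self.buckets.get(revealed_key, [])

def client_query(client_phash: int, server: Server) -> List[int]:
    """Client reveals a coarse key; server returns the candidate bucket.
    A privacy-preserving similarity protocol (e.g., a secure Hamming-distance
    test) would then be run only over this much smaller bucket."""
    return server.bucket_for(coarse_key(client_phash))

if __name__ == "__main__":
    prefix = 0xAB12 << (HASH_BITS - COARSE_BITS)
    server = Server([prefix | 0x1234, prefix | 0x5678])
    query = prefix | 0x1230
    print(len(client_query(query, server)))  # 2 candidates to test privately
```

The point of the sketch is the asymptotics, not the specific rule: the expensive cryptographic similarity protocol runs over a small bucket rather than the whole database, which is where the reported speedup comes from.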