In this paper we present LiM ("Less is More"), a malware classification framework that leverages Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users' devices, so the service provider cannot infer which apps a user has installed. At the same time, input from all users is taken into account in the federated learning process, and all users benefit from better classification performance. A key challenge in this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious). To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server achieves an F1 score of 95%, while clients achieve perfect recall with only one false positive in more than 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users, and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid's dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.
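For readers less familiar with the reported metrics, the following minimal sketch shows how precision, recall, and F1 follow from confusion-matrix counts. The function name `precision_recall_f1` and the client counts are hypothetical, chosen only to illustrate "perfect recall with a single false positive"; they are not taken from the paper's experiments.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp: malicious apps correctly flagged
    fp: clean apps wrongly flagged as malicious
    fn: malicious apps that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical client (illustrative counts, NOT from the paper):
# every one of 10 malicious apps is detected (fn = 0) and one clean
# app is misflagged (fp = 1).
precision, recall, f1 = precision_recall_f1(tp=10, fp=1, fn=0)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Perfect recall means no malicious app is missed (`fn = 0`), so any drop in F1 for such a client comes entirely from false positives lowering precision.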