Less is More: A privacy-respecting Android malware classifier using Federated Learning

In this paper we present LiM ("Less is More"), a malware classification framework that leverages Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users' devices, so the service provider cannot infer which apps users have installed. At the same time, input from all users is taken into account in the federated learning process, and all users benefit from better classification performance. A key challenge of this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious). To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server achieves an F1 score of 95%, while clients achieve perfect recall with only 1 false positive in more than 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users, and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid's dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.


Related content

Federated Learning


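Federated Learning is an emerging foundational AI technique, first proposed by Google in 2016, originally to address the problem of updating models locally on the devices of Android phone users. Its design goal is to carry out efficient machine learning among multiple participants or compute nodes while ensuring information security during large-scale data exchange, protecting end-device and personal data privacy, and remaining legally compliant. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and collaborative networks.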
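To make the idea of multiple participants training one model without sharing raw data more concrete, here is a minimal sketch of one server-side aggregation step in the style of federated averaging. It is an illustration only and not the aggregation scheme used by LiM, which the abstract describes as a safe semi-supervised ensemble. The function name `federated_average`, the toy parameter vectors, and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Each client's vector is weighted by the number of local samples it
    trained on. Only parameters reach the server, never the raw data.
    (Hypothetical helper for illustration; not taken from the LiM paper.)
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must report vectors of equal length")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Toy round: two clients with made-up parameters and local dataset sizes.
clients = [[1.0, 0.0], [3.0, 2.0]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
print(global_weights)
```

In a full system the server would send `global_weights` back to the clients, each client would continue training locally, and the cycle would repeat for a number of rounds of federation.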
