Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered the core intellectual property of model owners, are now stored on billions of untrusted devices and subject to potential theft. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impact can stolen models have? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, so those models can be trivially stolen from the app packages. Even for apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked and that attackers are highly motivated to steal such models. Drawing on our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.
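To make the "no protection at all" case concrete, here is a minimal Python sketch of how an app developer might audit their own package for model files shipped under recognizable names. It is not the paper's analysis pipeline. The suffix list `MODEL_SUFFIXES` and the function name `list_candidate_model_files` are assumptions chosen for illustration; the only fact relied on is that an APK is a ZIP archive.

```python
import zipfile
from pathlib import PurePosixPath

# Hypothetical suffix list for illustration only; the study's actual
# identification criteria are not given in the abstract.
MODEL_SUFFIXES = {".tflite", ".pb", ".caffemodel", ".onnx", ".mnn"}


def list_candidate_model_files(apk_path):
    """Return archive entries whose suffix suggests an ML model file.

    An APK is a ZIP archive, so the standard zipfile module can list
    its entries without unpacking anything to disk.
    """
    with zipfile.ZipFile(apk_path) as apk:
        return sorted(
            name
            for name in apk.namelist()
            if PurePosixPath(name).suffix.lower() in MODEL_SUFFIXES
        )
```

A matching suffix only marks a candidate: the file may still be encrypted or obfuscated, so each hit would need a further check (for example, whether it parses as a valid model) before it counts as unprotected.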

