On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered core intellectual property of their owners, are now stored on billions of untrusted devices and are subject to potential theft. Leaked models can cause both severe financial losses and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impacts can (stolen) models incur? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. Alarmingly, we found that 41% of ML apps do not protect their models at all; these models can be trivially stolen from the app packages. Even among the apps that do use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis. The extracted models are mostly commercial products used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked, and attackers are highly motivated to steal them. Drawing on our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.