Model extraction is one of the most prominent adversarial techniques targeting machine learning models, alongside membership inference and model inversion attacks. Explainable Artificial Intelligence (XAI), on the other hand, is a set of techniques and procedures for explaining the decision-making process behind AI. XAI is a valuable tool for understanding the reasoning of AI models, but the information it reveals creates security and privacy vulnerabilities. In this poster, we propose AUTOLYCUS, a model extraction attack that exploits the explanations provided by LIME to infer the decision boundaries of decision tree models and to build extracted surrogate models that behave similarly to the target model.
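A minimal sketch of the general idea follows, assuming black-box query access to the target model and the standard `lime` tabular explainer. The perturbation heuristic, query budget, and dataset below are illustrative assumptions, not AUTOLYCUS's actual sample-generation strategy.

```python
# Sketch: use LIME explanations to guide queries near the target's decision
# boundary, then fit a surrogate decision tree on the harvested labels.
# Hypothetical heuristic for illustration only, not the poster's algorithm.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from lime.lime_tabular import LimeTabularExplainer

X, y = load_iris(return_X_y=True)
target_model = DecisionTreeClassifier().fit(X, y)  # stands in for the victim

explainer = LimeTabularExplainer(X, mode="classification")

queries, labels = [], []
rng = np.random.default_rng(0)
for x in X[rng.choice(len(X), 30, replace=False)]:
    exp = explainer.explain_instance(x, target_model.predict_proba,
                                     num_features=4)
    # Perturb the features LIME flags as most influential, against the
    # direction of their weights, to probe the target's decision boundary.
    for feat_idx, weight in exp.as_map()[exp.available_labels()[0]]:
        x_new = x.copy()
        x_new[feat_idx] -= np.sign(weight) * 0.5  # assumed step size
        queries.append(x_new)
        labels.append(target_model.predict(x_new.reshape(1, -1))[0])

# Train the extracted surrogate on the query/label pairs.
surrogate = DecisionTreeClassifier().fit(np.array(queries), np.array(labels))
```

The key design point this sketch illustrates is that LIME's per-instance feature weights tell the attacker which directions in input space are most likely to cross the target's decision boundary, so far fewer queries are needed than with uninformed random sampling.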