Robust Android Malware Detection System against Adversarial Attacks using Q-Learning

State-of-the-art Android malware detection systems are built on machine learning and deep learning models. Despite their strong performance, these models are susceptible to adversarial attacks. In this paper, we develop eight Android malware detection models based on machine learning and deep neural networks and investigate their robustness against adversarial attacks. To this end, we use reinforcement learning to create new malware variants that existing Android malware detection models misclassify as benign. We propose two novel reinforcement-learning attack strategies: a single policy attack for the white-box scenario and a multiple policy attack for the grey-box scenario. Taking the adversary's perspective, we design attacks on the detection models that aim to maximize the fooling rate while making minimal modifications to the Android application and leaving the app's functionality and behavior unchanged. With at most five modifications, we achieve average fooling rates of 44.21% and 53.20% across all eight detection models using the single policy attack and the multiple policy attack, respectively. The highest fooling rate, 86.09% with five changes, is attained against the decision tree-based model using the multiple policy approach. Finally, we propose an adversarial defense strategy that reduces the average fooling rate against the single policy attack about threefold, to 15.22%, thereby increasing the robustness of the detection models; that is, the defended models can effectively detect metamorphic variants of malware. The experimental analysis shows that our proposed Android malware detection system using reinforcement learning is more robust against adversarial attacks.
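To make the attack setting concrete, the following is a minimal sketch of a tabular Q-learning attack under a budget of five modifications. It is not the paper's implementation: the abstract does not specify the feature set, reward, or detectors, so the linear toy detector, its `WEIGHTS` and `THRESHOLD`, the `SAMPLES`, the reward values, and the add-only action space (setting a feature from 0 to 1, chosen here as a simple stand-in for functionality-preserving modifications) are all hypothetical. For brevity, the state also omits the remaining budget.

```python
import random

# Hypothetical toy detector: a linear score over 6 binary features.
# Positive weights suggest malware, negative weights suggest benign.
WEIGHTS = [2.0, 1.5, 1.0, -1.0, -1.5, -0.5]
THRESHOLD = 1.0
MAX_MODS = 5  # modification budget per app, as in the abstract

# Hypothetical malware feature vectors (all flagged by the toy detector).
SAMPLES = [(1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0),
           (1, 1, 1, 0, 0, 0), (0, 1, 0, 0, 0, 0)]

def is_malware(features):
    """Return True if the toy detector flags the feature vector."""
    return sum(w * f for w, f in zip(WEIGHTS, features)) > THRESHOLD

def free_actions(state):
    """Add-only actions: indices of features that are still 0."""
    return [i for i, f in enumerate(state) if f == 0]

def apply_action(state, action):
    """Set one feature to 1, leaving all existing features in place."""
    return state[:action] + (1,) + state[action + 1:]

def train_q_table(samples, episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Learn Q(state, action) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) -> estimated return
    for _ in range(episodes):
        state = tuple(rng.choice(samples))
        for _ in range(MAX_MODS):
            actions = free_actions(state)
            if not actions:
                break
            if rng.random() < eps:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt = apply_action(state, action)
            fooled = not is_malware(nxt)
            # Assumed reward: +1 for evading the detector, small cost per step.
            reward = 1.0 if fooled else -0.1
            future = 0.0 if fooled else max(
                (q.get((nxt, a), 0.0) for a in free_actions(nxt)), default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            if fooled:
                break
            state = nxt
    return q

def attack(q, sample):
    """Greedily follow the learned policy; return (fooled, modifications used)."""
    state = tuple(sample)
    for mods in range(1, MAX_MODS + 1):
        actions = free_actions(state)
        if not actions:
            return False, mods - 1
        action = max(actions, key=lambda a: q.get((state, a), 0.0))
        state = apply_action(state, action)
        if not is_malware(state):
            return True, mods
    return False, MAX_MODS

def fooling_rate(q, samples):
    """Fraction of malware samples misclassified as benign after the attack."""
    return sum(1 for s in samples if attack(q, s)[0]) / len(samples)

if __name__ == "__main__":
    q_table = train_q_table(SAMPLES)
    print("fooling rate:", fooling_rate(q_table, SAMPLES))
```

In this toy setup the sample `(1, 1, 1, 0, 0, 0)` cannot be pushed below the threshold by additions alone, so the fooling rate is bounded by 0.75. A defense along the lines the abstract describes could, under the same assumptions, retrain the detector on the adversarial variants the attack produces.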
