The ongoing rise in cyberattacks, together with the shortage of skilled cybersecurity professionals available to combat them, shows the need for automated tools that can detect attacks with good performance. Attackers disguise their actions and launch multi-step attacks, which are difficult to detect. Improving defensive tools therefore requires calibrating them against a well-trained attacker. In this work, we propose a model of an attacking agent and its environment, and we evaluate the agent's performance using three variants of Q-Learning: basic Q-Learning, Naive Q-Learning, and Double Q-Learning. The attacking agent is trained to exfiltrate data from a network in which every host has a non-zero detection probability. Results show that the Double Q-Learning agent performs best overall, achieving the goal in $70\%$ of the interactions.
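For readers unfamiliar with the best-performing variant, the following is a minimal sketch of the standard tabular Double Q-Learning update, not the implementation used in this work. The function name `double_q_update`, the dictionary-based tables `qa` and `qb`, the default values of `alpha` and `gamma`, and the action labels in the usage note are all illustrative assumptions; terminal-state handling and the exploration policy are omitted.

```python
import random
from collections import defaultdict


def double_q_update(qa, qb, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular Double Q-Learning update in place.

    qa, qb: dict-like tables mapping (state, action) -> estimated value.
    alpha, gamma: learning rate and discount factor (assumed values).
    """
    # Randomly choose which table learns; the other one evaluates.
    if rng.random() < 0.5:
        learn, evaluate = qa, qb
    else:
        learn, evaluate = qb, qa

    # Select the greedy next action with the table being updated ...
    best_next = max(actions, key=lambda a: learn[(next_state, a)])
    # ... but score it with the other table, which reduces the
    # overestimation bias of plain Q-Learning's max operator.
    target = reward + gamma * evaluate[(next_state, best_next)]
    learn[(state, action)] += alpha * (target - learn[(state, action)])


# Two independent value tables, initialised to zero.
qa = defaultdict(float)
qb = defaultdict(float)
```

A hypothetical attacker step could then be recorded with `double_q_update(qa, qb, "s0", "scan", 0.0, "s1", ["scan", "exfiltrate"])`, where the state and action labels are placeholders rather than the environment defined in this work.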