In this paper, we propose a first formalization of the exploitation of SQL injection vulnerabilities. We simplify the dynamics of SQL injection attacks by casting the problem as a security capture-the-flag challenge, model it as a Markov decision process, and implement it as a reinforcement learning problem. We then deploy different reinforcement learning agents tasked with learning an effective policy for performing SQL injection. We design the training so that an agent learns not just a specific strategy for solving an individual challenge but a more generic policy that can be applied to SQL injection attacks against any system instantiated randomly by our problem generator. We analyze the results in terms of the quality of the learned policy and of convergence time, as a function of the complexity of the challenge and of the learning agent. Our work fits within the wider research on intelligent agents for autonomous penetration testing and white-hat hacking, and our results aim to contribute to understanding the potential and the limits of reinforcement learning in a security environment.
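To make the Markov decision process framing concrete, the following is a minimal sketch of a randomized capture-the-flag environment trained with tabular Q-learning. It is not the environment, action set, or agent used in this work. Everything in it is an assumption made for illustration: the names `ToyInjectionCTF`, `train`, and `greedy_steps`, the three abstract payloads, the rewards, and the hyperparameters. Each episode draws a new hidden instance, so the agent can only succeed by learning a policy conditioned on what it has observed, not a fixed answer.

```python
import random
from collections import defaultdict

N_PAYLOADS = 3  # hypothetical number of candidate payloads (assumption)


class ToyInjectionCTF:
    """Toy CTF: exactly one of N_PAYLOADS abstract payloads captures the flag."""

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        # Every episode is a freshly randomized instance of the challenge.
        self.secret = self.rng.randrange(N_PAYLOADS)
        self.failed = frozenset()  # state: payloads already seen to fail
        return self.failed

    def step(self, action):
        if action == self.secret:
            return self.failed, 10.0, True  # flag captured, episode ends
        self.failed = self.failed | {action}
        return self.failed, -1.0, False  # each failed attempt costs 1


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    env = ToyInjectionCTF(rng)
    q = defaultdict(float)  # maps (state, action) to an estimated return
    for _episode in range(episodes):
        state = env.reset()
        for _step in range(20):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(N_PAYLOADS)
            else:
                action = max(range(N_PAYLOADS), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in range(N_PAYLOADS)
            )
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_steps(q, secret, max_steps=10):
    """Run the greedy policy on a fixed instance; return steps to capture."""
    env = ToyInjectionCTF(random.Random(0))
    state = env.reset()
    env.secret = secret  # override the random draw for evaluation
    for step in range(1, max_steps + 1):
        action = max(range(N_PAYLOADS), key=lambda a: q[(state, a)])
        state, _reward, done = env.step(action)
        if done:
            return step
    return None  # flag not captured within max_steps
```

Calling `train()` and then `greedy_steps(q, secret)` for each possible `secret` checks whether one learned table handles every instance. In this toy setting the state records only which payloads have failed, so the generic policy amounts to never repeating a failed attempt.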