In this paper, we propose a formalization of the process of exploiting SQL injection vulnerabilities. We simplify the dynamics of SQL injection attacks by casting the problem as a security capture-the-flag challenge. We model it as a Markov decision process and implement it as a reinforcement learning problem. We then deploy reinforcement learning agents tasked with learning an effective policy to perform SQL injection. We design the training so that the agent learns not just a specific strategy for solving an individual challenge but a more generic policy that can be applied to perform SQL injection attacks against any system instantiated randomly by our problem generator. We analyze the results in terms of the quality of the learned policy and of convergence time, as a function of both the complexity of the challenge and the complexity of the learning agent. Our work fits into the wider research on developing intelligent agents for autonomous penetration testing and white-hat hacking, and our results aim to contribute to understanding the potential and the limits of reinforcement learning in a security environment.
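To make the MDP framing concrete, the sketch below models a heavily simplified SQL injection capture-the-flag challenge and trains a tabular Q-learning agent on it. Everything here is an illustrative assumption rather than the paper's generator or agent: the names `ToySQLiCTF`, `ESCAPES`, `MAX_COLS`, `train` and `evaluate` are hypothetical, as are the reward values (-1 per request, +10 for the flag) and the two-stage structure (first find the escape character, then the column count for a `UNION SELECT`). The hidden escape and column count are redrawn every episode, so the agent can only succeed by learning a generic probing policy.

```python
import random
from collections import defaultdict

# Hypothetical toy challenge (an assumption, not the paper's generator): the vulnerable
# query wraps user input in an unknown quote character and selects an unknown number of
# columns. Both are redrawn every episode, so only a generic probing strategy works.
ESCAPES = ["", "'", '"']  # candidate ways to break out of the query ("" = numeric context)
MAX_COLS = 3              # candidate column counts 1..MAX_COLS
ACTIONS = [("escape", e) for e in range(len(ESCAPES))] + [
    ("union", c) for c in range(1, MAX_COLS + 1)
]


class ToySQLiCTF:
    """Simplified SQL injection CTF modelled as an episodic MDP."""

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        self.escape = self.rng.randrange(len(ESCAPES))  # hidden escape index
        self.cols = self.rng.randint(1, MAX_COLS)       # hidden column count
        self.known_escape = False                       # has the escape been found?
        self.tried_escapes = 0                          # bitmask of failed escape probes
        self.tried_cols = 0                             # bitmask of failed UNION probes
        return self.state()

    def state(self):
        # Only what the agent has observed; the hidden values are never exposed.
        if self.known_escape:
            return (True, 0, self.tried_cols)
        return (False, self.tried_escapes, 0)

    def step(self, action):
        kind, arg = ACTIONS[action]
        if kind == "escape":
            # e.g. payload "1' OR 1=1-- ": only the right escape yields a valid query
            if arg == self.escape:
                self.known_escape = True
            else:
                self.tried_escapes |= 1 << arg
        elif self.known_escape:
            # e.g. payload "1' UNION SELECT NULL,NULL-- ": needs the right column count
            if arg == self.cols:
                return self.state(), 10.0, True  # flag captured, episode ends
            self.tried_cols |= 1 << arg
        # Every non-winning request costs one step; UNION probes sent before the
        # escape is known are simply wasted.
        return self.state(), -1.0, False


def greedy(q, s, rng):
    # Pick a highest-valued action, breaking exact ties at random.
    best = max(q[s])
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])


def train(episodes=20000, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=30, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    env = ToySQLiCTF(rng)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            explore = rng.random() < epsilon
            a = rng.randrange(len(ACTIONS)) if explore else greedy(q, s, rng)
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def evaluate(q, episodes=500, max_steps=20, seed=1):
    """Run the greedy policy on fresh random challenges; return success rate and steps."""
    rng = random.Random(seed)
    env = ToySQLiCTF(rng)
    steps_used = []
    for _ in range(episodes):
        s = env.reset()
        for t in range(1, max_steps + 1):
            s, _, done = env.step(greedy(q, s, rng))
            if done:
                steps_used.append(t)
                break
    return len(steps_used) / episodes, steps_used
```

A possible usage is `q = train()` followed by `rate, steps = evaluate(q)` on challenges the agent has never seen. The state records only which candidates have been ruled out, never the hidden values, which is what pushes the agent toward a policy that generalizes across randomly generated challenges rather than memorizing one. Since a failed probe rules out exactly one candidate, a well-trained policy never repeats a request, so a successful episode should need at most `len(ESCAPES) + MAX_COLS` requests.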