Detection of malicious behavior is a fundamental problem in security. A major challenge in deploying detection systems in practice is dealing with the overwhelming number of alerts triggered by normal behavior (so-called false positives), which obscure alerts resulting from actual malicious activity. While numerous methods for mitigating this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing an alert prioritization policy using adversarial reinforcement learning. Our approach assumes that attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and the attacker in a game-theoretic model. To tackle the computational complexity of solving this game for a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute each player's best response policy to an arbitrary stochastic policy of the other. We then embed these best-response oracles in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.
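To make the double-oracle loop concrete, the following is a minimal sketch on a zero-sum matrix game. Here the best-response "oracles" are exact argmax/argmin over pure strategies; in the approach described above they would instead be neural reinforcement learning agents, and the restricted game would be built from defender and attacker policies rather than matrix rows and columns. All function names and the use of `scipy.optimize.linprog` for the restricted-game equilibrium are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Equilibrium of a zero-sum matrix game where the row player maximizes A[i, j].
    Returns (row mixed strategy, column mixed strategy, game value)."""
    m, n = A.shape
    # Row player's LP: maximize v subject to (x^T A)_j >= v for every column j,
    # with x a probability distribution. linprog minimizes, so minimize -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (x^T A)_j <= 0
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    x, v = res.x[:m], res.x[-1]
    # Column player's LP is symmetric: minimize u with (A y)_i <= u for every row i.
    c2 = np.zeros(n + 1); c2[-1] = 1.0
    A_ub2 = np.hstack([A, -np.ones((m, 1))])    # (A y)_i - u <= 0
    A_eq2 = np.ones((1, n + 1)); A_eq2[0, -1] = 0.0
    res2 = linprog(c2, A_ub=A_ub2, b_ub=np.zeros(m), A_eq=A_eq2, b_eq=[1.0],
                   bounds=[(0, None)] * n + [(None, None)])
    return x, res2.x[:n], v

def double_oracle(A, tol=1e-8):
    """Double oracle: repeatedly solve the restricted game, then ask each
    player's best-response oracle for a strategy that beats the current
    equilibrium; stop when neither oracle can improve."""
    rows, cols = [0], [0]                       # start from arbitrary pure strategies
    while True:
        x, y, v = solve_zero_sum(A[np.ix_(rows, cols)])
        full_x = np.zeros(A.shape[0]); full_x[rows] = x
        full_y = np.zeros(A.shape[1]); full_y[cols] = y
        # Defender oracle: best pure row against the attacker's mixture.
        br_row = int(np.argmax(A @ full_y))
        # Attacker oracle: best pure column against the defender's mixture.
        br_col = int(np.argmin(full_x @ A))
        grew = False
        if (A @ full_y)[br_row] > v + tol and br_row not in rows:
            rows.append(br_row); grew = True
        if (full_x @ A)[br_col] < v - tol and br_col not in cols:
            cols.append(br_col); grew = True
        if not grew:                            # approximate equilibrium reached
            return full_x, full_y, v
```

On matching pennies (`A = [[1, -1], [-1, 1]]`) the loop grows the restricted game until it recovers the uniform mixed equilibrium with value 0; in the paper's setting, each "add a strategy" step corresponds to training a new best-response policy with neural reinforcement learning.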