We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system provides insight into the structure of a given policy and into its behavior in edge cases. We demonstrate the system with a network intrusion use case, in which we examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration were obtained through a reinforcement learning approach that combines a simulation system, where policies are learned incrementally, with an emulation system that produces the statistics that drive the simulation runs.
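The debugger analogy can be made concrete with a small sketch. The Python code below is not the presented system; it is a minimal illustration under stated assumptions: a hypothetical three-state toy process, a hypothetical threshold policy over a scalar belief of intrusion, and a crude belief update invented for the example. The names `EpisodeDebugger`, `policy`, `step`, `run_until`, and `inspect` are all assumptions. It shows only the interaction pattern: advance an episode one time step at a time, halt on a condition, and inspect the state and the policy's action distribution.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy model; states, actions, and probabilities are illustrative only.
ACTIONS = ["continue", "stop"]


def policy(belief):
    """Hypothetical threshold policy: action distribution given the belief of intrusion."""
    p_stop = 0.9 if belief >= 0.5 else 0.05
    return {"continue": 1.0 - p_stop, "stop": p_stop}


@dataclass
class EpisodeDebugger:
    """Steps through one episode and exposes its internals, like a software debugger."""
    seed: int = 0
    t: int = 0
    state: str = "no_intrusion"
    belief: float = 0.0
    history: list = field(default_factory=list)

    def __post_init__(self):
        # A private generator keeps episodes reproducible for a given seed.
        self.rng = random.Random(self.seed)

    def inspect(self):
        """Return a snapshot of the quantities a user may want to examine."""
        return {
            "t": self.t,
            "state": self.state,
            "belief": self.belief,
            "action_probs": policy(self.belief),
        }

    def step(self):
        """Advance the episode by one time step (no-op once the episode has ended)."""
        if self.state == "stopped":
            return self.inspect()
        probs = policy(self.belief)
        action = self.rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
        self.history.append((self.t, self.state, action))
        if action == "stop":
            self.state = "stopped"
        else:
            # Assumed dynamics: an intrusion starts with probability 0.2 per step.
            if self.state == "no_intrusion" and self.rng.random() < 0.2:
                self.state = "intrusion"
            # Assumed, deliberately crude belief update (not a Bayesian filter).
            gain = 0.3 if self.state == "intrusion" else 0.05
            self.belief = min(1.0, self.belief + gain)
        self.t += 1
        return self.inspect()

    def run_until(self, predicate, max_steps=100):
        """Continue until the predicate holds on the snapshot, like a breakpoint."""
        for _ in range(max_steps):
            if self.state == "stopped" or predicate(self.inspect()):
                break
            self.step()
        return self.inspect()
```

A session might call `dbg = EpisodeDebugger(seed=1)`, then `dbg.run_until(lambda s: s["belief"] >= 0.5)` to halt where the policy is about to switch behavior, and then `dbg.step()` repeatedly while reading `dbg.inspect()["action_probs"]`.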