The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are critical to security and safety, such as autonomous driving and robot control. Rapid progress in artificial intelligence research has produced efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models for solving MDPs have been neither thoroughly tested nor shown to be rigorously reliable. We present MDPFuzzer, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzzer forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzzer decides which mutated states to retain by measuring whether they reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with a high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzzer is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation covers scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We find that crash-triggering states, although they appear normal, induce neuron activation patterns distinct from those of normal states. We further develop an abnormal behavior detector to harden all the evaluated models, and we repair them with the findings of MDPFuzzer to significantly enhance their robustness without sacrificing accuracy.
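The retention rule described above (keep a mutated initial state if it lowers the cumulative reward or yields a fresh state sequence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from MDPFuzzer: the toy one-dimensional environment and fixed policy in `rollout`, the bounds `LOW` and `HIGH`, the `fresh_threshold` parameter, and the `RunningGaussian` class, which swaps in a single online-updated diagonal Gaussian for the GMM and DynEM density estimate. The sensitivity-based seed prioritization is omitted.

```python
import math
import random

LOW, HIGH = -3.0, 3.0  # hypothetical valid range for initial states


class RunningGaussian:
    """Simplified stand-in for GMM + DynEM: one diagonal Gaussian, updated online."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations (Welford)

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def log_density(self, x):
        if self.n < 2:
            return float("-inf")  # nothing learned yet, so every state is fresh
        total = 0.0
        for i, xi in enumerate(x):
            var = self.m2[i] / (self.n - 1) + 1e-6
            total += -0.5 * (math.log(2 * math.pi * var) + (xi - self.mean[i]) ** 2 / var)
        return total


def rollout(state, steps=50, dt=0.1):
    """Toy MDP: a fixed policy steers position x toward 0; |x| > 5 counts as a crash."""
    x, v = state
    total, seq = 0.0, []
    for _ in range(steps):
        action = max(-1.0, min(1.0, -0.5 * x - 0.3 * v))  # bounded control
        v += action * dt
        x += v * dt
        seq.append((x, v))
        total += -abs(x)  # reward penalizes distance from the origin
        if abs(x) > 5.0:
            return total, seq, True
    return total, seq, False


def fuzz(iterations=2000, seed=0, fresh_threshold=-5.0):
    rng = random.Random(seed)
    density = RunningGaussian(dim=2)
    corpus = [([0.0, 0.0], 0.0)]  # (initial state, cumulative reward)
    crashes = []
    for _ in range(iterations):
        parent, parent_reward = rng.choice(corpus)
        # Mutate the initial state and clamp it back into the valid range.
        child = [min(HIGH, max(LOW, p + rng.gauss(0.0, 0.3))) for p in parent]
        reward, seq, crashed = rollout(child)
        if crashed:
            crashes.append(child)
            continue
        # Freshness: low average log-density of the visited states.
        avg_ll = sum(density.log_density(s) for s in seq) / len(seq)
        if reward < parent_reward or avg_ll < fresh_threshold:
            corpus.append((child, reward))
        for s in seq:
            density.update(s)
    return crashes, corpus


if __name__ == "__main__":
    found, kept = fuzz()
    print(len(found), "crash-triggering initial states;", len(kept), "seeds retained")
```

The oracle here is the crash flag returned by the environment, so only the initial state and the observed rewards and states are needed; the policy itself is treated as a black box.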