Green security domains feature defenders who plan patrols under uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, patrols deter adversaries' future behavior, which makes patrol planning a sequential decision-making problem. We therefore focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature, which controls the parameter values of the adversarial behavior, and design an algorithm, MIRROR, to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game that considers a limited set of defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.
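To make the minimax regret criterion concrete, the following is a minimal sketch of evaluating it on a small restricted game. It is not the MIRROR algorithm: it assumes a hand-written payoff table (hypothetical numbers), considers only pure defender strategies, and omits the reinforcement learning-based oracles that would add new strategies and parameter values. The function names `regret_table` and `minimax_regret_pure` are illustrative, not taken from the paper.

```python
def regret_table(payoffs):
    """Turn a payoff table into a regret table.

    payoffs[i][j] is the defender's return when playing strategy i
    while nature picks parameter setting j (hypothetical values).
    Regret is the gap to the best strategy for that same setting.
    """
    n_params = len(payoffs[0])
    # Best achievable return for each of nature's parameter settings.
    best = [max(row[j] for row in payoffs) for j in range(n_params)]
    return [[best[j] - row[j] for j in range(n_params)] for row in payoffs]


def minimax_regret_pure(payoffs):
    """Return (index, regret) of the pure strategy whose worst-case regret is smallest."""
    regrets = regret_table(payoffs)
    worst = [max(row) for row in regrets]  # nature maximizes regret
    idx = min(range(len(worst)), key=worst.__getitem__)  # defender minimizes it
    return idx, worst[idx]


# Restricted game: 3 defender strategies x 3 parameter settings (made-up numbers).
payoffs = [
    [10.0, 2.0, 4.0],
    [6.0, 6.0, 5.0],
    [3.0, 8.0, 7.0],
]
best_strategy, max_regret = minimax_regret_pure(payoffs)
print(best_strategy, max_regret)
```

In a double-oracle style loop, a table like this would be re-solved each time an oracle adds a new defender strategy (row) or parameter setting (column). The sketch covers only the solve step, and only for pure strategies.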