In adversarial environments, one side could gain an advantage by identifying the opponent's strategy. For example, in combat games, if an opponent's strategy is identified as overly aggressive, one could lay a trap that exploits that aggressiveness. However, an opponent's strategy is not always apparent and may need to be estimated from observations of their actions. This paper proposes using inverse reinforcement learning (IRL) to identify strategies in adversarial environments. Specifically, the contributions of this work are 1) a demonstration of this concept on gaming combat data generated from three pre-defined strategies and 2) a framework for using IRL to achieve strategy identification. The numerical experiments demonstrate that the recovered rewards can be identified using a variety of techniques. In this paper, the recovered rewards are visually displayed, clustered using unsupervised learning, and classified using a supervised learner.
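To make the identification pipeline concrete, the sketch below shows how reward vectors already recovered by an IRL method might be clustered without labels and also classified against the known generating strategies. It is a minimal illustration, not the paper's implementation: the synthetic data, the scikit-learn library, and the KMeans/RandomForest choices are assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: one recovered reward vector per observed episode,
# with a label indicating which of three pre-defined strategies generated it.
rng = np.random.default_rng(0)
rewards = rng.normal(size=(300, 16))   # placeholder for IRL-recovered reward features
labels = rng.integers(0, 3, size=300)  # placeholder strategy labels (0, 1, 2)

# Unsupervised: group the recovered rewards into three clusters.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(rewards)

# Supervised: learn to classify recovered rewards by generating strategy.
X_train, X_test, y_train, y_test = train_test_split(rewards, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print("cluster sizes:", np.bincount(clusters))
print("test accuracy:", clf.score(X_test, y_test))
```

With real IRL-recovered rewards in place of the random placeholders, the same two calls would reproduce the unsupervised and supervised identification steps described above.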