Automatic Discovery of Interpretable Planning Strategies

From arXiv. Submitted to the Special Issue on Reinforcement Learning for Real Life in the Machine Learning journal (2021). Code available at https://github.com/RationalityEnhancement/InterpretableStrategyDiscovery

When making decisions, people often overlook critical information or are overly swayed by irrelevant information. A common approach to mitigate these biases is to provide decision-makers, especially professionals such as medical doctors, with decision aids, such as decision trees and flowcharts. Designing effective decision aids is a difficult problem. We propose that recently developed reinforcement learning methods for discovering clever heuristics for good decision-making can be partially leveraged to assist human experts in this design process. One of the biggest remaining obstacles to leveraging the aforementioned methods is that the policies they learn are opaque to people. To solve this problem, we introduce AI-Interpret: a general method for transforming idiosyncratic policies into simple and interpretable descriptions. Our algorithm combines recent advances in imitation learning and program induction with a new clustering method for identifying a large subset of demonstrations that can be accurately described by a simple, high-performing decision rule. We evaluate our new algorithm and employ it to translate information-acquisition policies discovered through metalevel reinforcement learning. The results of large behavioral experiments showed that providing the decision rules generated by AI-Interpret as flowcharts significantly improved people's planning strategies and decisions across three different classes of sequential decision problems. Moreover, another experiment revealed that this approach is significantly more effective than training people by giving them performance feedback. Finally, a series of ablation studies confirmed that AI-Interpret is critical to the discovery of interpretable decision rules. We conclude that the methods and findings presented herein are an important step towards leveraging automatic strategy discovery to improve human decision-making.
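To make the core idea concrete (find a large subset of demonstrations that a simple rule describes accurately), here is a toy sketch in Python. It is not the authors' implementation: the abstract does not specify the clustering or program-induction procedures, so this sketch swaps in a brute-force search over cluster subsets and a hand-written list of candidate rules. The feature names (`is_leaf`, `observed`), the cluster labels, and the function `find_rule` are all hypothetical.

```python
from itertools import combinations

# Hypothetical demonstrations: two boolean state features, whether the
# expert policy acted ("act"), and an assumed, precomputed cluster label.
DEMOS = [
    {"cluster": 0, "is_leaf": True,  "observed": False, "act": True},
    {"cluster": 0, "is_leaf": True,  "observed": False, "act": True},
    {"cluster": 0, "is_leaf": False, "observed": False, "act": False},
    {"cluster": 1, "is_leaf": True,  "observed": True,  "act": False},
    {"cluster": 1, "is_leaf": False, "observed": False, "act": False},
    {"cluster": 2, "is_leaf": False, "observed": True,  "act": True},   # idiosyncratic
    {"cluster": 2, "is_leaf": True,  "observed": False, "act": False},  # idiosyncratic
]

# Hand-written candidate rules, ordered from simplest to most complex.
# A real system would induce these; here they are assumed for illustration.
RULES = [
    ("is_leaf", lambda d: d["is_leaf"]),
    ("not observed", lambda d: not d["observed"]),
    ("is_leaf and not observed", lambda d: d["is_leaf"] and not d["observed"]),
]


def accuracy(rule, demos):
    """Fraction of demonstrations whose action the rule predicts correctly."""
    return sum(rule(d) == d["act"] for d in demos) / len(demos)


def find_rule(demos, min_accuracy=1.0):
    """Return (rule name, kept clusters, number of demos covered), or None.

    Tries to keep as many clusters as possible, dropping clusters only when
    no candidate rule reaches min_accuracy on the remaining demonstrations.
    """
    clusters = sorted({d["cluster"] for d in demos})
    for k in range(len(clusters), 0, -1):
        for kept in combinations(clusters, k):
            subset = [d for d in demos if d["cluster"] in kept]
            for name, rule in RULES:
                if accuracy(rule, subset) >= min_accuracy:
                    return name, kept, len(subset)
    return None


result = find_rule(DEMOS)
print(result)
```

On this toy data no candidate rule fits all seven demonstrations, but once cluster 2 is set aside the rule "is_leaf and not observed" describes the remaining five exactly. The exhaustive subset search is exponential in the number of clusters and is used here only because the example is tiny.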
