To make good decisions in the real world people need efficient planning strategies because their computational resources are limited. Knowing which planning strategies would work best for people in different situations would be very useful for understanding and improving human decision-making. But our ability to compute those strategies used to be limited to very small and very simple planning tasks. To overcome this computational bottleneck, we introduce a cognitively-inspired reinforcement learning method that can overcome this limitation by exploiting the hierarchical structure of human behavior. The basic idea is to decompose sequential decision problems into two sub-problems: setting a goal and planning how to achieve it. This hierarchical decomposition enables us to discover optimal strategies for human planning in larger and more complex tasks than was previously possible. The discovered strategies outperform existing planning algorithms and achieve a super-human level of computational efficiency. We demonstrate that teaching people to use those strategies significantly improves their performance in sequential decision-making tasks that require planning up to eight steps ahead. By contrast, none of the previous approaches was able to improve human performance on these problems. These findings suggest that our cognitively-informed approach makes it possible to leverage reinforcement learning to improve human decision-making in complex sequential decision-problems. Future work can leverage our method to develop decision support systems that improve human decision making in the real world.
翻译:为了在现实世界中做出正确的决策,人们需要有效的规划战略,因为他们的计算资源有限。知道哪些规划战略对不同情况下的人最有效,对于理解和改善人类决策非常有用。但是,我们计算这些战略的能力仅限于非常小和非常简单的规划任务。为了克服这一计算瓶颈,我们引入了一种由认知启发的强化学习方法,通过利用人类行为的等级结构来克服这一局限性。基本想法是将顺序决策问题分解为两个小问题:设定目标和规划如何实现这些目标。这种等级分化使我们能够发现人类规划的最佳战略,其规模比以前可能做的更大和更加复杂的任务。所发现的战略优于现有的规划算法,并达到计算效率的超人水平。我们证明,教育人们使用这些战略可以大大改善他们在需要提前规划的顺序决策任务中的绩效。相比之下,以前的办法中没有一个能够改善人类在这些问题上的业绩。这些结论表明,我们明智的分明化方法使我们有可能在比以前可能做的更大和更加复杂的任务中找到最佳的人类规划战略。所发现的战略优于现有规划算法,并达到计算效率的超人的水平。我们证明,用这些战略可以大大地改进人类决策系统。