Individual human decision-makers may benefit from different forms of support to improve decision outcomes. A key question, however, is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide. We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose $\texttt{THREAD}$, an online algorithm for personalizing a decision support policy to each decision-maker, and devise a hyperparameter tuning strategy that uses simulated human behavior to identify a suitable cost-performance trade-off. We present computational experiments demonstrating the benefits of $\texttt{THREAD}$ over offline baselines. We then introduce $\texttt{Modiste}$, an interactive tool that provides an interface for $\texttt{THREAD}$. Through human-subject experiments, we show how $\texttt{Modiste}$ learns policies personalized to each decision-maker, and we discuss the nuances of learning decision support policies online for real users.
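To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual $\texttt{THREAD}$ implementation) of an online contextual-bandit policy that chooses a form of support per input and optimizes a scalarized accuracy-minus-cost reward. The arm names, costs, context labels, and the simulated decision-maker's accuracies are all illustrative assumptions.

```python
import random

# Assumed set of support forms (arms) and their per-use costs.
SUPPORT_FORMS = ["none", "show_expert_prediction"]
COSTS = {"none": 0.0, "show_expert_prediction": 0.3}
LAMBDA = 1.0  # assumed weight on cost in the scalarized objective


class SupportPolicy:
    """Epsilon-greedy contextual bandit over discrete contexts (a sketch)."""

    def __init__(self, contexts, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and pull count per (context, arm) pair.
        self.mean = {(c, a): 0.0 for c in contexts for a in SUPPORT_FORMS}
        self.count = {(c, a): 0 for c in contexts for a in SUPPORT_FORMS}

    def choose(self, context):
        # Explore with probability epsilon, otherwise pick the arm with
        # the highest estimated reward for this context.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(SUPPORT_FORMS)
        return max(SUPPORT_FORMS, key=lambda a: self.mean[(context, a)])

    def update(self, context, arm, correct):
        # Scalarized multi-objective reward: accuracy minus weighted cost.
        reward = float(correct) - LAMBDA * COSTS[arm]
        self.count[(context, arm)] += 1
        n = self.count[(context, arm)]
        self.mean[(context, arm)] += (reward - self.mean[(context, arm)]) / n


def simulate(policy, rounds=2000, seed=1):
    """Simulated decision-maker (assumed behavior): accurate alone on
    'easy' inputs, needs expert support on 'hard' ones."""
    rng = random.Random(seed)
    for _ in range(rounds):
        ctx = rng.choice(["easy", "hard"])
        arm = policy.choose(ctx)
        p_correct = 0.9 if (ctx == "easy" or arm != "none") else 0.4
        policy.update(ctx, arm, rng.random() < p_correct)
    return policy


policy = simulate(SupportPolicy(["easy", "hard"]))
```

After enough rounds, the learned value estimates favor providing no support on easy inputs (support's cost outweighs its benefit) and showing the expert prediction on hard inputs, illustrating how an online policy can personalize support while trading off accuracy against cost.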