The main objective of this paper is to outline a theoretical framework to characterise human decision-making strategies under uncertainty, in particular active learning in a black-box optimization task and the trade-off between information gathering (exploration) and reward seeking (exploitation). Human decision making with respect to these two objectives can be modelled in terms of Pareto rationality: if a decision set contains a Pareto-efficient strategy, a rational decision maker should always select the dominant strategy over its dominated alternatives. The distance of a choice from the Pareto frontier determines whether that choice is Pareto rational. To collect data about human strategies, we used a gaming application that shows the game field, with previous decisions and observations, as well as the score obtained. The key element of this paper is the representation of the behavioural patterns of human learners as discrete probability distributions. This maps the problem of characterising human behaviour into a space whose elements are probability distributions, structured by a distance between histograms, namely the Wasserstein distance (WST). The distributional analysis gives new insights into human search strategies and their deviations from Pareto rationality. Since uncertainty is one of the two objectives defining the Pareto frontier, the analysis was performed for three different uncertainty quantification measures to identify which best explains the Pareto-compliant behavioural patterns. Besides the analysis of individual patterns, WST also enabled a global analysis through the computation of barycenters and WST k-means clustering. A further analysis was performed with a decision tree to relate non-Paretian behaviour, characterised by excessive exploitation, to the dynamics of the reward-seeking process.
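To make the pipeline described above concrete, the following is a minimal sketch rather than the implementation used in the paper. It assumes that both objectives (expected reward and uncertainty) are maximised, that the distance of a choice from the Pareto frontier is the Euclidean distance to the nearest non-dominated candidate, and that behavioural patterns are histograms over equally spaced bins of that distance. All function names (`pareto_front`, `distance_to_front`, `histogram`, `wasserstein_1d`) are hypothetical and introduced only for illustration.

```python
from itertools import accumulate


def pareto_front(points):
    """Return the non-dominated points, assuming both objectives are maximised."""
    front = []
    for p in points:
        # q dominates p if it is at least as good in both objectives and differs from p.
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front


def distance_to_front(p, front):
    """Euclidean distance from p to the nearest point of a discrete front (assumed metric)."""
    return min(((p[0] - f[0]) ** 2 + (p[1] - f[1]) ** 2) ** 0.5 for f in front)


def histogram(values, n_bins, lo, hi):
    """Normalised histogram of values over [lo, hi], i.e. a discrete probability distribution."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def wasserstein_1d(h1, h2, bin_width=1.0):
    """1-Wasserstein distance between two histograms defined on the same equally spaced bins."""
    # In one dimension, W1 is the area between the two cumulative distributions.
    cdf1 = accumulate(h1)
    cdf2 = accumulate(h2)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf1, cdf2))


# Hypothetical candidate decisions as (expected reward, uncertainty) pairs.
candidates = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
front = pareto_front(candidates)

# Hypothetical choices made by two players, summarised as distances from the front.
player_a = [distance_to_front(c, front) for c in [(1, 3), (2, 2), (3, 1)]]
player_b = [distance_to_front(c, front) for c in [(1, 1), (2, 1), (1, 1)]]

h_a = histogram(player_a, n_bins=3, lo=0.0, hi=3.0)
h_b = histogram(player_b, n_bins=3, lo=0.0, hi=3.0)
print(wasserstein_1d(h_a, h_b, bin_width=1.0))
```

In this toy setting a player who always picks a non-dominated candidate has all probability mass in the first bin, so the Wasserstein distance to that reference histogram grows as a player's choices drift away from the frontier. Barycenters and k-means clustering in Wasserstein space would build on the same distance but are not sketched here.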