This paper studies a multi-armed bandit problem in which the decision-maker is loss averse; in particular, she is risk averse in the domain of gains and risk loving in the domain of losses. The focus is on large horizons. A number of analytical results derive the consequences of loss aversion for asymptotic (large-horizon) properties. The analysis is based on a new central limit theorem for a set of measures under which conditional variances can vary in a largely unstructured, history-dependent way, subject only to the restriction that they lie in a fixed interval.