Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk-neutral, since they attend solely to the expected return, and ambiguity-neutral, since they act in new situations as if the uncertainty were known. This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge. Humans are also known to be averse to ambiguity and sensitive to risk in ways that are not Bayes-optimal, indicating that such sensitivity can confer advantages, especially in safety-critical situations. How can we extend the meta-learning protocol to generate risk- and ambiguity-sensitive agents? The goal of this work is to fill this gap in the literature by showing that risk- and ambiguity-sensitivity also emerge as the result of an optimization problem, using modified meta-training algorithms that manipulate the experience-generation process of the learner. We empirically test our proposed meta-training algorithms on agents exposed to foundational classes of decision-making experiments and demonstrate that they become sensitive to risk and ambiguity.