We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment. The desirability of the outcome distribution resulting from the policy recommendation is measured through a functional capturing the distributional characteristic that the decision maker is interested in optimizing. This could be, e.g., its inherent inequality, welfare, level of poverty, or its distance to a desired outcome distribution. If the functional of interest is not quasi-convex or if there are constraints, the optimal recommendation may be a mixture of treatments. This vastly expands the set of recommendations that must be considered. We characterize the difficulty of the problem by obtaining lower bounds on maximal expected regret. Furthermore, we propose two regret-optimal policies. The first policy is static and thus applicable irrespective of whether subjects arrive sequentially during the experimentation phase. The second policy can exploit the sequential arrival of subjects by successively eliminating inferior treatments, and thus spends the sampling effort where it is most needed.