We consider a decision maker who must choose an action to maximize a reward function that also depends on an unknown parameter $\Theta$. The decision maker can delay taking the action in order to experiment and gather additional information on $\Theta$. We model the decision maker's problem using a Bayesian sequential experimentation framework and solve it using dynamic programming and diffusion-asymptotic analysis. To this end, we scale the problem so that the average number of experiments conducted per unit of time is large while the informativeness of each individual experiment is low. Under this regime, we derive a diffusion approximation for the sequential experimentation problem, which provides a number of important insights about the nature of the problem and its solution. Our solution method also shows that the complexity of the problem grows only quadratically with the cardinality of the set of actions from which the decision maker can choose. We illustrate our methodology and results using a concrete application in the context of assortment selection and new product introduction. Specifically, we study the problem of a seller who wants to select an optimal assortment of products to launch into the marketplace and is uncertain about consumers' preferences. Motivated by emerging practices in e-commerce, we assume that the seller is able to use a crowdvoting system to learn these preferences before making a final assortment decision. In this context, we undertake an extensive numerical analysis to assess the value of learning and to demonstrate the effectiveness and robustness of the heuristics derived from the diffusion approximation.