Thompson sampling is a popular algorithm for multi-armed bandit problems and has been applied in a wide range of settings, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then constrained to experiment on only a small subset of $K \ll N$ arms within each time period, which poses a problem for traditional Thompson sampling. We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called "arm budget constraint". TSEC uses a Bayesian interaction model with effect hierarchy priors to model correlations between rewards on different arms. This fitted model is then integrated within Thompson sampling to jointly identify a good subset of arms for experimentation and to allocate resources over these arms. We demonstrate the effectiveness of TSEC in two problems with arm budget constraints. The first is a simulated website optimization study, where TSEC shows noticeable improvements over industry benchmarks. The second is a portfolio optimization application on industry-based exchange-traded funds, where TSEC provides more consistent and greater wealth accumulation than standard investment strategies.
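To make the arm budget constraint concrete, the following is a minimal sketch of a naive baseline, not the TSEC method itself: Thompson sampling with independent Beta-Bernoulli posteriors, where only the $K$ arms with the largest posterior draws are pulled in each period. The function name `budgeted_thompson`, the Beta(1, 1) prior, and the Bernoulli reward model are all assumptions made for illustration. In particular, this baseline treats arms as independent, whereas TSEC models correlations between arms through its Bayesian interaction model.

```python
import numpy as np

def budgeted_thompson(true_means, k, n_rounds, seed=0):
    """Naive Thompson sampling that experiments on only k arms per round.

    Hypothetical baseline for illustration: independent Beta-Bernoulli arms,
    with no model of correlations between arms.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size

    # Assumed Beta(1, 1) prior on each arm's success probability.
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(n_rounds):
        # Draw one posterior sample per arm and keep the k largest draws.
        theta = rng.beta(alpha, beta)
        chosen = np.argsort(theta)[-k:]

        # Simulate Bernoulli rewards for the chosen arms only.
        rewards = (rng.random(k) < true_means[chosen]).astype(float)

        # Conjugate posterior update, restricted to the arms that were pulled.
        alpha[chosen] += rewards
        beta[chosen] += 1.0 - rewards
        pulls[chosen] += 1

    posterior_mean = alpha / (alpha + beta)
    return pulls, posterior_mean
```

A call such as `budgeted_thompson(np.linspace(0.05, 0.5, 20), k=3, n_rounds=500)` returns the number of pulls per arm and the posterior mean reward of each arm. With $K \ll N$ and independent posteriors, arms that are never pulled keep their prior, which is one way to see why sharing information across arms matters under an arm budget.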