We propose a new diffusion-asymptotic analysis for sequentially randomized experiments. Rather than taking sample size $n$ to infinity while keeping the problem parameters fixed, we let the mean signal level scale as $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ gets large. In this regime, we show that the behavior of a class of methods for sequential experimentation converges to a diffusion limit. This connection enables us to make sharp performance predictions and obtain new insights on the behavior of Thompson sampling. Our diffusion asymptotics also help resolve a discrepancy between the $\Theta(\log n)$ regret predicted by fixed-parameter, large-sample asymptotics on the one hand, and the $\Theta(\sqrt{n})$ regret from worst-case, finite-sample analysis on the other, suggesting that the diffusion regime is an appropriate asymptotic lens for understanding practical large-scale sequential experiments.
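To make the scaling regime concrete, here is a minimal simulation sketch (not from the paper): a two-armed Gaussian bandit whose arm-mean gap shrinks as $c/\sqrt{n}$, played by a flat-prior Gaussian Thompson sampler. The function name `thompson_regret` and the parameters `c` and `sigma` are illustrative assumptions; under this scaling, the $\sqrt{n}$-normalized regret should stabilize as $n$ grows, consistent with the diffusion-limit prediction.

```python
import numpy as np

def thompson_regret(n, c=1.0, sigma=1.0, seed=0):
    """Gaussian Thompson sampling on a two-armed bandit whose
    arm-mean gap scales as c / sqrt(n) (the diffusion scaling).
    Illustrative sketch; not the paper's implementation."""
    rng = np.random.default_rng(seed)
    mu = np.array([0.0, c / np.sqrt(n)])   # signal level of order 1/sqrt(n)
    counts = np.zeros(2)                    # pulls per arm
    sums = np.zeros(2)                      # cumulative reward per arm
    regret = 0.0
    for _ in range(n):
        # Posterior over each arm mean under a flat prior: N(mean, sigma^2/k)
        # after k pulls; an unpulled arm defaults to N(0, sigma^2) here.
        k = np.maximum(counts, 1.0)
        post_mean = sums / k
        post_sd = sigma / np.sqrt(k)
        arm = int(np.argmax(rng.normal(post_mean, post_sd)))
        reward = mu[arm] + sigma * rng.standard_normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += mu.max() - mu[arm]
    return regret

# Regret in this regime grows like sqrt(n): the normalized values below
# should settle near a constant rather than decaying (log-regret regime)
# or blowing up.
for n in [1_000, 10_000, 100_000]:
    r = np.mean([thompson_regret(n, seed=s) for s in range(20)])
    print(f"n={n:>7}  regret={r:8.3f}  regret/sqrt(n)={r / np.sqrt(n):.4f}")
```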