Thompson sampling is a heuristic algorithm for the multi-armed bandit problem with a long tradition in machine learning. The algorithm has a Bayesian spirit in the sense that it selects arms based on posterior samples of the reward probability of each arm. By forging a connection between combinatorial binary bandits and spike-and-slab variable selection, we propose a stochastic optimization approach to subset selection called Thompson Variable Selection (TVS). TVS is a framework for interpretable machine learning which does not require the underlying model to be linear. TVS brings together Bayesian reinforcement learning and machine learning in order to extend the reach of Bayesian subset selection to non-parametric models and large datasets with very many predictors and/or very many observations. Depending on the choice of reward, TVS can be deployed in offline as well as online setups with streaming data batches. Tailoring multi-play bandits to variable selection, we provide regret bounds without necessarily assuming that the arm mean rewards are unrelated. We demonstrate very strong empirical performance on both simulated and real data. Unlike deterministic optimization methods for spike-and-slab variable selection, TVS is stochastic in nature, which makes it less prone to local convergence and thereby more robust.
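For readers unfamiliar with the bandit machinery the abstract builds on, the following is a minimal sketch of classical Thompson sampling for a Bernoulli multi-armed bandit with a conjugate Beta-Bernoulli posterior. It is not the TVS algorithm itself (which works with combinatorial binary bandits and spike-and-slab priors over subsets of predictors); the function name, prior choice, and simulated reward probabilities are illustrative assumptions.

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, n_rounds=1000, seed=0):
    """Minimal Thompson sampling for a Bernoulli multi-armed bandit.

    Each arm keeps a Beta(alpha, beta) posterior over its reward
    probability; at every round we draw one posterior sample per arm
    and pull the arm whose sample is largest.
    """
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)  # posterior "successes" (Beta(1, 1) uniform prior)
    beta = np.ones(k)   # posterior "failures"
    rewards = []
    for _ in range(n_rounds):
        # Sample a plausible reward probability for each arm from its posterior.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        # Observe a binary reward from the chosen arm (simulated here).
        reward = rng.binomial(1, true_probs[arm])
        # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        rewards.append(reward)
    return np.array(rewards), alpha, beta

if __name__ == "__main__":
    rewards, alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.8])
    print("average reward:", rewards.mean())
    print("posterior means:", alpha / (alpha + beta))
```

In the variable-selection analogy suggested by the abstract, each candidate predictor plays the role of an arm and a subset of predictors is pulled jointly, with the reward derived from model fit; the per-arm posterior sampling step above is the ingredient that TVS adapts to that combinatorial setting.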