We investigate the design of recommendation systems that can efficiently learn from sparse and delayed feedback. Deep Exploration can play an important role in such contexts, enabling a recommendation system to assess a user's needs and personalize service much more quickly. We design an algorithm based on Thompson Sampling that carries out Deep Exploration. We demonstrate through simulations that the algorithm can substantially increase the rate of positive feedback relative to common recommendation system designs, and that it does so in a scalable fashion. These results are promising, and we hope they will inspire the engineering of production recommendation systems that leverage Deep Exploration.