Many current applications use recommendations to modify natural user behavior, for example to increase the number of sales or the time spent on a website. This creates a gap between the final recommendation objective and the classical setup, in which recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome relative to the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes under random exposure. We compare our method against state-of-the-art factorization methods, as well as recent causal recommendation approaches, and show significant improvements.
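The core idea of re-targeting biased logged data toward a fully random exposure policy can be sketched with inverse-propensity reweighting. The sketch below is illustrative only: the simulated propensities, outcome rates, and the simple IPS estimator are assumptions for demonstration, not the paper's actual domain adaptation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_logs = 50, 20, 5000

# Logging policy: a biased (non-uniform) exposure distribution over items.
# Simulated here; in practice these propensities come from the production
# recommender's logs.
propensity = rng.uniform(0.5, 2.0, n_items)
propensity /= propensity.sum()

# Logged interactions: which item was shown, and the observed outcome
# (e.g. a click) drawn from hypothetical per-item base rates.
items = rng.choice(n_items, size=n_logs, p=propensity)
base_rate = rng.uniform(0.01, 0.3, n_items)
outcomes = rng.binomial(1, base_rate[items]).astype(float)

# Inverse-propensity weights re-target the logged sample from the logging
# policy to uniform (fully random) exposure: w = pi_uniform(i) / pi_log(i).
weights = (1.0 / n_items) / propensity[items]

# Outcome rate under the logging policy (naive average) vs. the
# IPS-corrected estimate of the outcome rate under random exposure.
naive_estimate = outcomes.mean()
ips_estimate = (weights * outcomes).mean()

# ips_estimate approximates base_rate.mean(), the outcome rate a fully
# random policy would observe, even though the log itself is biased.
```

In practice the weights can have high variance when the logging policy rarely shows some items, which is why clipping or more elaborate estimators are commonly used on top of plain IPS.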