Datasets for training recommender systems are often subject to distribution shift induced by the selection biases of both users and recommenders. In this paper, we study the impact of selection bias on datasets with different quantizations. We then leverage two differently quantized datasets drawn from different source distributions to mitigate distribution shift by applying the inverse probability scoring (IPS) method from causal inference. Empirically, our approach significantly outperforms both single-dataset methods and alternative ways of combining the two datasets.
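To make the inverse probability scoring idea concrete, here is a minimal, generic sketch of an IPS-weighted squared-error loss. It does not reproduce how the two differently quantized datasets are combined, which the abstract does not specify. It assumes that each observed (user, item) rating comes with a known propensity, meaning the probability that the rating was observed at all; in practice such propensities would have to be estimated. The function names `ips_mse` and `snips_mse` are hypothetical. Weighting each observed loss by `1 / propensity` up-weights ratings that selection bias makes rare, so the estimate targets the error over all pairs rather than only over the biased observed sample.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float],
           propensity: Sequence[float]) -> None:
    """Validate inputs; propensities are assumed to be observation probabilities."""
    if not (len(y_true) == len(y_pred) == len(propensity)):
        raise ValueError("inputs must have the same length")
    if any(q <= 0 or q > 1 for q in propensity):
        raise ValueError("propensities must lie in (0, 1]")


def ips_mse(y_true: Sequence[float], y_pred: Sequence[float],
            propensity: Sequence[float], n_total: int) -> float:
    """IPS estimate of the MSE over all n_total (user, item) pairs.

    Only observed pairs are passed in; each squared error is divided by the
    (assumed known) probability that the pair was observed.
    """
    _check(y_true, y_pred, propensity)
    weighted = sum((t - p) ** 2 / q for t, p, q in zip(y_true, y_pred, propensity))
    return weighted / n_total


def snips_mse(y_true: Sequence[float], y_pred: Sequence[float],
              propensity: Sequence[float]) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n_total.

    This variant trades a small bias for lower variance when propensities are small.
    """
    _check(y_true, y_pred, propensity)
    weights = [1.0 / q for q in propensity]
    losses = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


if __name__ == "__main__":
    # Two observed ratings out of a hypothetical population of 4 (user, item) pairs.
    y_true, y_pred, prop = [5, 1], [4, 1], [0.5, 0.25]
    print(ips_mse(y_true, y_pred, prop, n_total=4))  # 0.5
    print(snips_mse(y_true, y_pred, prop))           # 0.333...
```

In the usage example, the first rating has squared error 1 and propensity 0.5, so it contributes a weight of 2; dividing by the 4 pairs gives 0.5. The self-normalized version divides the same weighted sum by the total weight 2 + 4 = 6 instead.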