Debiased recommendation with a randomized dataset has shown promising results in mitigating system-induced biases. However, compared with the more well-studied route that does not use a randomized dataset, it still lacks theoretical insight and an ideal optimization objective function. To bridge this gap, we study the debiasing problem from a new perspective and propose to directly minimize the upper bound of an ideal objective function, which facilitates a better potential solution to system-induced biases. First, we formulate a new ideal optimization objective function with a randomized dataset. Second, according to the prior constraints that an adopted loss function may satisfy, we derive two different upper bounds of the objective function, i.e., a generalization error bound with the triangle inequality and a generalization error bound with the separability. Third, we show that most existing related methods can be regarded as insufficient optimizations of these two upper bounds. Fourth, we propose a novel method, called debiasing approximate upper bound with a randomized dataset (DUB), which achieves a more sufficient optimization of these upper bounds. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of DUB.