Traditional recommender systems aim to estimate a user's rating to an item based on observed ratings from the population. As with all observational studies, hidden confounders, which are factors that affect both item exposures and user ratings, lead to a systematic bias in the estimation. Consequently, a new trend in recommender system research is to negate the influence of confounders from a causal perspective. Observing that confounders in recommendations are usually shared among items and are therefore multi-cause confounders, we model the recommendation as a multi-cause multi-outcome (MCMO) inference problem. Specifically, to remedy confounding bias, we estimate user-specific latent variables that render the item exposures independent Bernoulli trials. The generative distribution is parameterized by a DNN with factorized logistic likelihood and the intractable posteriors are estimated by variational inference. Controlling these factors as substitute confounders, under mild assumptions, can eliminate the bias incurred by multi-cause confounders. Furthermore, we show that MCMO modeling may lead to high variance due to scarce observations associated with the high-dimensional causal space. Fortunately, we theoretically demonstrate that introducing user features as pre-treatment variables can substantially improve sample efficiency and alleviate overfitting. Empirical studies on simulated and real-world datasets show that the proposed deep causal recommender shows more robustness to unobserved confounders than state-of-the-art causal recommenders. Codes and datasets are released at https://github.com/yaochenzhu/deep-deconf.
翻译:传统的推荐人系统旨在根据观察到的人群评级来估计用户对某个项目的评级。 与所有观察研究一样, 隐藏的混淆者(这些是影响项目曝光量和用户评级的因素)导致在估算中出现系统性偏差。 因此, 推荐人系统研究的新趋势是从因果角度否定混淆者的影响。 发现建议中的混淆者通常在项目中共享,因此是多原因的混淆者,我们将建议建模为多原因的多原因多重结果(MCMO)推断问题。 具体来说,为了纠正纠结的偏差,我们估计用户特有的潜伏变量,使项目曝光量具有独立性的Bernoulli试验。 基因化分布由带有因素化的后勤可能性的DNN(DN)进行参数化,而棘手的后背人则通过变推法的推论来估计。 控制这些因素作为替代理解者通常在项目中共享,因此是多原因的混淆者造成的偏差。 此外, 我们显示, MCMO建模可能会导致高度的差异, 原因是与高层次的因果关系空间相关的观测很少。 幸运的是,我们从理论上展示了真实的样本分析数据, 。 我们从理论上显示, 模拟分析中可以显示, 将数据显示, 正确的分析数据比真实性分析方法可以大大地显示, 。