Traditional content-based tag recommender systems directly learn the association between user-generated content (UGC) and tags from collected UGC-tag pairs. However, since a UGC uploader both creates the UGC and selects the corresponding tags, the uploader's personal preference inevitably biases the tag selections, which prevents these recommenders from learning the causal influence of UGCs' content features on tags. In this paper, we propose a deep deconfounded content-based tag recommender system, namely DecTag, to address this issue. We first establish a causal graph to represent the relations among uploader, UGC, and tag, where the uploaders are identified as confounders that spuriously correlate UGC and tag selections. To eliminate the confounding bias, causal intervention is conducted on the UGC node in the graph via backdoor adjustment, so that uploaders' influence on tags leaked through backdoor paths is removed when estimating the causal effect. Observing that adjusting the causal graph with do-calculus requires integrating over the entire uploader space, which is infeasible, we design a novel Monte Carlo (MC)-based estimator with bootstrap, which achieves asymptotic unbiasedness provided that the uploaders of the collected UGCs are i.i.d. samples from the population. In addition, the MC estimator can be interpreted as substituting the biased uploaders with a hypothetical random uploader from the population in the training phase, so that deconfounding is achieved in an interpretable manner. Finally, we establish a YT-8M-Causal dataset, built from the widely used YouTube-8M dataset with causal intervention, and propose a corresponding evaluation strategy to evaluate causal tag recommenders without bias. Extensive experiments show that DecTag is more robust to confounding bias than state-of-the-art causal recommenders.
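To make the backdoor adjustment and the Monte Carlo idea in the abstract concrete, the following is a minimal sketch on a toy discrete model. It is not the DecTag implementation: the three uploaders, the probability tables, and the function names (`exact_backdoor`, `exact_conditional`, `mc_backdoor`) are all hypothetical, and a plain lookup table stands in for what would be a learned deep model of P(tag | UGC, uploader). The sketch contrasts the confounded conditional P(tag | x) with the interventional P(tag | do(x)) = Σ_u P(tag | x, u) P(u), and then approximates the latter by averaging over uploaders resampled with replacement from an observed pool.

```python
import random

# Hypothetical toy model (an assumption for illustration, not from the paper):
# three uploaders u, one binary content feature x, one binary tag.
P_U = [0.5, 0.3, 0.2]            # population distribution over uploaders
P_X1_GIVEN_U = [0.9, 0.5, 0.1]   # P(x=1 | u): uploaders favour different content
P_T1_GIVEN_XU = {                # P(tag=1 | x, u): the tag depends on content AND uploader
    (0, 0): 0.2, (0, 1): 0.4, (0, 2): 0.6,
    (1, 0): 0.5, (1, 1): 0.7, (1, 2): 0.9,
}
UPLOADERS = range(len(P_U))


def exact_backdoor(x):
    """Interventional P(tag=1 | do(x)) = sum_u P(tag=1 | x, u) * P(u)."""
    return sum(P_T1_GIVEN_XU[(x, u)] * P_U[u] for u in UPLOADERS)


def exact_conditional(x):
    """Confounded P(tag=1 | x) = sum_u P(tag=1 | x, u) * P(u | x)."""
    joint = [
        (P_X1_GIVEN_U[u] if x == 1 else 1.0 - P_X1_GIVEN_U[u]) * P_U[u]
        for u in UPLOADERS
    ]
    total = sum(joint)
    return sum(P_T1_GIVEN_XU[(x, u)] * joint[u] / total for u in UPLOADERS)


def mc_backdoor(x, observed_uploaders, n_samples, rng):
    """Monte Carlo estimate of P(tag=1 | do(x)).

    Uploaders are resampled with replacement from the observed pool, which
    plays the role of a 'random uploader from the population' as long as the
    pool itself is an i.i.d. sample of uploaders.
    """
    draws = rng.choices(observed_uploaders, k=n_samples)
    return sum(P_T1_GIVEN_XU[(x, u)] for u in draws) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated pool of observed uploaders, drawn i.i.d. from the population.
    observed = rng.choices(list(UPLOADERS), weights=P_U, k=20000)
    print("P(tag=1 | x=1)      =", exact_conditional(1))
    print("P(tag=1 | do(x=1))  =", exact_backdoor(1))
    print("MC backdoor estimate =", mc_backdoor(1, observed, 20000, rng))
```

In this toy setup the interventional probability for x = 1 is 0.64, while the confounded conditional is roughly 0.561, because uploaders who favour x = 1 also happen to use the tag less often. The Monte Carlo average approaches the interventional value as the number of resampled uploaders grows, which is the sense in which replacing the actual uploader with a random one removes the bias in this sketch.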