Watch-time prediction remains a key factor in reinforcing user engagement via video recommendations, and it has become increasingly important given the ever-growing popularity of online videos. However, predicted watch time depends not only on the match between the user and the video but is also often misled by the duration of the video itself. Because recommenders are optimized to increase watch time, they are consistently biased towards long videos. Models trained on this imbalanced data risk amplifying the bias, which misguides platforms into over-recommending long videos while overlooking underlying user interests. This paper presents the first work to study duration bias in watch-time prediction for video recommendation. We employ a causal graph to show that duration is a confounding factor that concurrently affects video exposure and watch-time prediction: the first effect, on video exposure, causes the bias and should be eliminated, while the second effect, on watch time, originates from the video's intrinsic characteristics and should be preserved. To remove the undesired bias while leveraging the natural effect, we propose a Duration Deconfounded Quantile-based (D2Q) watch-time prediction framework that scales to industrial production systems. Through extensive offline evaluation and live experiments, we show that this duration-deconfounding framework significantly outperforms state-of-the-art baselines. We have fully launched our approach on the Kuaishou App, where more accurate watch-time predictions have substantially improved real-time video consumption.
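The abstract does not spell out how the quantile-based deconfounding operates, so the following is only a minimal sketch of one plausible reading: videos are split into equal-frequency duration groups, each raw watch time is replaced by its empirical quantile within its group (so the training label no longer favors long videos), and a predicted quantile is mapped back to seconds through the same group's distribution (so the natural effect of duration on watch time is kept). The function names, the grouping scheme, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def duration_groups(durations, n_groups):
    """Assign each sample to an equal-frequency duration bucket (0..n_groups-1).

    Hypothetical helper: the bucketing scheme is an assumption.
    """
    # Interior quantile edges, e.g. the median when n_groups == 2.
    edges = np.quantile(durations, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.searchsorted(edges, durations, side="right")


def watch_time_to_quantile(watch_times, groups, n_groups):
    """Replace raw watch time by its empirical quantile within its duration group."""
    labels = np.zeros(len(watch_times), dtype=float)
    sorted_by_group = {}
    for g in range(n_groups):
        mask = groups == g
        ref = np.sort(watch_times[mask])
        sorted_by_group[g] = ref
        if ref.size:
            # Fraction of samples in the group with watch time <= this sample's.
            ranks = np.searchsorted(ref, watch_times[mask], side="right")
            labels[mask] = ranks / ref.size
    return labels, sorted_by_group


def quantile_to_watch_time(pred_quantile, group, sorted_by_group):
    """Map a predicted quantile back to seconds using the group's distribution."""
    return float(np.quantile(sorted_by_group[group], pred_quantile))
```

In this sketch a regression model (not shown) would be trained on the quantile labels; at serving time its output would be passed through `quantile_to_watch_time` together with the candidate video's duration group. With these labels, a short video watched to the end and a long video watched to the end receive the same target, which is the intended effect of removing the duration-driven skew.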