When targeting users for new shows on a video platform and expanding their audiences, the key question is how the user and show embeddings are generated. The embeddings should be personalized from the perspective of both users and shows. Furthermore, the pursuit of both instant (click) and long-term (view time) rewards, together with the cold-start problem for new shows, brings additional challenges. Heterogeneous graph models suit this problem because the data has a natural graph structure. However, real-world networks usually have billions of nodes and various types of edges, and few existing methods focus on handling large-scale data and exploiting different types of edges, especially the latter. In this paper, we propose a two-stage audience expansion scheme based on an edge-prompted heterogeneous graph network that takes different double-sided interactions and features into account. In the offline stage, the graph is constructed with user IDs and specific combinations of show side information as nodes, and click/co-click relations and view time are used to build the edges. Embeddings and clustered user groups are then computed. When new shows arrive, their embeddings and the subsequent matching users can be produced within a consistent space. In the online stage, posterior data, including the users who clicked on or viewed a show, are employed as seeds to look for similar users. Results on public datasets and our billion-scale data demonstrate the accuracy and efficiency of our approach.
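The online stage described above, in which click/view users serve as seeds for finding similar users, can be illustrated with a minimal sketch. It assumes that user embeddings are already available as rows of a matrix, that the seeds are summarized by their mean unit vector, and that similarity is measured by cosine similarity. These choices, along with the function name `expand_audience` and its parameters, are illustrative assumptions and are not taken from the proposed method.

```python
import numpy as np


def expand_audience(user_emb, seed_ids, top_k):
    """Return up to top_k non-seed user indices most similar to the seeds.

    Assumptions (hypothetical, for illustration only): seeds are summarized
    by the mean of their unit-normalized embeddings, and similarity is cosine.
    """
    user_emb = np.asarray(user_emb, dtype=float)
    seed_ids = np.unique(np.asarray(seed_ids, dtype=int))

    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(user_emb, axis=1, keepdims=True)
    unit = user_emb / np.clip(norms, 1e-12, None)

    # Summarize the seed users by the mean of their unit vectors.
    centroid = unit[seed_ids].mean(axis=0)

    # Score every user against the seed centroid, then exclude the seeds.
    scores = unit @ centroid
    scores[seed_ids] = -np.inf

    # Keep at most top_k candidates, never more than the non-seed users.
    k = max(0, min(top_k, len(scores) - len(seed_ids)))
    ranked = np.argsort(-scores)[:k]
    return ranked.tolist()


# Toy data: users 0, 1 and 4 point in a similar direction; 2 and 3 differ.
toy_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.2]]
print(expand_audience(toy_emb, seed_ids=[0], top_k=2))
```

At billion scale, an exhaustive scan like this would presumably be replaced by an approximate nearest-neighbor index or by a lookup over the precomputed user clusters; the sketch only shows the seed-to-candidate matching logic.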