An industrial recommender system generally presents a hybrid list that contains results from multiple subsystems. In practice, each subsystem is optimized with its own feedback data to avoid interference among subsystems. However, we argue that such data usage may lead to sub-optimal online performance because of \textit{data sparsity}. To alleviate this issue, we propose to extract knowledge from the \textit{super-domain}, which contains web-scale, long-term impression data, and use it to assist the online recommendation task (the downstream task). To this end, we propose a novel industrial \textbf{K}nowl\textbf{E}dge \textbf{E}xtraction and \textbf{P}lugging (\textbf{KEEP}) framework, a two-stage framework that consists of 1) a supervised pre-training knowledge-extraction module on the super-domain, and 2) a plug-in network that incorporates the extracted knowledge into the downstream model. This design makes KEEP friendly to incremental training of the online recommendation model. Moreover, we design an efficient empirical approach for KEEP and share our hands-on experience from implementing KEEP in a large-scale industrial system. Experiments conducted on two real-world datasets demonstrate that KEEP achieves promising results. Notably, KEEP has also been deployed in the display advertising system at Alibaba, bringing lifts of $+5.4\%$ CTR and $+4.7\%$ RPM.
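To make the two-stage idea concrete, here is a minimal NumPy sketch of the extract-and-plug pattern the abstract describes. All names, dimensions, and the specific layer shapes are illustrative assumptions, not the paper's actual architecture: a frozen encoder stands in for the super-domain knowledge-extraction module, and the downstream model consumes its embedding through a plug-in layer, so the base model can be trained incrementally without re-running super-domain pre-training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
D_FEAT, D_KNOW, D_HID = 16, 8, 4

# Stage 1 (assumed): a knowledge encoder pre-trained on the super-domain,
# frozen at downstream time; a fixed random projection stands in for it here.
W_know = rng.standard_normal((D_FEAT, D_KNOW))

def extract_knowledge(x):
    """Return the frozen super-domain knowledge embedding for input x."""
    return np.tanh(x @ W_know)

# Stage 2 (assumed): the plug-in layer concatenates the frozen knowledge
# embedding with the downstream model's own features before prediction.
W_plug = rng.standard_normal((D_FEAT + D_KNOW, D_HID))
w_out = rng.standard_normal(D_HID)

def predict_ctr(x):
    """Toy downstream CTR head: sigmoid(relu(concat(x, knowledge) @ W) @ w)."""
    h = np.concatenate([x, extract_knowledge(x)], axis=-1) @ W_plug
    return 1.0 / (1.0 + np.exp(-np.maximum(h, 0.0) @ w_out))

p = predict_ctr(rng.standard_normal(D_FEAT))
print(0.0 < p < 1.0)  # a valid click probability
```

In this sketch only `W_plug` and `w_out` would receive gradient updates during incremental training; `W_know` stays fixed, mirroring the separation between knowledge extraction and plugging.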