As high-dimensional sparse data grows in web-scale recommender systems, the computational cost of learning high-order feature interactions for click-through rate (CTR) prediction rises sharply, which limits the use of high-order interaction models in real industrial applications. Some recent knowledge-distillation-based methods transfer knowledge from complex teacher models to shallow student models to accelerate online inference. However, they suffer from degraded accuracy during the distillation process, and balancing the efficiency and effectiveness of shallow student models remains challenging. To address this problem, we propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) that learns high-order feature interactions from existing complex interaction models for CTR prediction via knowledge distillation. The proposed lightweight student model, DAGFM, can learn arbitrary explicit feature interactions from teacher networks with approximately lossless performance, which we prove using a dynamic programming algorithm. In addition, an improved general model, KD-DAGFM+, is shown to be effective in distilling both explicit and implicit feature interactions from any complex teacher model. Extensive experiments are conducted on four real-world datasets, including a large-scale industrial dataset from the WeChat platform with billions of feature dimensions. KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments, showing the superiority of DAGFM in handling industrial-scale data for CTR prediction. Our implementation code is available at: https://github.com/RUCAIBox/DAGFM.