Sequential recommender systems (SRS) have become a research hotspot due to their power in modeling users' dynamic interests and sequential behavioral patterns. To maximize model expressiveness, a default choice is to apply a larger and deeper network architecture, which, however, often incurs high latency when generating online recommendations. We therefore argue that compressing heavy recommendation models into middle- or light-weight neural networks is of great importance for practical production systems. To this end, we propose AdaRec, a knowledge distillation (KD) framework that compresses the knowledge of a teacher model into a student model adaptively, according to the recommendation scene, by using differentiable Neural Architecture Search (NAS). Specifically, we introduce a target-oriented distillation loss to guide the structure search for the student network architecture, together with a cost-sensitive loss that constrains model size, achieving a superior trade-off between recommendation effectiveness and efficiency. In addition, we leverage the Earth Mover's Distance (EMD) to realize many-to-many layer mapping during knowledge distillation, which enables each intermediate student layer to learn adaptively from any intermediate teacher layer. Extensive experiments on real-world recommendation datasets demonstrate that our model achieves competitive or better accuracy with notable inference speedup compared to strong counterparts, while discovering diverse neural architectures for sequential recommender models under different recommendation scenes.
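To make the EMD-based many-to-many layer mapping concrete, the following is a minimal sketch (not the authors' code) of how such a mapping could be computed and used to weight layer-wise distillation terms. It assumes teacher and student hidden states share the same hidden size, uses uniform mass over layers, and solves the transport problem as a small linear program; the function names and cost definition are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of EMD-based many-to-many layer mapping for knowledge
# distillation. Assumptions (not from the paper): layer representations are
# mean-pooled over the sequence, teacher/student hidden sizes match, and the
# layer "mass" is uniform.
import numpy as np
from scipy.optimize import linprog


def emd_layer_mapping(teacher_states, student_states):
    """Return a transport matrix F (n_teacher x n_student) that softly maps
    teacher layers to student layers by minimizing total transport cost."""
    # Cost: squared Euclidean distance between layer-wise mean representations.
    T = np.stack([h.mean(axis=0) for h in teacher_states])  # (n_t, d)
    S = np.stack([h.mean(axis=0) for h in student_states])  # (n_s, d)
    n_t, n_s = len(T), len(S)
    cost = ((T[:, None, :] - S[None, :, :]) ** 2).sum(-1)   # (n_t, n_s)

    # Uniform mass on teacher and student layers (an assumption; these
    # marginals could instead be learned or reweighted).
    a = np.full(n_t, 1.0 / n_t)
    b = np.full(n_s, 1.0 / n_s)

    # Linear program: minimize <cost, F> s.t. row sums = a, column sums = b.
    c = cost.reshape(-1)
    A_eq = np.zeros((n_t + n_s, n_t * n_s))
    for i in range(n_t):
        A_eq[i, i * n_s:(i + 1) * n_s] = 1.0   # row-sum constraints
    for j in range(n_s):
        A_eq[n_t + j, j::n_s] = 1.0            # column-sum constraints
    b_eq = np.concatenate([a, b])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(n_t, n_s)


def emd_distillation_loss(teacher_states, student_states):
    """Weight layer-pair MSE terms by the transport matrix, so every student
    layer can learn from every teacher layer adaptively."""
    F = emd_layer_mapping(teacher_states, student_states)
    loss = 0.0
    for i, t in enumerate(teacher_states):
        for j, s in enumerate(student_states):
            loss += F[i, j] * ((t.mean(axis=0) - s.mean(axis=0)) ** 2).mean()
    return loss
```

In this sketch the transport matrix plays the role of adaptive layer-mapping weights: a student layer whose representation lies close to several teacher layers receives distillation signal from all of them, rather than being tied to a single hand-picked teacher layer.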