We present a scalable recommender system implementation based on RippleNet, tailored for the media domain and deployed in production at Onet.pl, one of Poland's largest online media platforms. Our solution addresses the cold-start problem for newly published content by integrating content-based item embeddings into the knowledge propagation mechanism of RippleNet, which makes it possible to score previously unseen items. The system architecture uses Amazon SageMaker for distributed training and inference, and Apache Airflow for orchestrating data pipelines and model retraining workflows. To ensure high-quality training data, we constructed a comprehensive golden dataset consisting of user features, item features, and a separate interaction table, a layout that allows flexible extension and the integration of new signals.
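The cold-start idea can be illustrated with a minimal sketch, which is not the deployed implementation. It assumes that an unseen item's vector is obtained by projecting its content embedding with a learned matrix (the hypothetical `projection`), and that this vector then serves as the query for RippleNet-style propagation over the user's ripple sets. The function names, tensor shapes, and the choice to use each hop's response as the next hop's query are assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def ripple_score(item_vec, hops):
    """Score one item for one user with RippleNet-style propagation.

    hops: list of (heads, relations, tails) per hop, with shapes
          (n, d), (n, d, d) and (n, d) for n triples in the ripple set.
    """
    user_vec = np.zeros_like(item_vec)
    query = item_vec
    for heads, relations, tails in hops:
        # Relevance of each triple: p_i = softmax(query^T R_i h_i)
        rh = np.einsum("nij,nj->ni", relations, heads)
        probs = softmax(rh @ query)
        # Hop response: relevance-weighted sum of tail embeddings
        response = probs @ tails
        user_vec = user_vec + response
        # Assumption: the response becomes the query for the next hop
        query = response
    # Click probability: sigmoid of the user-item inner product
    return float(1.0 / (1.0 + np.exp(-(user_vec @ item_vec))))


def score_unseen_item(content_emb, projection, hops):
    """Hypothetical cold-start path: map a content embedding (c,) into
    the d-dimensional item space with a (c, d) matrix, then score it."""
    item_vec = content_emb @ projection
    return ripple_score(item_vec, hops)
```

Because the item vector is computed from content alone, a newly published article needs no interaction history of its own; only the user's ripple sets must already exist.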