In recent years, recommender systems have advanced rapidly, and embedding learning for users and items plays a critical role in that progress. A standard method learns a unique embedding vector for each user and item. However, such a method has two important limitations in real-world applications: 1) it is hard to learn embeddings that generalize well for users and items that have few interactions; and 2) it may incur prohibitively high memory costs as the number of users and items scales up. Existing approaches either address only one of these limitations or suffer from degraded overall performance. In this paper, we propose Clustered Embedding Learning (CEL) as an integrated solution to these two problems. CEL is a plug-and-play embedding learning framework that can be combined with any differentiable feature interaction model. It achieves improved performance, especially for cold users and items, at a reduced memory cost. CEL enables automatic and dynamic clustering of users and items in a top-down fashion, where the entities in a cluster jointly learn a shared embedding. The accelerated version of CEL has an optimal time complexity, which supports efficient online updates. Theoretically, we prove the identifiability and the existence of a unique optimal number of clusters for CEL in the context of nonnegative matrix factorization. Empirically, we validate the effectiveness of CEL on three public datasets and one business dataset, showing consistently superior performance against current state-of-the-art methods. In particular, incorporating CEL into the business model brings an improvement of $+0.6\%$ in AUC, which translates into a significant revenue gain; meanwhile, the embedding table becomes $2650$ times smaller.
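The idea of clustered entities sharing one embedding can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`init_single_cluster`, `lookup`, `split_cluster`) are hypothetical, and the choice of which entities to move during a split is left to the caller, whereas CEL decides this automatically. The sketch only shows the data layout, a small table of cluster embeddings plus a per-entity assignment index, and one top-down split in which the new cluster starts from its parent's embedding (an assumption made here for illustration).

```python
import numpy as np

def init_single_cluster(num_entities, dim, seed=0):
    """Start top-down: every entity belongs to one root cluster (assumed setup)."""
    rng = np.random.default_rng(seed)
    cluster_emb = rng.normal(scale=0.1, size=(1, dim))   # one shared vector
    assignment = np.zeros(num_entities, dtype=np.int64)  # entity -> cluster id
    return cluster_emb, assignment

def lookup(cluster_emb, assignment, entity_ids):
    """Return the shared embedding of each entity's cluster."""
    return cluster_emb[assignment[entity_ids]]

def split_cluster(cluster_emb, assignment, cluster_id, move_ids):
    """Move `move_ids` out of `cluster_id` into a new cluster.

    The new cluster is initialized as a copy of the parent embedding
    (an illustrative assumption, not a rule taken from the paper).
    """
    new_id = cluster_emb.shape[0]
    cluster_emb = np.vstack([cluster_emb, cluster_emb[cluster_id:cluster_id + 1]])
    assignment = assignment.copy()
    assignment[move_ids] = new_id
    return cluster_emb, assignment

num_entities, dim = 1000, 8
cluster_emb, assignment = init_single_cluster(num_entities, dim)
cluster_emb, assignment = split_cluster(
    cluster_emb, assignment, cluster_id=0, move_ids=np.arange(100)
)

# Stored floats: a few cluster vectors instead of one vector per entity.
full_table_floats = num_entities * dim
clustered_floats = cluster_emb.size
```

With this layout, the number of stored embedding vectors grows with the number of clusters rather than the number of users and items, and a cold entity inherits the vector learned jointly by its cluster.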