Embedding learning for categorical features (e.g., user/item IDs) is at the core of many recommendation models, including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table in which each row is a dedicated embedding vector for one unique feature value. However, this method handles high-cardinality features and unseen feature values (e.g., a new video ID) inefficiently, and both are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework, Deep Hash Embedding (DHE), which replaces embedding tables with a deep embedding network that computes embeddings on the fly. DHE first encodes the feature value into a unique identifier vector using multiple hash functions and transformations, and then applies a DNN to convert the identifier vector into an embedding. The encoding module is deterministic, non-learnable, and requires no storage, while the embedding network is updated during training to learn how to generate embeddings. Empirical results show that DHE achieves AUC comparable to the standard one-hot full embedding with a smaller model size. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features that do not rely on embedding table lookup.
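The two-stage pipeline described in the abstract (a fixed hash-based encoding followed by a trainable network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the class name `DHESketch`, the universal-hash form `((a*x + b) mod P) mod m`, the rescaling to [-1, 1], the layer sizes, and the bias-free two-layer ReLU network. The weights are left at their random initial values; in a real model they would be learned during training.

```python
import random

P = 2_147_483_647  # Mersenne prime 2**31 - 1 (assumed modulus for the hashes)


class DHESketch:
    """Toy DHE-style embedder: hash encoding followed by a small MLP.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, k=16, m=1_000_000, hidden=32, dim=8, seed=0):
        rng = random.Random(seed)
        self.m = m
        # k hash functions of the assumed form h_i(x) = ((a_i*x + b_i) mod P) mod m.
        self.hashes = [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]
        # Network weights (k x hidden, hidden x dim); random here, learned in practice.
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(k)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]

    def encode(self, feature_id):
        """Deterministic, non-learnable encoding of an integer ID into [-1, 1]^k."""
        return [
            2.0 * (((a * feature_id + b) % P) % self.m) / (self.m - 1) - 1.0
            for a, b in self.hashes
        ]

    def embed(self, feature_id):
        """Map the identifier vector to an embedding with a two-layer ReLU network."""
        x = self.encode(feature_id)
        h = [max(0.0, sum(xi * w for xi, w in zip(x, col))) for col in zip(*self.w1)]
        return [sum(hi * w for hi, w in zip(h, col)) for col in zip(*self.w2)]
```

Calling `DHESketch().embed(42)` returns an 8-dimensional vector without any per-ID table lookup, and a previously unseen ID such as `10**12` goes through exactly the same path. The only stored state is the hash coefficients and the network weights, so memory use in this sketch does not grow with the number of distinct feature values.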