Feature embedding is an essential step in training deep-learning-based Click-Through Rate (CTR) prediction models: it maps high-dimensional sparse features to dense embedding vectors. Classic hand-crafted embedding-size selection methods have been shown to be sub-optimal with respect to the trade-off between memory usage and model capacity. Recent methods based on Neural Architecture Search (NAS) have demonstrated their efficiency in searching for embedding sizes. However, most existing NAS-based works suffer from expensive computational costs, the curse of dimensionality in the search space, and the discrepancy between the continuous search space and the discrete candidate space. Other works that prune embeddings in an unstructured manner fail to reduce computational costs explicitly. In this paper, to address these limitations, we propose a novel strategy that searches for the optimal mixed-dimension embedding scheme by structurally pruning a super-net via a Hard Auxiliary Mask. Our method directly searches candidate models in the discrete space using a simple and efficient gradient-based method. Furthermore, we introduce orthogonal regularity on embedding tables to reduce correlations among embedding columns and enhance representation capacity. Extensive experiments demonstrate that our method effectively removes redundant embedding dimensions without significant performance loss.
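The Hard Auxiliary Mask described above can be read as a hard binary gate over whole embedding columns, trained with a straight-through estimator so that the forward pass stays in the discrete candidate space while gradients still reach the mask scores. The following is a minimal PyTorch sketch under that reading; the class name `HardMaskedEmbedding`, the score parameter `alpha`, and the 0.5 threshold are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class HardMaskedEmbedding(nn.Module):
    """Embedding table whose columns are gated by a hard auxiliary mask.

    The mask is binarized in the forward pass (structured pruning of whole
    embedding dimensions), while gradients flow to the underlying soft
    scores via the straight-through estimator.
    """

    def __init__(self, num_embeddings: int, max_dim: int):
        super().__init__()
        self.table = nn.Embedding(num_embeddings, max_dim)
        # One learnable score per embedding column (dimension).
        self.alpha = nn.Parameter(torch.zeros(max_dim))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        soft = torch.sigmoid(self.alpha)          # soft scores in (0, 1)
        hard = (soft > 0.5).float()               # hard 0/1 column mask
        # Straight-through estimator: the forward value equals `hard`,
        # but the backward pass differentiates through `soft`.
        mask = hard + soft - soft.detach()
        return self.table(ids) * mask
```

After training, columns whose mask is zero can be physically removed, yielding a mixed-dimension embedding scheme (one searched width per feature field) without any continuous-to-discrete rounding gap.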
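The orthogonal regularity on embedding tables could plausibly take the form of a soft-orthogonality penalty on the column Gram matrix, e.g. the common Frobenius-norm form ||EᵀE − I||²_F. A minimal sketch under that assumption follows; the function name and the exact penalty form are assumptions, since the abstract does not specify the regularizer.

```python
import torch


def orthogonal_regularizer(table: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality penalty on the columns of an embedding table.

    Penalizes deviation of the column Gram matrix E^T E from the identity,
    discouraging correlated embedding dimensions. The Frobenius-norm form
    used here is an assumed stand-in for the paper's exact regularizer.
    """
    gram = table.t() @ table                              # (dim, dim)
    eye = torch.eye(gram.size(0), device=table.device)
    return (gram - eye).pow(2).sum()


# Usage: add the penalty to the task loss with a small weight, e.g.
#   loss = ctr_loss + 1e-4 * orthogonal_regularizer(embedding.weight)
```

Decorrelating columns in this way complements the mask search: if each retained dimension carries non-redundant information, pruning a column removes capacity the model actually uses, making the learned masks more meaningful.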