Many click-through rate (CTR) prediction works focus on designing advanced architectures to model complex feature interactions but neglect the importance of feature representation learning, e.g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance. For instance, low-frequency features, which account for the majority of features in many CTR tasks, are less considered in standard supervised learning settings, leading to sub-optimal feature representations. In this paper, we introduce self-supervised learning to produce high-quality feature representations directly and propose a model-agnostic Contrastive Learning for CTR (CL4CTR) framework consisting of three self-supervised learning signals that regularize feature representation learning: contrastive loss, feature alignment, and field uniformity. The contrastive module first constructs positive feature pairs by data augmentation and then minimizes the distance between the representations of each positive feature pair by the contrastive loss. The feature alignment constraint forces the representations of features from the same field to be close, and the field uniformity constraint forces the representations of features from different fields to be distant. Extensive experiments verify that CL4CTR achieves the best performance on four datasets and has excellent effectiveness and compatibility with various representative baselines.
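To make the three self-supervised signals concrete, the sketch below gives one plausible PyTorch-style formulation of the contrastive, feature alignment, and field uniformity losses. All function names, tensor shapes, and the choice of squared Euclidean distance are illustrative assumptions, not the paper's exact definitions.

```python
# A minimal sketch of the three self-supervised signals described above.
# Names, shapes, and the squared-Euclidean distance are assumptions made
# for illustration; they do not reproduce the paper's exact formulation.
import torch


def contrastive_loss(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
    """Minimize the distance between the representations of each positive
    pair, where h1 and h2 come from two augmentations of the same input."""
    return (h1 - h2).pow(2).sum(dim=-1).mean()


def alignment_loss(emb: torch.Tensor, field_ids: torch.Tensor) -> torch.Tensor:
    """Pull together embeddings of features that belong to the same field.

    emb:       (num_features, dim) embedding table.
    field_ids: (num_features,) field index of each feature.
    """
    # Pairwise squared Euclidean distances between all feature embeddings.
    dist = torch.cdist(emb, emb, p=2).pow(2)                  # (F, F)
    same_field = field_ids.unsqueeze(0) == field_ids.unsqueeze(1)
    # Exclude self-pairs on the diagonal.
    same_field &= ~torch.eye(emb.size(0), dtype=torch.bool)
    return dist[same_field].mean()


def uniformity_loss(emb: torch.Tensor, field_ids: torch.Tensor) -> torch.Tensor:
    """Push apart embeddings of features from different fields by
    maximizing (i.e., minimizing the negation of) their distance."""
    dist = torch.cdist(emb, emb, p=2).pow(2)
    diff_field = field_ids.unsqueeze(0) != field_ids.unsqueeze(1)
    return -dist[diff_field].mean()


# Usage: 6 features spread over 3 fields, 8-dimensional embeddings.
emb = torch.randn(6, 8, requires_grad=True)
field_ids = torch.tensor([0, 0, 1, 1, 2, 2])
reg = alignment_loss(emb, field_ids) + uniformity_loss(emb, field_ids)
reg.backward()
```

Because each term is just an auxiliary loss on the embedding table, such regularizers can be added to any base CTR model's training objective, which is consistent with the framework being model-agnostic.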