Users' historical behaviors have proven useful for click-through rate (CTR) prediction in online advertising systems. On Meituan, one of the largest e-commerce platforms in China, an item is typically displayed with its image, and whether a user clicks the item is usually influenced by that image. This implies that a user's image behaviors help in understanding the user's visual preferences and in improving the accuracy of CTR prediction. Existing user image behavior models typically use a two-stage architecture: the first stage extracts visual embeddings of images with off-the-shelf Convolutional Neural Networks (CNNs), and the second stage jointly trains a CTR model on those visual embeddings and non-visual features. We find that this two-stage architecture is sub-optimal for CTR prediction. Meanwhile, the precisely labeled categories in online ad systems carry abundant visual prior information that can enhance the modeling of user image behaviors. However, off-the-shelf CNNs without a category prior may extract category-unrelated features, which limits the CNN's expressive ability. To address these two issues, we propose a hybrid CNN-based attention module that unifies users' image behaviors and the category prior for CTR prediction. Our approach achieves significant improvements in both online and offline experiments on a billion-scale real serving dataset.
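
To make the idea of attending over a user's image behaviors concrete, the following is a minimal sketch of generic target attention, not the hybrid CNN-based module proposed here. It assumes the image embeddings are already precomputed (as in the two-stage setup described above) and uses scaled dot-product similarity between the candidate item's image and each previously clicked image. All names (`attention_pool`, `predict_ctr`, the weight vector `w`) are hypothetical and chosen only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    assert len(a) == len(b), "vectors must have the same length"
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(candidate, behaviors):
    """Pool behavior embeddings, weighting each by its similarity to the candidate.

    candidate: embedding of the candidate item's image (assumed precomputed).
    behaviors: embeddings of images the user clicked before (assumed precomputed).
    Returns (pooled_embedding, attention_weights).
    """
    if not behaviors:
        return [0.0] * len(candidate), []
    scale = math.sqrt(len(candidate))
    weights = softmax([dot(candidate, b) / scale for b in behaviors])
    pooled = [
        sum(w * b[i] for w, b in zip(weights, behaviors))
        for i in range(len(candidate))
    ]
    return pooled, weights


def predict_ctr(candidate, behaviors, non_visual, w, bias=0.0):
    """Toy logistic CTR head over [pooled behaviors, candidate, non-visual features]."""
    pooled, _ = attention_pool(candidate, behaviors)
    features = pooled + candidate + non_visual
    return 1.0 / (1.0 + math.exp(-(dot(w, features) + bias)))
```

For example, with `candidate = [1.0, 0.0]` and `behaviors = [[1.0, 0.0], [0.0, 1.0]]`, the first behavior receives the larger attention weight because it is more similar to the candidate. In this sketch the embeddings are fixed inputs, which is exactly the limitation of the two-stage design: the image encoder receives no gradient from the CTR objective.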