Boost CTR Prediction for New Advertisements via Modeling Visual Content

Existing click-through rate (CTR) prediction models for advertisements depend mainly on behavior ID features, which are learned from historical user-ad interactions. Because these features rely on past user behavior, they cannot describe new ads that have no previous interactions with users. To overcome this limitation, we exploit the visual content of ads to boost the performance of CTR prediction models. Specifically, we map each ad to a set of visual IDs based on its visual content. These visual IDs are then used to generate a visual embedding that enhances the CTR prediction model. We formulate the learning of visual IDs as a supervised quantization problem. Because commercial images in advertisements lack class labels, we use the images' textual descriptions as supervision to optimize the image extractor so that it generates effective visual IDs. Since hard quantization is non-differentiable, we soften the quantization operation so that it supports end-to-end network training. After mapping each image to visual IDs, we learn an embedding for each visual ID from the accumulated historical user-ad interactions. Because the visual ID embedding depends only on visual content, it generalizes well to new ads. It also complements the ad behavior ID embedding, so it can considerably boost the performance of CTR prediction models that previously relied on behavior ID features, both for new ads and for ads that have accumulated rich user behaviors. After the visual ID embedding was incorporated into the CTR prediction model of Baidu online advertising, the average CTR of ads improved by 1.46% and the total charge increased by 1.10%.
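
The contrast between hard and softened quantization can be illustrated with a minimal sketch. The abstract does not specify the softening scheme, so this example assumes a common choice: a softmax over negative squared distances to the codewords, controlled by a hypothetical `temperature` parameter. The function names, the toy codebooks, and the product-quantization-style split used to obtain "a set of visual IDs" are all illustrative assumptions, not the authors' implementation.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_quantize(feature, codebook):
    """Return the index of the nearest codeword (a 'visual ID').

    The argmin makes this step non-differentiable with respect to the feature.
    """
    distances = [squared_distance(feature, c) for c in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)


def soft_quantize(feature, codebook, temperature=1.0):
    """Assumed softening: softmax over negative distances to the codewords.

    Returns the assignment weights and the weighted average of the codewords.
    As temperature approaches 0, the weights approach a one-hot vector on the
    nearest codeword, i.e. the hard assignment.
    """
    distances = [squared_distance(feature, c) for c in codebook]
    logits = [-d / temperature for d in distances]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    soft_code = [
        sum(w * c[i] for w, c in zip(weights, codebook))
        for i in range(len(feature))
    ]
    return weights, soft_code


def visual_ids(feature, codebooks):
    """Hypothetical mapping of one image feature to a set of visual IDs.

    Assumes the feature is split into equal sub-vectors, one per codebook,
    and each sub-vector is hard-quantized independently.
    """
    sub_dim = len(feature) // len(codebooks)
    return [
        hard_quantize(feature[i * sub_dim:(i + 1) * sub_dim], codebook)
        for i, codebook in enumerate(codebooks)
    ]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]  # toy codewords
    weights, soft_code = soft_quantize([0.9, 1.2], codebook, temperature=0.5)
    print(hard_quantize([0.9, 1.2], codebook), weights, soft_code)
    print(visual_ids([0.9, 1.2, 3.8, 4.1], [codebook, codebook]))
```

In this sketch the soft weights are a smooth function of the feature, so gradients could flow back to an image extractor during training, while the hard IDs would be used afterwards to look up one embedding per visual ID.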
