There is a perennial need in the online advertising industry to refresh ad creatives, i.e., the images and text used to entice online users toward a brand. Such refreshes are needed to reduce the likelihood of ad fatigue among online users and to incorporate insights from other successful campaigns in related product categories. Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists, who typically draw inspiration from the images and text of past ad campaigns as well as from world knowledge about the brands. To automatically infer ad themes from such multimodal information in past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender aggregates results from a visual question answering (VQA) task, which ingests (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions about the ad. We leverage transformer-based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations of the VQA task, classification and ranking; experiments on a public dataset show that cross-modal representations achieve significantly better classification accuracy and ranking precision-recall metrics than separate image and text representations. In addition, using multimodal information yields a significant lift over using only textual or only visual information.
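The abstract does not specify how per-ad VQA outputs are aggregated into brand-level themes or how the ranking metrics are computed. The sketch below is illustrative only and relies on stated assumptions. It assumes each ad yields a dictionary of keyphrase scores, a hypothetical output format. It averages each keyphrase's score across a brand's ads and ranks the results. It then scores the ranking with precision@k and recall@k against a set of ground-truth themes. The function names `aggregate_themes` and `precision_recall_at_k` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def aggregate_themes(per_ad_scores):
    """Rank keyphrases by their mean VQA score across a brand's ads.

    per_ad_scores: list of dicts mapping keyphrase -> score for one ad
    (a hypothetical output format assumed for this sketch).
    """
    totals = defaultdict(float)
    for scores in per_ad_scores:
        for phrase, score in scores.items():
            totals[phrase] += score
    n = len(per_ad_scores)
    # Mean over all ads; a phrase absent from an ad contributes 0 for that ad.
    means = {phrase: total / n for phrase, total in totals.items()}
    # Highest mean score first; ties broken alphabetically for determinism.
    return sorted(means, key=lambda phrase: (-means[phrase], phrase))


def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and recall@k of a ranked keyphrase list."""
    hits = sum(1 for phrase in ranked[:k] if phrase in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example with made-up scores for two ads of one brand.
ads = [
    {"luxury": 0.9, "speed": 0.4, "family": 0.1},
    {"luxury": 0.7, "family": 0.6},
]
ranked = aggregate_themes(ads)
print(ranked)  # ['luxury', 'family', 'speed']
print(precision_recall_at_k(ranked, {"luxury", "speed"}, 2))  # (0.5, 0.5)
```

Mean aggregation is used here only for simplicity. A classification formulation might instead pick the highest-probability answer for each ad and count votes across ads.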