As one of the leading platforms for creative content, Tumblr offers advertisers a unique way to build brand identity. Advertisers can tell their story through images, animation, text, music, video, and more, and promote that content by sponsoring it to appear as an advertisement in the streams of Tumblr users. In this paper, we present a framework that enabled one of the key targeted advertising components for Tumblr: gender and interest targeting. We describe the main challenges involved in developing the framework, which include creating the ground truth for training gender prediction models and mapping Tumblr content to an interest taxonomy. To infer user interests, we propose a novel semi-supervised neural language model for categorizing Tumblr content (i.e., post tags and post keywords). The model was trained on a large-scale data set of 6.8 billion user posts with only a very limited number of categorized keywords, and it was shown to outperform the bag-of-words model. We successfully deployed the gender and interest targeting capability in Yahoo production systems, delivering inferences for users who account for more than 90% of daily activity on Tumblr. Online performance results indicate the advantages of the proposed approach: we observed a 20% lift in user engagement with sponsored posts compared to untargeted campaigns.