Tumblr, a leading content provider and social media platform, attracts 371 million monthly visits and hosts 280 million blogs, with 53.3 million posts published daily. This popularity gives advertisers great opportunities to promote their products through sponsored posts. However, targeting ads at specific demographic groups is challenging, because Tumblr does not ask users for information such as gender and age when they register. Hence, to enable ad targeting, it is essential to predict users' demographics from rich content such as posts, images, and social connections. In this paper, we propose graph-based and deep learning models for age and gender prediction that take into account both user activities and content features. For the graph-based models, we develop two approaches, network embedding and label propagation, which generate connection features and can also directly infer users' demographics. For the deep learning models, we use a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender. Experimental results on a real daily Tumblr dataset, with hundreds of millions of active users and billions of following relations, show that our approaches significantly outperform the baseline model, improving accuracy by a relative 81% for age and improving both AUC and accuracy by 5% for gender.
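The abstract does not spell out the propagation rule, so the following is only a minimal sketch of a generic iterative label propagation scheme on a follow graph, not the authors' method. It assumes a binary attribute encoded as a score in [0, 1] (for example, a hypothetical P(gender = female)), treats following relations as undirected, starts unlabeled users at a neutral prior of 0.5, and clamps users whose label is known. The function name `label_propagation` and its parameters are hypothetical.

```python
from collections import defaultdict


def label_propagation(edges, seed_labels, num_iters=10):
    """Propagate a binary demographic score over a follow graph (illustrative sketch).

    edges: iterable of (follower, followee) pairs; treated as undirected here (assumption).
    seed_labels: dict mapping node -> known score in [0, 1]; these nodes stay clamped.
    Returns a dict mapping every node in the graph to an estimated score.
    """
    # Build an undirected adjacency list from the following relations.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Unlabeled users start at a neutral prior of 0.5 (assumption).
    scores = {node: seed_labels.get(node, 0.5) for node in neighbors}

    for _ in range(num_iters):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                # Known labels are clamped and never overwritten.
                updated[node] = seed_labels[node]
            else:
                # Each unlabeled user takes the mean score of its neighbors.
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated  # synchronous update: all nodes use last round's scores
    return scores
```

For instance, if users `a` and `b` are both labeled 1.0 and both connect to an unlabeled user `c`, who in turn connects to `d`, the scores of `c` and `d` rise toward 1.0 over successive iterations. The resulting per-user scores could serve either as a direct prediction or as an input feature to a downstream classifier, mirroring the two roles the abstract describes for the graph-based models.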