Players in the online ad ecosystem are struggling to acquire the user data required for precise targeting. Audience look-alike modeling has the potential to alleviate this issue, but the performance of such models depends strongly on the quantity and quality of the available data. To maximize the predictive performance of our look-alike modeling algorithms, we propose two novel hybrid filtering techniques built on doc2vec, a recent neural probabilistic language model. We apply these methods to data from a large mobile ad exchange, supplemented with app metadata acquired from the Apple App Store and the Google Play Store. First, we model mobile app users through their app usage histories and app descriptions (user2vec). Second, we introduce context awareness to that model by incorporating additional user- and app-related metadata in model training (context2vec). Our findings are threefold: (1) The quality of recommendations provided by user2vec is notably higher than that of current state-of-the-art techniques. (2) User representations generated through hybrid filtering using doc2vec prove to be highly valuable features in supervised machine learning models for look-alike modeling. This represents the first application of hybrid filtering user models using neural probabilistic language models, specifically doc2vec, in look-alike modeling. (3) Incorporating context metadata in the doc2vec training process to introduce context awareness improves performance and is superior to including the same data directly as features in the downstream supervised models.