Feature interaction is widely recognized as an important problem in machine learning, and it is essential for click-through rate (CTR) prediction. In recent years, Deep Neural Networks (DNNs), which can automatically learn implicit nonlinear interactions from raw sparse features, have been widely used in industrial CTR prediction tasks. However, the implicit feature interactions learned by DNNs cannot fully retain the representation capacity of the original, empirically designed feature interactions (e.g., the Cartesian product). For example, simply learning the combination of feature A and feature B, <A, B>, as a new feature through an explicit Cartesian product representation can outperform previous implicit feature interaction models, including factorization machine (FM)-based models and their variants. In this paper, we propose a Co-Action Network (CAN) to approximate explicit pairwise feature interactions without introducing too many additional parameters. More specifically, given feature A and its associated feature B, their interaction is modeled by learning two sets of parameters: 1) the embedding of feature A, and 2) a Multi-Layer Perceptron (MLP) that represents feature B. The approximated feature interaction is obtained by passing the embedding of feature A through the MLP of feature B. We refer to such a pairwise feature interaction as feature co-action, and the resulting Co-Action Network unit provides strong capacity for fitting complex feature interactions. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models and the Cartesian product method. Moreover, CAN has been deployed in the display advertising system at Alibaba, yielding a 12% improvement in CTR and an 8% improvement in Revenue Per Mille (RPM), a substantial gain for the business.
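The co-action unit described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the layer sizes, the tanh activation, the flat parameter layout, and the names `split_params`, `co_action`, `num_params`, `emb_a`, and `mlp_b` are all assumptions made for illustration. The only element taken from the abstract is the core idea: each value of feature A owns an embedding, each value of feature B owns the weights of a small MLP, and the interaction is the result of passing A's embedding through B's MLP.

```python
import math
import random


def split_params(flat, layer_dims):
    """Slice a flat parameter vector into (weight, bias) pairs, one per layer."""
    layers, pos = [], 0
    for d_in, d_out in layer_dims:
        # Each weight matrix is stored row by row: d_out rows of length d_in.
        weight = [flat[pos + r * d_in: pos + (r + 1) * d_in] for r in range(d_out)]
        pos += d_in * d_out
        bias = flat[pos: pos + d_out]
        pos += d_out
        layers.append((weight, bias))
    if pos != len(flat):
        raise ValueError("parameter vector length does not match layer_dims")
    return layers


def co_action(embedding_a, mlp_params_b, layer_dims):
    """Pass feature A's embedding through the small MLP owned by feature B."""
    hidden = list(embedding_a)
    for weight, bias in split_params(mlp_params_b, layer_dims):
        # Assumed activation: tanh after every layer.
        hidden = [math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
                  for row, b in zip(weight, bias)]
    return hidden


def num_params(layer_dims):
    """Number of scalars needed to parameterize one MLP."""
    return sum(d_in * d_out + d_out for d_in, d_out in layer_dims)


# Hypothetical sizes: 4-dimensional embedding, MLP with layers 4 -> 4 -> 2.
LAYER_DIMS = [(4, 4), (4, 2)]
rng = random.Random(0)

# One embedding per value of feature A, one MLP parameter vector per value of B.
emb_a = {a: [rng.uniform(-1, 1) for _ in range(4)] for a in ("user_1", "user_2")}
mlp_b = {b: [rng.uniform(-1, 1) for _ in range(num_params(LAYER_DIMS))]
         for b in ("item_7", "item_9")}

out = co_action(emb_a["user_1"], mlp_b["item_7"], LAYER_DIMS)
print(out)  # a 2-dimensional co-action vector for the pair <user_1, item_7>
```

Under these assumptions, the parameter count grows with the number of values of A plus the number of values of B, whereas a Cartesian product embedding table needs one entry per observed pair <A, B>. This is one way to read the abstract's claim that CAN approximates explicit pairwise interactions without introducing too many additional parameters.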