Feature selection has been an essential step in developing industry-scale deep Click-Through Rate (CTR) prediction systems. The goal of neural feature selection (NFS) is to choose a relatively small subset of features with the best explanatory power as a means to remove redundant features and reduce computational cost. Inspired by gradient-based neural architecture search (NAS) and network pruning methods, people have tackled the NFS problem with a Gating approach that inserts a set of differentiable binary gates to drop less informative features. The binary gates are optimized along with the network parameters in an efficient end-to-end manner. In this paper, we analyze the gradient-based solution from an exploration-exploitation perspective and use empirical results to show that the Gating approach might suffer from insufficient exploration. To improve the exploration capacity of gradient-based solutions, we propose a simple but effective ensemble learning approach, named Ensemble Gating. We choose two public datasets, namely Avazu and Criteo, to evaluate this approach. Our experiments show that, without adding any computational overhead or introducing any hyper-parameter (except the size of the ensemble), our method is able to consistently improve the Gating approach and find a better subset of features on the two datasets with three different underlying deep CTR prediction models.
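To make the gating idea concrete, the following is a minimal numpy sketch of differentiable binary gates over feature embeddings, plus a majority-vote combination of an ensemble of gate vectors. The abstract does not specify how the ensemble members are combined, so the voting rule, the 0.5 threshold, and all names here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
num_features, embed_dim = 8, 4

# Stand-in for the learned per-feature embeddings of a deep CTR model.
embeddings = rng.normal(size=(num_features, embed_dim))

# Gating approach: one learnable logit per feature; the sigmoid relaxes
# the binary gate so it can be optimized end-to-end with network weights.
gate_logits = rng.normal(size=num_features)
soft_gates = sigmoid(gate_logits)               # in (0, 1), differentiable
hard_gates = (soft_gates > 0.5).astype(float)   # binarized for selection

# Gated embeddings: features whose gate is 0 are dropped.
gated = embeddings * hard_gates[:, None]

# Ensemble Gating (illustrative): K independently initialized gate
# vectors explore different subsets; keep features a majority selects.
K = 5
ensemble_logits = rng.normal(size=(K, num_features))
votes = (sigmoid(ensemble_logits) > 0.5).mean(axis=0)
selected = np.where(votes >= 0.5)[0]
```

In a real training loop the hard thresholding would need a gradient estimator (e.g. a straight-through trick), and each ensemble member would be trained jointly with the CTR model rather than drawn at random as above.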