Click-through rate (CTR) prediction models transform features into latent vectors and enumerate possible feature interactions to improve performance based on the input feature set. Therefore, when selecting an optimal feature set, we should consider the influence of both features and their interactions. However, most previous works either focus on feature field selection or only select feature interactions based on a fixed feature set. The former restricts the search space to feature fields, which are too coarse to identify subtle features. These methods also do not filter out useless feature interactions, leading to higher computation costs and degraded model performance. The latter identifies useful feature interactions from all available features, resulting in many redundant features in the feature set. In this paper, we propose a novel method named OptFS to address these problems. To unify the selection of features and their interactions, we decompose the selection of each feature interaction into the selection of two correlated features. Such a decomposition makes the model end-to-end trainable with various feature interaction operations. By adopting a feature-level search space, we assign a learnable gate to each feature to determine whether it should be included in the feature set. Because of the large-scale search space, we develop a learning-by-continuation training scheme to learn such gates. Hence, OptFS generates a feature set containing only features that improve the final prediction results. Experiments on three public datasets demonstrate that OptFS optimizes feature sets that enhance model performance while further reducing both storage and computational costs.
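The gating idea described above can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper. Each feature i gets a real-valued score s_i and a soft gate g_i = sigmoid(beta * s_i). A hypothetical exponential schedule increases beta so that the gates gradually approach hard 0/1 decisions; this is one common way to realize "learning by continuation." The gate of an interaction (i, j) is the product g_i * g_j. All function names (`feature_gates`, `beta_schedule`, `interaction_gates`, `gated_inner_product`, `final_feature_set`) are hypothetical and are not the authors' code.

```python
import math
from itertools import combinations


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def beta_schedule(epoch: int, beta0: float = 1.0, growth: float = 3.0) -> float:
    """Hypothetical continuation schedule: the temperature beta grows each epoch."""
    return beta0 * growth ** epoch


def feature_gates(scores: list[float], beta: float) -> list[float]:
    """Soft gate per feature, g_i = sigmoid(beta * s_i).

    As beta grows, each gate approaches a hard 0/1 step on the sign of s_i.
    """
    return [sigmoid(beta * s) for s in scores]


def interaction_gates(gates: list[float]) -> dict[tuple[int, int], float]:
    """Gate of interaction (i, j), decomposed into the product of two feature gates."""
    return {(i, j): gates[i] * gates[j]
            for i, j in combinations(range(len(gates)), 2)}


def gated_inner_product(e_i: list[float], e_j: list[float],
                        g_i: float, g_j: float) -> float:
    """Inner-product interaction computed on gated embeddings.

    Gating each embedding before the interaction gives
    <g_i e_i, g_j e_j> = g_i * g_j * <e_i, e_j>, so the interaction is
    scaled by the product gate without a separate per-pair parameter.
    """
    return sum((g_i * a) * (g_j * b) for a, b in zip(e_i, e_j))


def final_feature_set(scores: list[float]) -> list[int]:
    """After training, keep the features whose hard gate (s_i > 0) equals 1."""
    return [i for i, s in enumerate(scores) if s > 0]


if __name__ == "__main__":
    scores = [2.0, -1.5, 0.3]  # hypothetical learned gate scores
    for epoch in range(4):
        g = feature_gates(scores, beta_schedule(epoch))
        print(epoch, [round(x, 4) for x in g])
    print(final_feature_set(scores))
```

In this sketch the gates are applied to the feature embeddings before any interaction operation runs. As a result, interaction selection follows directly from feature selection for bilinear interactions such as the inner product. This is one reading of why the decomposition keeps the model end-to-end trainable. The real interaction operators and the schedule used by OptFS may differ.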