The central challenge in e-commerce data mining tasks such as recommender systems (RS) and click-through rate (CTR) prediction is constructing combinatorial features from a large number of categorical features for inference while preserving the interpretability of the method. In this paper, we propose Automatic Embedded Feature Engineering (AEFE), an automatic feature engineering framework for representing categorical features that comprises several components, including custom-paradigm feature construction and multiple feature selection. By intelligently selecting promising field pairs and generating a series of interpretable combinatorial features, the framework provides a set of previously unseen generated features that enhance model performance and help data analysts discover feature importance for particular data mining tasks. Furthermore, AEFE is implemented in a distributed manner using task parallelism, data sampling, and a search scheme for field combinations based on Matrix Factorization, which improves the efficiency and scalability of the framework. Experiments on several typical e-commerce datasets indicate that our method outperforms classical machine learning models and state-of-the-art deep learning models.
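To make the idea of selecting field pairs and generating interpretable combinatorial features concrete, the following is a minimal sketch and not the AEFE implementation. It crosses every pair of categorical fields and ranks the pairs by empirical mutual information with the label. Mutual information is used here only as a simple stand-in for the Matrix Factorization based search mentioned in the abstract; the function names, the toy fields (`device`, `gender`, `category`), and the toy labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def cross_feature(rows, field_a, field_b):
    """Concatenate two categorical fields into one combinatorial feature."""
    return [f"{row[field_a]}&{row[field_b]}" for row in rows]


def mutual_information(values, labels):
    """Empirical mutual information (in nats) between a feature and a label."""
    n = len(values)
    joint = Counter(zip(values, labels))
    count_v = Counter(values)
    count_y = Counter(labels)
    mi = 0.0
    for (v, y), c in joint.items():
        # p(v, y) * log( p(v, y) / (p(v) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (count_v[v] * count_y[y]))
    return mi


def rank_field_pairs(rows, labels, fields):
    """Score every field pair by the mutual information of its crossed feature."""
    scored = [
        ((a, b), mutual_information(cross_feature(rows, a, b), labels))
        for a, b in combinations(fields, 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: a click depends only on the gender/category combination.
fields = ["device", "gender", "category"]
rows = [
    {"device": d, "gender": g, "category": c}
    for d in ("ios", "android")
    for g in ("f", "m")
    for c in ("dress", "tools")
]
labels = [
    1 if (r["gender"], r["category"]) in {("f", "dress"), ("m", "tools")} else 0
    for r in rows
]

ranked = rank_field_pairs(rows, labels, fields)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In this toy setting neither `gender` nor `category` is informative on its own, but their cross is, so the pair ranks first. Each generated feature remains readable as a plain combination of field values (for example `f&dress`), which is the kind of interpretability the abstract emphasizes.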