In machine learning, feature selection entails selecting a subset of the available features in a dataset to use for model development. There are many motivations for feature selection: it may result in better models, it may provide insight into the data, and it may deliver economies in data gathering or data processing. For these reasons, feature selection has received a lot of attention in data analytics research. In this paper, we provide an overview of the main methods and present practical examples with Python implementations. While the main focus is on supervised feature selection techniques, we also cover some feature transformation methods.