The distribution of the data points is a key component in machine learning. In most cases, one uses either min-max normalization to map the nodes into $[0,1]$ or Z-score normalization to obtain (approximately) standard normally distributed data. In this paper, we apply transformation ideas to design a complete orthonormal system in the $\mathrm{L}_2$ space of functions with the standard normal density as integration weight. This allows us to apply the explainable ANOVA approximation with this basis and to use Z-score transformed data in the method. We demonstrate the applicability of this procedure on the well-known forest fires data set from the UCI machine learning repository. The attribute ranking obtained from the ANOVA approximation provides crucial information about which variables in the data set are the most important for the detection of fires.
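To make the two normalizations and the transformation idea concrete, the following minimal sketch may help. It assumes, purely for illustration, that the orthonormal system on $[0,1]$ is the half-period cosine basis $1, \sqrt{2}\cos(\pi k y)$ and that it is carried over to the real line by composing with the standard normal distribution function; the abstract does not specify the basis, so this choice and all function names are hypothetical. The substitution $y = \Phi(x)$ turns the Gaussian-weighted inner product into the plain inner product on $[0,1]$, which the code checks numerically with a trapezoidal rule.

```python
import math
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist(0.0, 1.0)


def min_max(values):
    """Min-max normalization: map the values affinely onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def transformed_cosine(k, x):
    """Assumed basis: half-period cosine on [0, 1] composed with the normal CDF."""
    if k == 0:
        return 1.0
    return math.sqrt(2.0) * math.cos(math.pi * k * STD_NORMAL.cdf(x))


def gaussian_inner_product(j, k, half_width=10.0, n=4000):
    """Trapezoidal approximation of the integral of phi_j * phi_k * rho over the
    interval [-half_width, half_width], where rho is the standard normal density."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += (weight * transformed_cosine(j, x)
                  * transformed_cosine(k, x) * STD_NORMAL.pdf(x))
    return total * h


if __name__ == "__main__":
    # The Gram matrix of the first few basis functions should be close to the identity.
    for j in range(4):
        print([round(gaussian_inner_product(j, k), 6) for k in range(4)])
```

Under these assumptions, Z-score transformed data can be evaluated directly in `transformed_cosine`, without first mapping it into $[0,1]$.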