Tabular data generated by IIoT devices has traditionally been handled with machine learning (ML) techniques based on decision trees. However, these methods are limited when processing tabular data in which real-valued attributes dominate. To address this issue, DeepInsight, REFINED, and IGTD were proposed to convert tabular data into images so that convolutional neural networks (CNNs) can be applied. These methods gather similar features in specific regions of the image so that the converted image resembles a natural image. Gathering similar features contrasts with traditional ML practice for tabular data, which drops some highly correlated attributes to avoid overfitting. In addition, previous conversion methods fix the image size, so pixels are either wasted or insufficient depending on the number of attributes in the tabular data. This paper therefore proposes a new conversion method, Vortex Feature Positioning (VFP). VFP takes the correlation between features into account and places similar features far away from each other. Features are positioned in a vortex shape starting from the center of the image, and the number of attributes determines the image size. VFP shows better test performance than traditional ML techniques for tabular data and previous conversion methods on five datasets that differ in their number of attributes: Iris, Wine, Dry Bean, Epileptic Seizure, and SECOM.
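The vortex layout can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spiral_coords` and `to_image`, the counter-clockwise direction of the spiral, and the zero padding of unused pixels are all assumptions made for illustration. The correlation-based ordering that keeps similar features apart is not reproduced here; it is represented only by the hypothetical `order` argument.

```python
import numpy as np

def spiral_coords(n):
    """Return n (row, col) offsets of a square spiral starting at (0, 0)."""
    coords = [(0, 0)]
    r = c = 0
    moves = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # right, up, left, down
    step, d = 1, 0
    while len(coords) < n:
        for _turn in range(2):  # each run length is used for two directions
            dr, dc = moves[d % 4]
            for _s in range(step):
                r, c = r + dr, c + dc
                coords.append((r, c))
                if len(coords) == n:
                    return coords
            d += 1
        step += 1
    return coords[:n]

def to_image(row, order=None):
    """Place one tabular sample on a square image along a spiral.

    `order` is a hypothetical feature ordering (for example one derived
    from feature correlations); by default features keep their original order.
    """
    row = np.asarray(row, dtype=float)
    n = row.size
    if n == 0:
        raise ValueError("row must contain at least one attribute")
    order = range(n) if order is None else order
    coords = np.array(spiral_coords(n))
    coords -= coords.min(axis=0)      # shift offsets so indices start at 0
    side = int(coords.max()) + 1      # image side grows with the attribute count
    img = np.zeros((side, side))      # unused pixels stay 0 (assumed padding)
    for (r, c), idx in zip(coords, order):
        img[r, c] = row[idx]
    return img
```

Under these assumptions, a sample with 4 attributes (as in Iris) yields a 2×2 image and one with 13 attributes (as in Wine) yields a 4×4 image, so the image side is the smallest square that holds all attributes.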