Machine learning models are used in a wide variety of domains. However, machine learning methods often require a large amount of data to be successful. This is especially troublesome in domains where collecting real-world data is difficult and/or expensive. Data simulators exist for many of these domains, but the data they produce does not sufficiently reflect real-world data, due to factors such as a lack of real-world noise. Recently, generative adversarial networks (GANs) have been modified by the SimGAN method to refine simulated image data into data that better fits the real-world distribution. While evolutionary computing has been used to evolve GANs, there are currently no frameworks that can evolve a SimGAN. In this paper, we (1) extend the SimGAN method to refine one-dimensional data, (2) modify Easy Cartesian Genetic Programming (ezCGP), an evolutionary computing framework, to create SimGANs that more accurately refine simulated data, and (3) create new feature-based quantitative metrics to evaluate refined data. We also use our framework to augment an electrocardiogram (ECG) dataset, a domain that suffers from the issues previously mentioned. In particular, while healthy ECGs can be simulated, there are currently no simulators of abnormal ECGs. We show that by using an evolved SimGAN to refine simulated healthy ECG data to mimic real-world abnormal ECGs, we can improve the accuracy of abnormal ECG classifiers.
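To make the idea of refining one-dimensional simulated data more concrete, the sketch below computes the two training objectives from the original SimGAN paper (Shrivastava et al., 2017), applied to batches of 1-D signals. In that formulation the discriminator outputs the probability that its input is synthetic. The refiner is trained to fool the discriminator while an L1 self-regularization term keeps each refined signal close to its simulated input. This is a minimal NumPy sketch under stated assumptions, not the method used in this paper. The function names, the weight `lam`, and the use of one discriminator score per whole signal (rather than SimGAN's local, per-patch adversarial loss) are all illustrative choices.

```python
import numpy as np


def simgan_refiner_loss(simulated, refined, d_prob_synthetic, lam=0.1, eps=1e-12):
    """Refiner objective in the style of SimGAN, for 1-D signals (illustrative sketch).

    simulated        -- array (batch, length): simulated signals x_i
    refined          -- array (batch, length): refiner outputs R(x_i)
    d_prob_synthetic -- array (batch,): discriminator's probability that each
                        refined signal is synthetic
    lam              -- hypothetical weight of the self-regularization term
    """
    # Adversarial term: -log(1 - D(R(x))) is small when D judges refined data as real.
    adversarial = -np.log(1.0 - d_prob_synthetic + eps)
    # Self-regularization: L1 distance keeps the refined signal close to its input.
    self_reg = np.abs(refined - simulated).sum(axis=1)
    return float(np.mean(adversarial + lam * self_reg))


def simgan_discriminator_loss(d_prob_synth_on_refined, d_prob_synth_on_real, eps=1e-12):
    """Discriminator objective: label refined signals synthetic and real signals real."""
    loss_refined = -np.log(d_prob_synth_on_refined + eps)
    loss_real = -np.log(1.0 - d_prob_synth_on_real + eps)
    return float(np.mean(loss_refined) + np.mean(loss_real))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 250)
    # Hypothetical stand-ins: 8 clean simulated sine signals and noisy "refined" copies.
    simulated = np.tile(np.sin(2 * np.pi * 5 * t), (8, 1))
    refined = simulated + 0.05 * rng.standard_normal(simulated.shape)
    d_prob = rng.uniform(0.2, 0.8, size=8)  # placeholder discriminator outputs
    print(simgan_refiner_loss(simulated, refined, d_prob))
    print(simgan_discriminator_loss(d_prob, rng.uniform(0.2, 0.8, size=8)))
```

In a real SimGAN, both the refiner and the discriminator would be neural networks, for example 1-D convolutional networks for signal data, and these losses would be minimized alternately by gradient descent. The sketch only evaluates the objectives on given arrays.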