Signal measurements in the form of time series are one of the most common types of data used in medical machine learning applications. However, such datasets are often small, which makes training deep neural network architectures ineffective. For time series, the set of data augmentation techniques available to expand a dataset is limited by the need to preserve the basic properties of the signal. Data generated by a Generative Adversarial Network (GAN) can serve as an additional data augmentation tool. However, RNN-based GANs cannot effectively model long sequences of data points with irregular temporal relations. To tackle these problems, we introduce TTS-GAN, a transformer-based GAN that can generate realistic synthetic time-series data sequences of arbitrary length that resemble the real ones. Both the generator and the discriminator networks of the GAN model are built using a pure transformer encoder architecture. We use visualizations and dimensionality reduction techniques to demonstrate the similarity of real and generated time-series data. We also compare the quality of our generated data with the best existing alternative, which is an RNN-based time-series GAN.
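As a rough illustration of what a generator and a discriminator built from a pure transformer encoder might look like, here is a minimal sketch assuming PyTorch. It is not the TTS-GAN implementation: the class names, layer sizes, learned positional embeddings, and the classification token are all assumptions chosen for illustration, and the sequence length is fixed here for simplicity.

```python
import torch
from torch import nn


def make_encoder(d_model, n_heads, n_layers):
    # Plain stack of transformer encoder layers (no recurrence, no convolution).
    layer = nn.TransformerEncoderLayer(
        d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)


class Generator(nn.Module):
    """Hypothetical generator: latent noise -> (batch, channels, seq_len) series."""

    def __init__(self, latent_dim=32, seq_len=64, channels=3,
                 d_model=16, n_heads=4, n_layers=2):
        super().__init__()
        self.seq_len, self.d_model = seq_len, d_model
        # Project the noise vector to one d_model-sized token per time step.
        self.to_tokens = nn.Linear(latent_dim, seq_len * d_model)
        # Learned positional embedding (an assumption of this sketch).
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        self.encoder = make_encoder(d_model, n_heads, n_layers)
        self.to_signal = nn.Linear(d_model, channels)

    def forward(self, z):
        tokens = self.to_tokens(z).view(-1, self.seq_len, self.d_model)
        hidden = self.encoder(tokens + self.pos)
        return self.to_signal(hidden).transpose(1, 2)


class Discriminator(nn.Module):
    """Hypothetical discriminator: series -> one real/fake logit per sample."""

    def __init__(self, seq_len=64, channels=3,
                 d_model=16, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(channels, d_model)
        # Extra classification token whose output summarizes the sequence.
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos = nn.Parameter(torch.zeros(1, seq_len + 1, d_model))
        self.encoder = make_encoder(d_model, n_heads, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        tokens = self.embed(x.transpose(1, 2))         # (batch, seq_len, d_model)
        cls = self.cls.expand(tokens.size(0), -1, -1)  # one token per sample
        hidden = self.encoder(torch.cat([cls, tokens], dim=1) + self.pos)
        return self.head(hidden[:, 0])                 # (batch, 1)


if __name__ == "__main__":
    z = torch.randn(8, 32)             # batch of 8 latent vectors
    fake = Generator()(z)              # shape (8, 3, 64)
    logits = Discriminator()(fake)     # shape (8, 1)
    print(fake.shape, logits.shape)
```

In an adversarial training loop, the discriminator logits for real and generated batches would feed a standard GAN loss; that part is omitted here because the abstract does not specify the objective.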