Existing datasets used to train deep learning models for narrowband radio frequency (RF) signal classification lack the diversity in signal types and channel impairments needed to assess model performance in the real world. We introduce the Sig53 dataset, consisting of 5 million synthetically generated samples from 53 different signal classes with expertly chosen impairments. We also introduce TorchSig, a signal processing machine learning toolkit that can be used to generate this dataset. TorchSig incorporates data handling principles that are common in the vision domain, and it is meant to serve as an open-source foundation for future signals machine learning research. Initial experiments on the Sig53 dataset are conducted using state-of-the-art (SoTA) convolutional neural networks (ConvNets) and Transformers. These experiments reveal that Transformers outperform ConvNets without the need for additional regularization or a ConvNet teacher, which is contrary to results from the vision domain. Additional experiments demonstrate that TorchSig's domain-specific data augmentations facilitate model training, which ultimately benefits model performance. Finally, TorchSig supports on-the-fly synthetic data creation at training time, thus enabling massive-scale training sessions with virtually unlimited datasets.
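The idea of on-the-fly synthetic data creation can be illustrated with a minimal sketch. It does not use TorchSig's API: the names `CONSTELLATIONS`, `synth_example`, and `on_the_fly_stream` are hypothetical, only two of the 53 classes are shown, and the impairment ranges (SNR, frequency offset, phase) are assumptions chosen for illustration. Each example is drawn fresh from a random generator with one sample per symbol and no pulse shaping, so the stream never repeats unless a seed is fixed.

```python
import cmath
import itertools
import math
import random

# Hypothetical two-class constellation table; Sig53 itself covers 53 classes.
CONSTELLATIONS = {
    "bpsk": [1 + 0j, -1 + 0j],
    "qpsk": [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)],
}


def synth_example(rng, num_symbols=256):
    """Draw one (iq_samples, label) pair with randomly chosen impairments."""
    label = rng.choice(sorted(CONSTELLATIONS))
    points = CONSTELLATIONS[label]

    # Assumed impairment ranges, not taken from the paper.
    snr_db = rng.uniform(0.0, 30.0)
    freq_offset = rng.uniform(-0.01, 0.01)  # cycles per sample
    phase = rng.uniform(0.0, 2 * math.pi)

    # Unit-power symbols: total noise power is 1/SNR, split evenly over I and Q.
    noise_std = math.sqrt(10 ** (-snr_db / 10) / 2)

    samples = []
    for n in range(num_symbols):
        symbol = rng.choice(points)
        rotation = cmath.exp(1j * (2 * math.pi * freq_offset * n + phase))
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        samples.append(symbol * rotation + noise)
    return samples, label


def on_the_fly_stream(seed=None):
    """Yield an endless stream of freshly generated labeled examples."""
    rng = random.Random(seed)
    while True:
        yield synth_example(rng)


# Usage: pull a small batch from the unbounded stream.
batch = list(itertools.islice(on_the_fly_stream(seed=0), 4))
```

Because nothing is stored on disk, the effective dataset size is limited only by how long the generator is consumed. In a training setup, a stream like this would typically be wrapped in an iterable-style dataset so that a data loader can batch it.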