In many machine learning applications, the input representation is a spectrogram. A spectrogram is built on the short-time Fourier transform (STFT), which yields complex values; the spectrogram keeps only the magnitude of these values, a commonly used detector. Modern machine learning systems are commonly overparameterized, and regularization ameliorates the ill-conditioning problems that can result. The rectified linear unit (ReLU) activation functions commonly used between layers of a deep network have been shown to help this regularization, improving system performance. We extend the idea of ReLU activation to detection for the complex STFT, providing a simple-to-compute, modified and regularized spectrogram that potentially results in better-behaved training. We then confirm the benefit of this approach on a noisy acoustic data set used for a real-world application, where generalization performance improved substantially. This approach might also benefit other applications that use time-frequency mappings, in acoustics, audio, and beyond.
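To make the notion of a "detector" on the complex STFT concrete, the following is a minimal sketch in plain Python. It computes a naive Hann-windowed STFT, then applies two detectors to each complex bin: the standard magnitude, and a ReLU-style alternative. The abstract does not state the exact form of the ReLU-based detector, so `relu_detector` (rectifying the real and imaginary parts separately and summing them) is purely an illustrative assumption, not the authors' definition. The function names, frame length, hop size, and test tone are likewise hypothetical choices made for this example.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Naive STFT: Hann-windowed frames followed by a one-sided DFT.

    Returns a list of frames, each a list of frame_len // 2 + 1 complex bins.
    """
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def magnitude_detector(z):
    """Standard spectrogram detector: magnitude of the complex STFT value."""
    return abs(z)


def relu(x):
    """Rectified linear unit on a real number."""
    return max(0.0, x)


def relu_detector(z):
    """ASSUMPTION (illustrative only, not taken from the source):
    rectify the real and imaginary parts separately, then sum them."""
    return relu(z.real) + relu(z.imag)


def spectrogram(signal, detector, frame_len=64, hop=32):
    """Apply a detector to every complex STFT bin."""
    return [[detector(z) for z in frame]
            for frame in stft(signal, frame_len, hop)]


# Hypothetical test tone: 8 cycles per 64-sample frame, 256 samples long.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
mag_spec = spectrogram(tone, magnitude_detector)
relu_spec = spectrogram(tone, relu_detector)
```

Unlike the magnitude, this assumed detector depends on the phase of each bin: it maps any value whose real and imaginary parts are both negative to zero. Whether that matches the behavior of the detector proposed in the paper cannot be determined from the abstract alone.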