In recent years, filterbank learning has become an increasingly popular strategy for various audio-related machine learning tasks. This is partly due to its ability to discover task-specific audio characteristics which can be leveraged in downstream processing. It is also a natural extension of the nearly ubiquitous deep learning methods employed to tackle a diverse array of audio applications. In this work, several variations of a frontend filterbank learning module are investigated for piano transcription, a challenging low-level music information retrieval task. We build upon a standard piano transcription model, modifying only the feature extraction stage. The filterbank module is designed such that its complex filters are unconstrained 1D convolutional kernels with long receptive fields. Additional variations employ the Hilbert transform to render the filters intrinsically analytic and apply variational dropout to promote filterbank sparsity. Transcription results are compared across all experiments, and we offer visualization and analysis of the filterbanks.
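To make the filterbank description concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It treats each filter as a free real-valued 1D kernel, derives the imaginary part through an FFT-based Hilbert transform so that the complex filter is analytic, and applies the bank to audio as a strided convolution. The function names, kernel length, hop size, and the magnitude nonlinearity are all assumptions made for illustration; in a trained model the real kernels would be learnable parameters rather than random values.

```python
import numpy as np

def analytic_filters(real_kernels):
    """Return complex analytic filters whose real part is `real_kernels`.

    real_kernels: array of shape (n_filters, kernel_len).
    """
    n = real_kernels.shape[-1]
    spectrum = np.fft.fft(real_kernels, axis=-1)
    # One-sided weighting: keep DC (and Nyquist for even n), double the
    # positive frequencies, and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights, axis=-1)

def filterbank_magnitude(audio, filters, hop):
    """Apply each complex filter as a strided 1D convolution
    (cross-correlation, as in deep learning layers) and take the magnitude."""
    n_filters, kernel_len = filters.shape
    n_frames = 1 + (len(audio) - kernel_len) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + kernel_len] for i in range(n_frames)]
    )
    # (n_frames, kernel_len) @ (kernel_len, n_filters) -> (n_filters, n_frames)
    return np.abs(frames @ filters.T).T

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 512))   # stand-in for learned real kernels
filters = analytic_filters(kernels)       # complex, analytic by construction
audio = rng.standard_normal(4096)         # stand-in for an audio excerpt
features = filterbank_magnitude(audio, filters, hop=256)
```

Because the imaginary part is computed from the real part, only the real kernel needs to be stored or trained in this formulation; the unconstrained variant mentioned in the abstract would instead keep independent real and imaginary kernels.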