D3Net: Densely connected multidilated DenseNet for music source separation

Music source separation requires a large input field to model the long-term dependencies of an audio signal. Previous convolutional neural network (CNN)-based approaches model the large input field either by sequentially down- and up-sampling feature maps or by using dilated convolution. In this paper, we argue for the importance of rapid receptive-field growth and of simultaneously modeling multi-resolution data in a single convolution layer, and propose a novel CNN architecture called densely connected dilated DenseNet (D3Net). D3Net uses a novel multi-dilated convolution that applies different dilation factors within a single layer to model different resolutions simultaneously. By combining the multi-dilated convolution with the DenseNet architecture, D3Net avoids the aliasing problem that arises when dilated convolution is naively incorporated into DenseNet. Experimental results on the MUSDB18 dataset show that D3Net achieves state-of-the-art performance with an average signal-to-distortion ratio (SDR) of 6.01 dB.
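To make the idea of a multi-dilated convolution concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names are hypothetical, the signals are 1-D for readability, and the rule that the i-th channel group uses dilation 2**i is an assumption made for illustration; the abstract only states that a single layer uses different dilation factors.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D cross-correlation with zero padding ('same' length) and a given dilation."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):            # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multidilated_conv1d(groups: Sequence[Sequence[float]],
                        kernels: Sequence[Sequence[float]]) -> List[float]:
    """One output channel of a multi-dilated layer.

    Assumption (for illustration only): channel group i is filtered with
    dilation 2**i, and the per-group results are summed.
    """
    out = [0.0] * len(groups[0])
    for i, (x, k) in enumerate(zip(groups, kernels)):
        y = dilated_conv1d(x, k, dilation=2 ** i)
        out = [a + b for a, b in zip(out, y)]
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [0.0] * 17
    impulse[8] = 1.0
    ones = [1.0, 1.0, 1.0]
    y = multidilated_conv1d([impulse] * 3, [ones] * 3)
    print([i for i, v in enumerate(y) if v != 0.0])
    print(receptive_field(3, [1, 2, 4]), receptive_field(3, [1, 1, 1]))
```

Feeding the same impulse into three groups shows one layer responding at several spacings at once, and `receptive_field` compares three stacked 3-tap layers with dilations 1, 2, 4 against three undilated ones.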


Related content

DenseNet


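The Best Paper at CVPR 2017, DenseNet broke away from the conventional approach of improving network performance by deepening the network (ResNet) or widening its structure (Inception). It approaches the problem from the perspective of features instead: through feature reuse and bypass connections, it both greatly reduces the number of network parameters and, to some extent, alleviates the vanishing-gradient problem. Building on its hypotheses about information flow and feature reuse, DenseNet fully deserved its place as the year's best paper at a top computer vision conference in 2017.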

Modeling Sequential Data with Neural Nets (229-page slide deck)

Zhuanzhi Member Service

67+ reads · July 25, 2020

[Alibaba DAMO Academy] TResNet: A High-Performance GPU-Dedicated Architecture

Zhuanzhi Member Service

33+ reads · April 1, 2020

Collection of 50+ papers from 2020 on Neural Architecture Search (NAS)

Zhuanzhi Member Service

61+ reads · March 19, 2020

[ICLR 2020] Network Deconvolution

Zhuanzhi Member Service

39+ reads · February 21, 2020