Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have significantly improved speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections that fuse information processed at various time scales, which are represented by \textit{stages}. In contrast to the traditional approach of updating all stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously, and finally fuse information from all stages into the bottom stage. Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme. In addition, compared with other state-of-the-art models on three benchmark datasets, the proposed model achieved a good balance between speech separation accuracy and computational efficiency.
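To make the three-step asynchronous update concrete, here is a minimal NumPy sketch of one pass through the block. It is not the authors' implementation: fixed average pooling, nearest-neighbour resampling, summation and `tanh` stand in (as an assumption) for the learned convolutions and fusion layers, and the function names `downsample`, `resize_to` and `async_update` are hypothetical. Each stage halves the time resolution of the one below it.

```python
import numpy as np

def downsample(x):
    """Halve the time resolution by averaging pairs of frames (stand-in for a strided conv)."""
    t = x.shape[-1] // 2 * 2
    return x[..., :t].reshape(*x.shape[:-1], -1, 2).mean(axis=-1)

def resize_to(x, length):
    """Nearest-neighbour resampling along time (stand-in for learned up/downsampling)."""
    idx = np.arange(length) * x.shape[-1] // length
    return x[..., idx]

def async_update(x, num_stages=4):
    """One asynchronous update of a hypothetical FRCNN-like block; x has shape (channels, time)."""
    # Step 1: bottom-up, one stage at a time; each stage is computed from the stage below.
    stages = [x]
    for _ in range(1, num_stages):
        stages.append(downsample(stages[-1]))

    # Step 2: fuse adjacent stages simultaneously; every stage reads the step-1 values.
    fused = []
    for i, s in enumerate(stages):
        acc = s.copy()
        if i > 0:                      # input from the finer stage below
            acc = acc + resize_to(stages[i - 1], s.shape[-1])
        if i < num_stages - 1:         # input from the coarser stage above
            acc = acc + resize_to(stages[i + 1], s.shape[-1])
        fused.append(np.tanh(acc))

    # Step 3: fuse all stages into the bottom (finest) stage.
    bottom_len = fused[0].shape[-1]
    out = sum(resize_to(f, bottom_len) for f in fused)
    return out, fused
```

For example, `async_update(np.random.randn(2, 64))` returns a `(2, 64)` output plus fused stages of lengths 64, 32, 16 and 8. In a synchronous scheme, by contrast, every stage would be updated in parallel from the previous states of its neighbours, so information from the bottom stage would need several iterations to reach the top.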