This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that draws on recent neurological findings regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from such capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted on a Grid Corpus- and CHiME3-based dataset show that MBURST reproduces mask reconstructions comparable to those of the multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency, reducing neuron firing rates by up to \textbf{$70\%$}. Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or any other similar embedded systems.