Despite the recent success of machine learning algorithms, most models struggle with more complex tasks that require interaction between different sources, such as multimodal input data and logical time sequences. The biological brain, by contrast, is highly adept at automatically managing and integrating such streams of information. In this context, this work draws inspiration from recent discoveries in brain cortical circuits to propose a more biologically plausible self-supervised machine learning approach, termed Canonical Cortical Graph Neural Networks. It combines multimodal information using intra-layer modulations together with Canonical Correlation Analysis, and a memory mechanism to keep track of temporal data. The proposed model is shown to outperform recent state-of-the-art models in clean audio reconstruction and energy efficiency on a benchmark audio-visual speech dataset. The enhanced performance is accompanied by a reduced and smoother neuron firing rate distribution, suggesting that the proposed model is suitable for speech enhancement in future audio-visual hearing aid devices.
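As a point of reference for the Canonical Correlation Analysis component mentioned above, the sketch below shows the classical CCA objective on paired embeddings from two modalities: the canonical correlations are the singular values of the whitened cross-covariance matrix, and a CCA-style training loss maximizes their sum. This is a minimal, generic illustration, not the paper's implementation; the function name and shapes are assumptions for the example.

```python
import numpy as np

def cca_correlations(X, Y, eps=1e-6):
    """Canonical correlations between paired views X (n, dx) and Y (n, dy).

    Returns the singular values of the whitened cross-covariance,
    sorted in descending order; each lies in [0, 1].
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance estimates (eps keeps the inverses stable).
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False)

def cca_loss(X, Y):
    # A CCA-style self-supervised loss: maximize total canonical correlation.
    return -np.sum(cca_correlations(X, Y))
```

For two views that are linear transforms of the same latent signal, the canonical correlations approach 1, so the loss drives the two modality encoders toward maximally correlated representations.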