Despite the recent success of machine learning algorithms, most models still face drawbacks on more complex tasks that require interaction between different sources, such as multimodal input data and logical time sequences. The biological brain, by contrast, is highly refined in this respect: millions of years of evolution have equipped it to manage and integrate such streams of information automatically. In this context, this paper draws inspiration from recent discoveries on cortical circuits in the brain to propose a more biologically plausible self-supervised machine learning approach, the so-called Canonical Cortical Graph Neural Networks, which combines multimodal information using intra-layer modulations together with canonical correlation analysis (CCA), along with a memory mechanism to keep track of temporal data. The approach outperformed recent state-of-the-art results in terms of both clean audio reconstruction and energy efficiency, the latter described by a reduced and smoother neuron firing rate distribution, suggesting that the model is a suitable approach for speech enhancement in future audio-visual hearing aid devices.
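For readers unfamiliar with CCA, the following is a minimal sketch of classical linear CCA between two feature views, written in NumPy. It is not the model proposed in the paper: the function names `inv_sqrt` and `cca`, the regularisation term `reg`, and the synthetic "audio" and "visual" features are all assumptions made purely for illustration. It only shows what CCA computes, namely projections of the two views that are maximally correlated.

```python
import numpy as np


def inv_sqrt(mat):
    # Inverse square root of a symmetric positive-definite matrix
    # via its eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x, y, reg=1e-6):
    # Hypothetical helper: linear CCA between views x (n, p) and y (n, q).
    # Returns the canonical correlations and the projection matrices.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Covariance estimates; `reg` is an assumed small ridge term that
    # keeps the within-view covariances invertible.
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    kx, ky = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, in descending order.
    u, s, vt = np.linalg.svd(kx @ sxy @ ky)
    k = min(x.shape[1], y.shape[1])
    return s[:k], kx @ u[:, :k], ky @ vt.T[:, :k]


# Synthetic data (assumption): one shared latent signal observed through
# a 4-dimensional "audio" view and a 3-dimensional "visual" view.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
audio = z @ np.array([[1.0, -0.5, 0.3, 0.8]]) + 0.1 * rng.normal(size=(500, 4))
visual = z @ np.array([[0.7, 0.2, -1.0]]) + 0.1 * rng.normal(size=(500, 3))

corrs, a, b = cca(audio, visual)
print("canonical correlations:", corrs)
```

Because both synthetic views share a single latent signal, the first canonical correlation should be close to 1 while the remaining ones stay small. In a multimodal setting, the projected features `audio @ a` and `visual @ b` would be the correlated representations passed on to later processing.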