Automatic speech recognition (ASR) is a capability that enables a program to convert human speech into written form. Recent developments in artificial intelligence (AI) have led to high-accuracy ASR systems based on deep neural networks, such as the recurrent neural network transducer (RNN-T). However, the core components of these approaches and the operations they perform depart from their powerful biological counterpart, i.e., the human brain. On the other hand, current biologically-inspired ASR models, based on spiking neural networks (SNNs), lag behind in terms of accuracy and focus primarily on small-scale applications. In this work, we revisit the incorporation of biologically-plausible models into deep learning and substantially enhance their capabilities by taking inspiration from the diverse neural and synaptic dynamics found in the brain. In particular, we introduce neural connectivity concepts emulating axo-somatic and axo-axonic synapses. Based on this, we propose novel deep learning units with enriched neuro-synaptic dynamics and integrate them into the RNN-T architecture. We demonstrate, for the first time, that a biologically realistic implementation of a large-scale ASR model can yield accuracy competitive with existing deep learning models. Moreover, we show that such an implementation offers additional advantages, such as reduced computational cost and lower latency, which are critical for speech recognition applications.
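To make the notion of a "deep learning unit with enriched neuro-synaptic dynamics" more concrete, the following is a minimal illustrative sketch, not the unit proposed in this work: it assumes a simple recurrent cell whose activation point is shifted by a slowly-adapting, state-dependent threshold, as a hypothetical stand-in for an axo-somatic-like modulation. All class and parameter names (AdaptiveThresholdCell, tau_adapt) are invented for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveThresholdCell(nn.Module):
    """Hypothetical recurrent cell with a slowly-adapting threshold.

    Sketch only: an axo-somatic-like modulation is emulated here as a
    per-neuron threshold state that shifts the cell's effective activation
    point; this is not the unit proposed in the paper.
    """

    def __init__(self, input_size: int, hidden_size: int, tau_adapt: float = 0.95):
        super().__init__()
        self.in_proj = nn.Linear(input_size, hidden_size)
        self.rec_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.tau_adapt = tau_adapt  # decay constant of the adaptive threshold

    def forward(self, x_t, state):
        h_prev, thr_prev = state
        # Somatic drive: feed-forward input plus recurrent feedback.
        drive = self.in_proj(x_t) + self.rec_proj(h_prev)
        # Threshold shifts the activation point (axo-somatic-like modulation).
        h_t = torch.tanh(drive - thr_prev)
        # Threshold adaptation: decays toward zero and grows with recent activity.
        thr_t = self.tau_adapt * thr_prev + (1.0 - self.tau_adapt) * h_t.abs()
        return h_t, (h_t, thr_t)

if __name__ == "__main__":
    cell = AdaptiveThresholdCell(input_size=80, hidden_size=128)
    x = torch.randn(16, 80)                                   # one frame, batch of 16
    state = (torch.zeros(16, 128), torch.zeros(16, 128))      # (hidden, threshold)
    y, state = cell(x, state)
    print(y.shape)                                            # torch.Size([16, 128])
```

In a transducer-style ASR system, such a cell would replace the standard recurrent layers of the encoder or prediction network, with the adaptive state providing the kind of activity-dependent dynamics the abstract alludes to.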