With the expansion of AI-powered virtual assistants, there is a need for low-power keyword spotting systems providing a "wake-up" mechanism for subsequent computationally expensive speech recognition. One promising approach is the use of neuromorphic sensors and spiking neural networks (SNNs) implemented in neuromorphic processors for sparse event-driven sensing. However, this requires resource-efficient SNN mechanisms for temporal encoding, which need to consider that these systems process information in a streaming manner, with physical time being an intrinsic property of their operation. In this work, two candidate neurocomputational elements for temporal encoding and feature extraction in SNNs described in recent literature - the spiking time-difference encoder (TDE) and disynaptic excitatory-inhibitory (E-I) elements - are comparatively investigated in a keyword-spotting task on formants computed from spoken digits in the TIDIGITS dataset. While both encoders improve performance over direct classification of the formant features in the training data, enabling a complete binary classification with a logistic regression model, they show no clear improvements on the test set. Resource-efficient keyword spotting applications may benefit from the use of these encoders, but further work on methods for learning the time constants and weights is required to investigate their full potential.
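As a point of reference for the encoding principle named above, the following is a minimal, self-contained sketch of a spiking time-difference encoder (TDE), assuming the common facilitator/trigger formulation from the neuromorphic literature: a spike on the facilitator input charges an exponentially decaying gain trace, and a later spike on the trigger input injects an excitatory current scaled by that trace into a leaky integrate-and-fire neuron, so that shorter time differences yield more output spikes. All parameter values (time constants, weights, threshold) are illustrative assumptions, not values used in this work.

```python
import numpy as np

def tde_response(fac_times, trig_times, t_max=0.2, dt=1e-4,
                 tau_gain=20e-3, tau_epsc=10e-3, tau_mem=20e-3,
                 w_gain=1.0, w_epsc=8.0, v_thresh=0.5):
    """Simulate one TDE unit (Euler integration) and return its output spike times in seconds.

    Illustrative sketch only; parameters are assumptions, not taken from the paper.
    """
    steps = int(t_max / dt)
    fac = np.zeros(steps, dtype=bool)
    trig = np.zeros(steps, dtype=bool)
    fac[(np.asarray(fac_times) / dt).astype(int)] = True
    trig[(np.asarray(trig_times) / dt).astype(int)] = True

    gain = 0.0    # facilitator trace (gain variable)
    epsc = 0.0    # trigger-gated excitatory current
    v = 0.0       # membrane potential of the LIF output neuron
    out_spikes = []

    for i in range(steps):
        # exponential decay of all state variables
        gain += dt * (-gain / tau_gain)
        epsc += dt * (-epsc / tau_epsc)
        v += dt * ((-v + epsc) / tau_mem)

        if fac[i]:
            gain += w_gain            # facilitator spike charges the gain trace
        if trig[i]:
            epsc += w_epsc * gain     # trigger spike samples the remaining gain

        if v >= v_thresh:             # threshold crossing -> output spike, reset
            out_spikes.append(i * dt)
            v = 0.0
    return out_spikes

# Shorter facilitator-to-trigger delays should produce more output spikes.
for delta_t in (5e-3, 20e-3, 50e-3):
    spikes = tde_response(fac_times=[10e-3], trig_times=[10e-3 + delta_t])
    print(f"delta_t = {delta_t*1e3:4.0f} ms -> {len(spikes)} output spikes")
```

Running this sketch shows the spike count falling monotonically as the facilitator-to-trigger delay grows, which is the temporal feature that downstream classification (for example, the logistic regression model mentioned above) can operate on.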