This paper presents a sensory-fusion neuromorphic dataset collected with precise temporal synchronization using a set of Address-Event-Representation (AER) sensors and tools. The target application is lip reading of several keywords for different machine learning tasks, such as digits, robotic commands, and auxiliary phonetically rich short words. The dataset is enlarged with a spiking version of an audio-visual lip reading dataset collected with frame-based cameras. LIPSFUS is publicly available and has been validated with a deep learning architecture for audio and visual classification. It is intended for sensory-fusion architectures based on both artificial and spiking neural network algorithms.