The sense of hearing is crucial for autonomous vehicles (AVs) to better perceive their surrounding environment. An AV's visual sensors, such as cameras, lidar, and radar, help it see its surroundings, but an AV cannot see beyond those sensors' line of sight. An AV's sense of hearing, in contrast, is not limited by line of sight. For example, an AV can identify an emergency vehicle's siren through audio classification even when the emergency vehicle is not within the AV's line of sight. Auditory perception therefore complements camera-, lidar-, and radar-based perception systems. This paper presents a robust, deep learning-based audio classification framework aimed at improving environmental perception for AVs. The framework uses a deep Convolutional Neural Network (CNN) to classify different audio classes. UrbanSound8k, an urban environment sound dataset, is used to train and test the framework. Seven audio classes are selected from the UrbanSound8k dataset because of their relevance to AVs: air conditioner, car horn, children playing, dog bark, engine idling, gunshot, and siren. Our framework classifies the selected audio classes with 97.82% accuracy. We also report classification accuracies for all ten classes. These results show that our framework performs better on AV-related sounds than existing audio classification frameworks.
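The abstract does not specify the network architecture or the input features. The NumPy code below is a minimal sketch that assumes log-mel spectrogram inputs and a hypothetical two-block architecture. The layer sizes, `init_params`, and `forward` are illustrative and are not the authors' model. The sketch shows how a CNN maps a spectrogram to a probability distribution over the seven AV-related classes. The class names follow UrbanSound8k's underscore labels. The weights are random, so predictions are meaningless until the network is trained.

```python
import numpy as np

# The seven AV-related classes named in the abstract, using UrbanSound8k label spelling.
AV_CLASSES = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
              "engine_idling", "gun_shot", "siren"]

def conv2d(x, kernels, bias):
    """'Valid' 2-D convolution (cross-correlation, as in CNN libraries).
    x: (C_in, H, W); kernels: (C_out, C_in, kh, kw); bias: (C_out,)."""
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.empty((c_out, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C_in, kh, kw)
            # Sum over input channels and kernel window for every output channel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fill a window are dropped."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def init_params(rng, n_classes=len(AV_CLASSES)):
    """Hypothetical, untrained weights: two conv blocks (8 and 16 filters) and a dense layer."""
    return {
        "k1": rng.normal(0.0, 0.1, (8, 1, 3, 3)), "b1": np.zeros(8),
        "k2": rng.normal(0.0, 0.1, (16, 8, 3, 3)), "b2": np.zeros(16),
        "w": rng.normal(0.0, 0.1, (16, n_classes)), "b": np.zeros(n_classes),
    }

def forward(spec, p):
    """Map a (n_mels, n_frames) spectrogram to class probabilities."""
    x = spec[np.newaxis]                                  # add channel axis: (1, n_mels, n_frames)
    x = max_pool2d(relu(conv2d(x, p["k1"], p["b1"])))     # conv block 1
    x = max_pool2d(relu(conv2d(x, p["k2"], p["b2"])))     # conv block 2
    feat = x.mean(axis=(1, 2))                            # global average pooling -> (16,)
    return softmax(feat @ p["w"] + p["b"])                # probabilities over AV_CLASSES

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    spec = rng.normal(size=(40, 64))  # random stand-in for a 40-band log-mel spectrogram
    probs = forward(spec, params)
    print(probs.shape, AV_CLASSES[int(probs.argmax())])
```

Global average pooling collapses the time and frequency axes before the dense layer. The same weights therefore accept clips with different frame counts, which helps because UrbanSound8k clips are at most 4 s long but vary in duration. A real system would learn the weights with a cross-entropy loss in a deep learning framework rather than use these random values.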