Vision research has shown remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for at least a decade. However, visual sensors may fail in some conditions, and sound has recently shown potential to complement sensor data. Simulated room impulse responses (RIRs) in 3D apartment models became a benchmark dataset for the community, fostering a range of audio-visual research. In simulation, depth can be predicted from sound by learning bat-like perception with a neural network. Concurrently, the same was achieved in reality using RGB-D images and echoes of chirping sounds. Biomimicking bat perception is an exciting new direction, but it needs dedicated datasets to explore its potential. Therefore, we collected the BatVision dataset to provide large-scale echoes in complex real-world scenes to the community. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of the traversed spaces. We sampled environments ranging from modern US office spaces to historic French university grounds, indoor and outdoor, with large architectural variety. This dataset will allow research on robot echolocation, general audio-visual tasks and sound phenomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art work developed for simulated data can also succeed on our dataset. The data can be downloaded at https://forms.gle/W6xtshMgoXGZDwsE7