We present a dataset for evaluating localization algorithms that utilize vision, audio, and radio sensors: the Lund University Vision, Radio, and Audio (LuViRA) Dataset. The dataset includes RGB images, corresponding depth maps, IMU readings, channel responses between a massive MIMO channel sounder and a user equipment, audio recorded by 12 microphones, and 6DoF pose ground truth accurate to 0.5 mm. We synchronize these sensors to ensure that all data are recorded simultaneously. A camera, speaker, and transmit antenna are mounted on top of a slowly moving service robot, and 88 trajectories are recorded. Each trajectory comprises 20 to 50 seconds of recorded sensor data and ground-truth labels. The data from the different sensors can be used separately or jointly for localization tasks, and a motion capture system is used to verify the results obtained by the localization algorithms. The main aim of this dataset is to enable research on fusing the sensors most commonly used for localization tasks. However, the full dataset, or parts of it, can also be used for other research areas such as channel estimation and image classification. Fusing sensor data can lead to increased localization accuracy and reliability, as well as decreased latency and power consumption. The created dataset will be made public at a later date.