Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside them critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of its contents. Based on the observation that collisions between objects and their container cause acoustic vibrations, we present a convolutional network that learns to reconstruct the visual scene. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics. Our project website is at: boombox.cs.columbia.edu
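To make the pipeline concrete, here is a minimal sketch of how contact-microphone recordings might be featurized for a convolutional network. This is a hypothetical preprocessing step under assumed parameters (four microphones, log-magnitude spectrograms); the paper's actual architecture and feature extraction may differ.

```python
import numpy as np

def spectrogram(waveform, n_fft=256, hop=128):
    """Compute a log-magnitude spectrogram from a mono waveform.
    Hypothetical preprocessing; the paper's exact pipeline may differ."""
    frames = []
    for start in range(0, len(waveform) - n_fft + 1, hop):
        # Window each frame to reduce spectral leakage, then take the FFT.
        frame = waveform[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)))
    # log1p compresses the dynamic range of the magnitudes.
    return np.log1p(np.stack(frames, axis=1))  # shape: (freq_bins, time_steps)

# Assumed setup: four contact microphones, one per box wall.
# Stacking their spectrograms gives a multi-channel image-like
# tensor that a convolutional network can consume.
rng = np.random.default_rng(0)
mics = [rng.standard_normal(4096) for _ in range(4)]  # placeholder audio
features = np.stack([spectrogram(m) for m in mics])
print(features.shape)  # (channels, freq_bins, time_steps)
```

A decoder network would then map such a tensor to a top-down image of the box contents, which is the image-reconstruction objective the abstract describes.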