We introduce The Boombox, a container that uses acoustic vibrations to reconstruct an image of its inside contents. When an object interacts with the container, it produces small acoustic vibrations. The exact vibration characteristics depend on the physical properties of both the box and the object. We demonstrate how to use this incidental signal to predict visual structure. After learning, our approach remains effective even when a camera cannot view inside the box. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multi-modal data enables us to transform cheap acoustic sensors into rich visual sensors. Given the ubiquity of containers, we believe integrating perception capabilities into them will enable new applications in human-computer interaction and robotics. Our project website is at: boombox.cs.columbia.edu