There has been a plethora of work on improving robot perception and navigation, yet their application in hazardous environments, such as during a fire or an earthquake, is still at a nascent stage. We hypothesize two key challenges here: first, it is difficult to replicate such scenarios in the real world, which is necessary for training and testing purposes. Second, current systems are not fully able to take advantage of the rich multi-modal data available in such hazardous environments. To address the first challenge, we propose to harness the enormous amount of visual content available in the form of movies and TV shows, and develop a dataset that can represent hazardous environments encountered in the real world. The data is annotated with high-level danger ratings for realistic disaster images, along with keywords that summarize the content of each scene. In response to the second challenge, we propose a multi-modal danger estimation pipeline for collaborative human-robot escape scenarios. Our Bayesian framework improves danger estimation by fusing information from the robot's camera sensor and language inputs from the human. Furthermore, we augment the estimation module with a risk-aware planner that helps identify safer paths out of the dangerous environment. Through extensive simulations, we demonstrate the advantages of our multi-modal perception framework, which translate into tangible benefits such as a higher success rate in collaborative human-robot missions.
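To make the fusion idea concrete, below is a minimal sketch of a Bayesian update over discrete danger levels, combining a vision-based likelihood with a language-based one. The five-level rating scale, the likelihood values, the conditional-independence assumption between modalities, and the `risk_aware_cost` helper (with its weight `lam`) are all illustrative assumptions, not the paper's actual models.

```python
import numpy as np

# Hypothetical discrete danger levels (0 = safe ... 4 = extreme); the paper's
# actual rating scale is an assumption here.
LEVELS = np.arange(5)

def bayes_update(prior, likelihood):
    """One Bayesian fusion step: posterior ∝ likelihood * prior."""
    post = likelihood * prior
    return post / post.sum()

# Prior belief over danger levels (uniform before any observation).
belief = np.full(len(LEVELS), 1.0 / len(LEVELS))

# Likelihood from the robot's camera, e.g. classifier scores over the
# danger levels (values below are purely illustrative).
vision_lik = np.array([0.05, 0.10, 0.20, 0.40, 0.25])

# Likelihood from the human's language input, e.g. keywords such as
# "smoke" or "fire" mapped to a distribution over levels (illustrative).
language_lik = np.array([0.02, 0.08, 0.25, 0.35, 0.30])

# Fuse both modalities; assuming conditional independence given the
# danger level, the two updates can be applied sequentially.
belief = bayes_update(belief, vision_lik)
belief = bayes_update(belief, language_lik)

danger_estimate = float(LEVELS @ belief)  # expected danger level
print(f"posterior: {belief.round(3)}, expected danger: {danger_estimate:.2f}")

def risk_aware_cost(step_len, danger, lam=2.0):
    """Hypothetical edge cost for a risk-aware planner:
    travel distance plus a danger penalty weighted by lam."""
    return step_len + lam * danger
```

A risk-aware planner in this spirit could feed the posterior danger estimate of each map cell into `risk_aware_cost` and run a standard shortest-path search, so that paths through high-danger regions are penalized rather than merely avoided when blocked.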