Foundation models (FMs) and large language models (LLMs) have demonstrated immense human-like understanding and capabilities for generating text and digital media. However, foundation models that can freely sense, interact with, and actuate the physical world as they do the digital domain are far from being realized. This is due to a number of challenges, including: 1) being constrained to the types of static devices and sensors already deployed, 2) events often being localized to one part of a large space, and 3) requiring dense deployments of devices to achieve full coverage. As a critical step towards enabling foundation models to successfully and freely interact with the physical environment, we propose RASP, a modular and reconfigurable sensing and actuation platform that allows drones to autonomously swap onboard sensors and actuators in only $25$ seconds, enabling a single drone to quickly adapt to a diverse range of tasks. We demonstrate through real smart home deployments that RASP enables FMs and LLMs to complete diverse tasks up to $85\%$ more successfully by allowing them to target specific areas with specific sensors and actuators on the fly.