A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation

The rise of generalist large-scale models in natural language and vision has raised the expectation that a massive data-driven approach could achieve broader generalization in other domains such as continuous control. In this work, we explore a method for learning a single policy that controls agents of various morphologies to solve various tasks by distilling a large amount of proficient behavioral data. To align the input-output (IO) interface across multiple tasks and diverse agent morphologies while preserving essential 3D geometric relations, we introduce the morphology-task graph, which treats observations, actions, and goals/tasks in a unified graph representation. We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations with a minimal blueprint and a hardware-accelerated simulator. Through efficient representation and architecture selection on MxT-Bench, we find that a morphology-task graph representation coupled with a Transformer architecture improves multi-task performance compared to other baselines, including recent discrete tokenization, and provides better prior knowledge for zero-shot transfer or sample efficiency in downstream multi-task imitation learning. Our work suggests that large diverse offline datasets, a unified IO representation, and policy representation and architecture selection through supervised learning form a promising approach for studying and advancing morphology-task generalization.
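
To make the idea of a unified graph representation more concrete, the following is a minimal Python sketch of how observations, goals, and actions might be attached to the nodes of a morphology graph. The abstract does not specify the actual schema, so every name here (`LimbNode`, `MorphologyTaskGraph`, `make_agent`), the choice of per-node features, and the toy agent layout are assumptions made purely for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LimbNode:
    """One node per body part (hypothetical schema, not from the paper)."""
    name: str
    position: Vec3               # local observation: 3D position of the limb
    velocity: Vec3               # local observation: 3D linear velocity
    goal: Optional[Vec3] = None  # task information, attached only where relevant
    actuated: bool = True        # whether the policy emits an action for this limb


@dataclass
class MorphologyTaskGraph:
    nodes: List[LimbNode]
    edges: List[Tuple[int, int]]  # (parent, child) node indices, one per joint

    def node_features(self) -> List[List[float]]:
        """Return one fixed-width feature vector per node.

        The width is the same for every morphology and task, so a sequence
        model such as a Transformer could consume the nodes as tokens.
        """
        feats = []
        for n in self.nodes:
            has_goal = n.goal is not None
            goal = n.goal if has_goal else (0.0, 0.0, 0.0)
            feats.append([*n.position, *n.velocity, *goal, float(has_goal)])
        return feats

    def action_mask(self) -> List[bool]:
        """Mark which nodes should receive an action output."""
        return [n.actuated for n in self.nodes]

    def adjacency(self) -> List[List[int]]:
        """Symmetric adjacency matrix built from the joint list."""
        size = len(self.nodes)
        adj = [[0] * size for _ in range(size)]
        for parent, child in self.edges:
            adj[parent][child] = 1
            adj[child][parent] = 1
        return adj


def make_agent(num_legs: int, goal_leg: int, goal: Vec3) -> MorphologyTaskGraph:
    """Build a toy torso-plus-legs agent whose `goal_leg` must reach `goal`."""
    nodes = [LimbNode("torso", (0.0, 0.0, 0.5), (0.0, 0.0, 0.0), actuated=False)]
    edges = []
    for i in range(num_legs):
        leg_goal = goal if i == goal_leg else None
        nodes.append(
            LimbNode(f"leg_{i}", (float(i), 0.0, 0.2), (0.0, 0.0, 0.0), goal=leg_goal)
        )
        edges.append((0, i + 1))  # every leg hangs off the torso
    return MorphologyTaskGraph(nodes, edges)


# Two different morphologies and tasks share the same per-node interface.
two_legs = make_agent(num_legs=2, goal_leg=0, goal=(1.0, 1.0, 0.0))
four_legs = make_agent(num_legs=4, goal_leg=3, goal=(-2.0, 0.5, 0.0))
```

In this sketch, changing the number of limbs or the limb that carries the goal only changes the number of nodes and where the goal features appear; the per-node feature width stays fixed, which is one plausible way to read the aligned IO interface described above.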
