For robots to truly collaborate with and assist humans, they must understand not only logic and instructions but also the subtle emotions, aesthetics, and feelings that define our humanity. Human art and aesthetics are among the most elusive concepts, often difficult even for people to articulate, and without grasping these fundamentals robots will be unable to help in many spheres of daily life. Consider the long-promised robotic butler: automating domestic chores demands more than motion planning. It requires an internal model of cleanliness and tidiness, a challenge that remains largely unexplored in AI. To bridge this gap, we propose an approach that equips domestic robots to perform simple tidying tasks via knolling, the practice of arranging scattered items into neat, space-efficient layouts. Unlike industrial settings, which are largely uniform, household environments feature diverse objects and highly subjective notions of tidiness. Drawing inspiration from natural language processing (NLP), we treat knolling as a sequential prediction problem and employ a transformer-based model to predict each object's placement. Our method learns a generalizable concept of tidiness, generates diverse solutions that adapt to varying object sets, and incorporates human preferences for personalized arrangements. This work represents a step toward robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.