This paper proposes MotionScript, a motion-to-text conversion algorithm and natural language representation for human body motions. MotionScript provides more detailed and accurate descriptions of human body movements than previous natural language methods. Most motion datasets focus on basic, well-defined actions with limited variation in expression (e.g., sitting, walking, dribbling a ball). But for expressive actions that span a diverse range of movements within a class (e.g., being sad, dancing), or for actions outside the domain of standard motion capture datasets (e.g., stylistic walking, sign language, interactions with animals), more specific and granular natural language descriptions are needed. MotionScript descriptions differ from existing natural language representations in that they provide detailed descriptions in natural language rather than simple action labels or generalized captions. To the best of our knowledge, this is the first attempt at translating 3D motions into natural language descriptions without requiring training data. Our experiments demonstrate that MotionScript descriptions, when applied to text-to-motion tasks, enable large language models to generate complex, previously unseen motions. Additional examples, dataset, and code can be accessed at https://pjyazdian.github.io/MotionScript