We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets are minimal and to the point. By carefully crafting prompts for LLMs, we generate richer, fine-grained descriptions of each action. We show that using these detailed descriptions in place of the original action phrases leads to better alignment of the text and motion spaces. Our experiments demonstrate qualitative and quantitative improvements in the quality of motions synthesized by recent text-to-motion models. Code, pretrained models, and sample videos will be made available at https://actiongpt.github.io
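To make the prompt-crafting step concrete, the sketch below shows one way a terse action phrase could be expanded into detailed descriptions via an LLM API. This is an illustrative assumption, not the paper's released code: the prompt wording, the model name, the `expand_action_phrase` helper, and the number of sampled descriptions are all hypothetical choices made for demonstration.

```python
# Illustrative sketch only: expanding a terse action phrase into richer,
# body-part-level descriptions with an LLM. Prompt text, model name, and
# sampling count are assumptions, not the paper's exact configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def expand_action_phrase(action: str, n_samples: int = 4) -> list[str]:
    """Ask the LLM for detailed descriptions of how a body performs `action`."""
    prompt = (
        f"Describe in detail how a person's body moves while performing "
        f"the action: '{action}'."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model for illustration
        n=n_samples,            # several descriptions per action phrase
        messages=[{"role": "user", "content": prompt}],
    )
    return [choice.message.content for choice in response.choices]


# Example: "a person jumps" should yield descriptions mentioning knees
# bending, arms swinging, and feet leaving the ground, which are then fed
# to the text-to-motion model in place of the original phrase.
descriptions = expand_action_phrase("a person jumps")
```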