We present HY-Motion 1.0, a series of state-of-the-art, large-scale motion generation models capable of generating 3D human motions from textual descriptions. HY-Motion 1.0 represents the first successful attempt to scale Diffusion Transformer (DiT)-based flow matching models to the billion-parameter scale in the motion generation domain, delivering instruction-following capabilities that significantly outperform current open-source baselines. Uniquely, we introduce a comprehensive, full-stage training paradigm -- comprising large-scale pretraining on over 3,000 hours of motion data, high-quality fine-tuning on 400 hours of curated data, and reinforcement learning from both human feedback and reward models -- to ensure precise alignment with text instructions and high motion quality. This framework is supported by our meticulous data processing pipeline, which performs rigorous motion cleaning and captioning. Consequently, our model achieves the most extensive coverage, spanning over 200 motion categories across 6 major classes. We release HY-Motion 1.0 to the open-source community to foster future research and accelerate the transition of 3D human motion generation models toward commercial maturity.
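For context, DiT-based flow matching models of this kind are commonly trained with a conditional flow matching (rectified flow) objective over linear interpolation paths. The sketch below shows the standard formulation, not necessarily the exact loss used by HY-Motion 1.0; here $v_\theta$ denotes the DiT velocity predictor, $x_1$ a motion sample, $x_0$ Gaussian noise, and $c$ the text condition (all symbols are illustrative rather than taken from the report):

\[
\mathcal{L}_{\mathrm{FM}}(\theta) \;=\; \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; x_1 \sim p_{\mathrm{data}},\; x_0 \sim \mathcal{N}(0, I)} \big\| \, v_\theta(x_t, t, c) \,-\, (x_1 - x_0) \, \big\|_2^2, \qquad x_t = (1 - t)\, x_0 + t\, x_1 .
\]

At inference time, a motion is generated by integrating the learned velocity field from noise ($t = 0$) to data ($t = 1$) conditioned on the text prompt.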