Predicting high-fidelity future human poses from a historically observed sequence is crucial for intelligent robots that interact with humans. Deep end-to-end learning approaches, which typically pre-train a generic model on external datasets and then apply it directly to all test samples, have emerged as the dominant solution to this problem. Despite encouraging progress, they remain suboptimal, because they cannot adapt to the unique properties (e.g., motion style, rhythm) of a specific sequence. More generally, when unseen motion categories (out-of-distribution data) are encountered at test time, the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model adjusts its parameters through several gradient updates to improve generation quality. However, due to catastrophic forgetting, the two auxiliary tasks on their own often fail to provide the desired positive incentive for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. Under the general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and unseen categories), it achieves significant improvements.
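To make the test-time adaptation idea concrete, the following is a minimal sketch rather than the method proposed here. It assumes a toy one-dimensional "pose" signal, a single shared parameter `w`, and a stand-in self-supervised auxiliary loss that re-predicts each observed frame from its predecessor. All function names, the learning rate, and the step count are hypothetical, and the sketch covers neither the two auxiliary tasks nor the meta-auxiliary learning scheme.

```python
def predict_next(w, x):
    """Primary task (toy): forecast the next value from the current one."""
    return w * x


def aux_loss_and_grad(w, observed):
    """Stand-in self-supervised loss computed only on the observed sequence.

    Each observed frame is re-predicted from the previous frame, so no
    future ground truth is required. Returns (mean squared error, d loss / d w).
    """
    n = len(observed) - 1
    loss, grad = 0.0, 0.0
    for prev, cur in zip(observed[:-1], observed[1:]):
        err = predict_next(w, prev) - cur
        loss += err * err / n
        grad += 2.0 * err * prev / n
    return loss, grad


def adapt_at_test_time(w_pretrained, observed, steps=5, lr=0.05):
    """Take a few gradient steps on the auxiliary loss for one test sequence.

    The pre-trained value is left untouched; an adapted copy is returned.
    """
    w = w_pretrained
    for _ in range(steps):
        _, grad = aux_loss_and_grad(w, observed)
        w -= lr * grad
    return w


def forecast(w, observed, horizon):
    """Roll the primary predictor forward from the last observed frame."""
    preds, x = [], observed[-1]
    for _ in range(horizon):
        x = predict_next(w, x)
        preds.append(x)
    return preds


# Hypothetical test sequence whose dynamics differ from the pre-trained value.
observed = [1.0, 1.2, 1.44, 1.728]
w_pretrained = 1.0
w_adapted = adapt_at_test_time(w_pretrained, observed)
future = forecast(w_adapted, observed, horizon=3)
```

In this toy setting the auxiliary task and the primary task share their only parameter, so lowering the auxiliary loss on the observed frames also moves the forecaster toward the dynamics of the test sequence. In a real network only part of the parameters would be shared, which is where the auxiliary updates may stop helping the primary task.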