Estimating the head pose of a person is a crucial problem with a wide range of applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, an extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for depth-based pose estimation and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models.
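As a rough illustration of the joint binned classification and regression mentioned above, the sketch below shows one way such a combined loss could look in PyTorch for a single Euler angle (the full multi-loss network would apply one such loss each for yaw, pitch and roll). The bin count, bin width, angle range and the weighting factor `alpha` are illustrative assumptions, not values stated in this abstract.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a joint binned classification + regression loss for
# one Euler angle. All constants below are assumptions chosen for illustration.
NUM_BINS = 66                 # assumed: 66 bins covering roughly [-99, 99] degrees
BIN_WIDTH = 3.0               # assumed bin width in degrees
ANGLE_MIN = -99.0             # assumed lower bound of the angle range
idx_tensor = torch.arange(NUM_BINS, dtype=torch.float32)

def joint_pose_loss(logits, gt_angle_deg, alpha=0.5):
    """Cross-entropy over angle bins plus MSE on the continuous angle
    recovered as the expected value of the bin softmax."""
    # Classification target: which bin the ground-truth angle falls into.
    gt_bin = ((gt_angle_deg - ANGLE_MIN) / BIN_WIDTH).long().clamp(0, NUM_BINS - 1)
    cls_loss = F.cross_entropy(logits, gt_bin)

    # Regression branch: expected angle under the softmax distribution over bins.
    probs = F.softmax(logits, dim=1)
    pred_angle = torch.sum(probs * idx_tensor, dim=1) * BIN_WIDTH + ANGLE_MIN
    reg_loss = F.mse_loss(pred_angle, gt_angle_deg)

    # alpha balances the fine-grained regression term against the coarse bins.
    return cls_loss + alpha * reg_loss
```

The intuition is that the classification term gives the network a stable, coarse supervision signal over discrete angle bins, while the regression term, computed from the softmax expectation, recovers a fine-grained continuous angle.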