Face detection and head pose estimation in digital images have many practical applications, including surveillance camera manufacturing and the analysis of customer behavior, and they are attracting growing attention from researchers. Many deep learning models achieve state-of-the-art accuracy on these problems, such as YOLO, SSD, and MTCNN for face detection, and HopeNet, FSA-Net, and RankPose for head pose estimation. In many state-of-the-art methods, the pipeline for this task consists of two stages: face detection followed by head pose estimation. These two stages are completely independent and do not share information. This keeps the setup simple, but it leaves most of the features extracted by each model unused by the other. In this paper, we propose the Multitask-Net model, motivated by the idea of sharing the features extracted by the face detection model with the head pose estimation branch to improve accuracy. In addition, because the data are varied and the range of Euler angles describing a face is large, our model is designed to predict poses over the full 360° Euler angle domain. By applying multitask learning, the Multitask-Net model simultaneously predicts the position and the direction of the human head. To improve the model's ability to predict head direction, we change the representation of the human face from Euler angles to vectors of the rotation matrix.
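The switch from Euler angles to rotation-matrix vectors can be illustrated with a minimal sketch. The abstract does not state the axis convention or which vectors the model regresses, so the ZYX order (`R = Rz(yaw) @ Ry(pitch) @ Rx(roll)`) and the helper names `euler_to_rotation_matrix` and `rotation_vectors` below are assumptions made only for illustration. The example shows why such a representation may suit a 360° range: two poses on either side of the ±180° wrap-around are far apart as angle values but close as matrices.

```python
import numpy as np


def euler_to_rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Return a 3x3 rotation matrix from Euler angles given in degrees.

    Assumed convention (not taken from the paper): R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    """
    y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
    rz = np.array([[np.cos(y), -np.sin(y), 0.0],
                   [np.sin(y),  np.cos(y), 0.0],
                   [0.0,        0.0,       1.0]])
    ry = np.array([[ np.cos(p), 0.0, np.sin(p)],
                   [ 0.0,       1.0, 0.0],
                   [-np.sin(p), 0.0, np.cos(p)]])
    rx = np.array([[1.0, 0.0,        0.0],
                   [0.0, np.cos(r), -np.sin(r)],
                   [0.0, np.sin(r),  np.cos(r)]])
    return rz @ ry @ rx


def rotation_vectors(yaw_deg, pitch_deg, roll_deg):
    """Return the three column vectors of the rotation matrix.

    These orthonormal vectors are one possible regression target
    (hypothetical choice; the abstract only says "vectors of the rotation matrix").
    """
    rot = euler_to_rotation_matrix(yaw_deg, pitch_deg, roll_deg)
    return rot[:, 0], rot[:, 1], rot[:, 2]


if __name__ == "__main__":
    # Two nearly identical head poses on either side of the +/-180 degree wrap.
    a = euler_to_rotation_matrix(179.0, 0.0, 0.0)
    b = euler_to_rotation_matrix(-179.0, 0.0, 0.0)

    angle_gap = abs(179.0 - (-179.0))   # difference between the raw yaw values
    matrix_gap = np.linalg.norm(a - b)  # Frobenius distance between the matrices

    print("raw yaw difference (degrees):", angle_gap)
    print("rotation-matrix distance:", matrix_gap)
```

Under these assumptions, a loss computed on raw yaw values would penalize the two poses heavily even though they are only 2° apart, whereas a loss on the matrix vectors would not.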