Age estimation is a classic learning problem in computer vision. Many large, deep CNNs with promising performance have been proposed, such as AlexNet, VggNet, GoogLeNet and ResNet. However, these models are not practical for embedded or mobile devices. Recently, MobileNets and ShuffleNets have been proposed to reduce the number of parameters, yielding lightweight models. However, their representational power is weakened by the adoption of depth-wise separable convolution. In this work, we investigate the limits of compact models for small-scale images and propose an extremely Compact yet efficient Cascade Context-based Age Estimation model (C3AE). This model has only 1/9 the parameters of MobileNets/ShuffleNets and 1/2000 those of VggNet, while achieving competitive performance. In particular, we redefine the age estimation problem using a two-point representation, which is implemented by a cascade model. Moreover, to fully utilize facial context information, a multi-branch CNN is proposed to aggregate multi-scale context. Experiments are carried out on three age estimation datasets. Among compact models, C3AE achieves state-of-the-art performance by a relatively large margin.
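The abstract does not spell out how the two-point representation is constructed. As a minimal sketch of one plausible reading, the snippet below encodes a scalar age as weights on the two nearest anchors of a uniform age grid, so that the weighted sum of the anchors recovers the original age. The function names, the bin width of 10 years, the maximum age of 100, and the linear interpolation weights are all assumptions made for illustration, not details taken from the paper.

```python
def two_point_encode(age, bin_width=10, max_age=100):
    """Encode an age as weights over uniformly spaced anchors (0, 10, ..., max_age).

    Assumption: max_age is a multiple of bin_width, and the weights are
    obtained by linear interpolation between the two neighbouring anchors.
    """
    num_anchors = max_age // bin_width + 1
    weights = [0.0] * num_anchors
    # Clamp the age into the range covered by the anchors.
    age = min(max(float(age), 0.0), float(max_age))
    lower = int(age // bin_width)
    upper = min(lower + 1, num_anchors - 1)
    # Fraction of the way from the lower anchor to the upper anchor.
    frac = (age - lower * bin_width) / bin_width
    weights[lower] += 1.0 - frac
    weights[upper] += frac
    return weights


def two_point_decode(weights, bin_width=10):
    """Recover the age as the weighted sum of the anchor ages."""
    return sum(w * i * bin_width for i, w in enumerate(weights))


if __name__ == "__main__":
    w = two_point_encode(68)
    print(w)                    # at most two non-zero entries
    print(two_point_decode(w))  # approximately 68
```

Under these assumptions, at most two entries of the weight vector are non-zero and they sum to one, so a network could be trained to predict the weight vector first and regress the scalar age from it second, which is one way a cascade over this representation might be arranged.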