Comparing different age estimation methods poses a challenge due to the unreliability of published results stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. This paper identifies two trivial, yet persistent issues with the currently used evaluation protocol and describes how to resolve them. We describe our evaluation protocol in detail and provide specific examples of how the protocol should be used. We utilize the protocol to offer an extensive comparative analysis for state-of-the-art facial age estimation methods. Surprisingly, we find that the performance differences between the methods are negligible compared to the effect of other factors, such as facial alignment, facial coverage, image resolution, model architecture, or the amount of data used for pretraining. We use the gained insights to propose using FaRL as the backbone model and demonstrate its efficiency. The results emphasize the importance of consistent data preprocessing practices for reliable and meaningful comparisons. We make our source code public at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.
翻译:暂无翻译