In this work we tackle the challenging problem of anime character recognition. Anime refers to animation produced within Japan and to works derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar datasets using a variety of classification models, including CNN-based ResNets and the self-attention-based Vision Transformer (ViT). Our results offer new insights into the generalization and transfer learning properties of ViT models on datasets from domains substantially different from those used for upstream pre-training, including the influence of batch size and image size during training. Finally, we release our dataset, source code, pre-trained checkpoints, and results as Animesion, the first end-to-end framework for large-scale anime character recognition: https://github.com/arkel23/animesion