An AI-powered automatic camera scene detection mode is now available in nearly every modern smartphone, yet the problem of accurate scene prediction has not been addressed by the research community. This paper carefully defines the problem for the first time and proposes a novel Camera Scene Detection Dataset (CamSDD) containing more than 11K manually crawled images belonging to 30 different scene categories. We propose an efficient and NPU-friendly CNN model for this task that demonstrates a top-3 accuracy of 99.5% on this dataset and achieves more than 200 FPS on recent mobile SoCs. An additional in-the-wild evaluation of the obtained solution is performed to analyze its performance and limitations in real-world scenarios. The dataset and pre-trained models used in this paper are available on the project website.
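The top-3 accuracy quoted in the abstract counts a prediction as correct when the true scene category is among the three highest-scoring classes. A minimal sketch of that metric follows, assuming per-image class scores stored as plain lists; the function name `top_k_accuracy` and the toy scores are illustrative only and are not taken from the paper's code or dataset.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical scores for 3 images over 4 classes (not real CamSDD outputs).
    toy_scores = [
        [0.10, 0.20, 0.30, 0.40],
        [0.40, 0.30, 0.20, 0.10],
        [0.25, 0.25, 0.30, 0.20],
    ]
    toy_labels = [1, 3, 0]
    print(f"top-3 accuracy: {top_k_accuracy(toy_scores, toy_labels, k=3):.3f}")
```

With 30 scene categories, a row of `scores` would hold 30 values per image. Setting `k=1` gives the stricter top-1 accuracy under the same assumptions.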