The literature shows outstanding capabilities for CNNs in event recognition in images. However, fewer attempts have been made to analyze the potential causes behind the models' decisions and to explore whether the predictions are based on event-salient objects or regions. To explore this important aspect of event recognition, we propose an explainable event recognition framework relying on Grad-CAM and an Xception-based CNN model. Experiments are conducted on three large-scale datasets covering a diverse set of natural disasters, social events, and sports events. The model showed strong generalization, obtaining overall F1-scores of 0.91, 0.94, and 0.97 on natural disasters, social, and sports events, respectively. Moreover, for a subjective analysis of the activation maps that Grad-CAM generates for the model's predicted samples, a crowdsourcing study is conducted to assess whether the model's predictions are based on event-related objects or regions. The results of the study indicate that 78%, 84%, and 78% of the model's decisions on the natural disasters, sports, and social events datasets, respectively, are based on event-related objects or regions.
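As a rough illustration of the activation maps mentioned above, the core Grad-CAM step can be sketched as follows. This is a minimal, dependency-free sketch and not the authors' implementation: the function name `grad_cam` and the nested-list inputs are assumptions made for illustration. It assumes the feature maps of the last convolutional layer and the gradients of the predicted class score with respect to those maps have already been extracted from the model.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap (hypothetical helper, for illustration only).

    activations: list of K feature maps, each an H x W nested list.
    gradients:   gradients of the class score w.r.t. each feature map,
                 with the same K x H x W layout.
    """
    # Channel weights: global average of the gradient over each feature map.
    weights = [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in gradients]
    height, width = len(activations[0]), len(activations[0][0])
    # Weighted sum of the feature maps, followed by ReLU to keep only
    # regions that push the class score up.
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy input: two 2x2 feature maps, one with positive and one with negative gradients.
activations = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(activations, gradients)
```

In practice the heatmap would be upsampled to the input resolution and overlaid on the image, which is the kind of visualization that annotators in a crowdsourcing study could judge for event relevance.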