This paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared on Twitter. In total, we propose four different solutions: a multimodal solution combining textual and visual information for the mandatory run, and three unimodal image- and text-based solutions as optional runs. In the multimodal method, we rely on a supervised multimodal bitransformer model that combines textual and visual features in an early fusion, achieving a micro F1-score of .859 on the development set. For text-based flood event detection, we use a transformer network (i.e., a pre-trained Italian BERT model), achieving an F1-score of .853. For the image-based solutions, we employ multiple deep models, pre-trained on the ImageNet and Places data sets, individually and combined in an early fusion, achieving F1-scores of .816 and .805 on the development set, respectively.
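The early-fusion step mentioned above can be sketched as follows. This is a minimal, hypothetical illustration of concatenating per-modality feature vectors before classification, not the actual multimodal bitransformer implementation; the 768- and 2048-dimensional placeholder features stand in for typical BERT and CNN embedding sizes, which are assumptions rather than details taken from the paper.

```python
import numpy as np

def early_fusion(text_feat: np.ndarray, image_feat: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate modality features into one joint vector,
    which a downstream classifier then consumes as a single input."""
    return np.concatenate([text_feat, image_feat])

rng = np.random.default_rng(0)

# Placeholder features standing in for a text encoder output (e.g. 768-d,
# a common BERT hidden size) and an image encoder output (e.g. 2048-d).
text_feat = rng.standard_normal(768)
image_feat = rng.standard_normal(2048)

fused = early_fusion(text_feat, image_feat)
print(fused.shape)  # (2816,)
```

In contrast to late fusion, where each modality is classified separately and the decisions are merged, early fusion lets the classifier learn cross-modal interactions directly from the joint representation.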