This paper addresses the critical need for automated crack detection in the preservation of cultural heritage through semantic segmentation. We present a comparative study of U-Net architectures with various convolutional neural network (CNN) encoders for pixel-level crack identification on statues and monuments. Quantitative evaluation is performed on the test set of the OmniCrack30k dataset [1] using popular segmentation metrics, including Mean Intersection over Union (mIoU), the Dice coefficient, and the Jaccard index. This is complemented by an out-of-distribution qualitative evaluation on an unlabeled test set of real-world cracked statues and monuments. Our findings provide valuable insights into the capabilities of different CNN-based encoders for fine-grained crack segmentation. We show that the models exhibit promising generalization to unseen cultural heritage contexts, despite never having been explicitly trained on images of statues or monuments.
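As a rough illustration (not the paper's evaluation code), the overlap metrics named in the abstract can be computed for binary segmentation masks as sketched below; the helper names and toy masks are ours, and a real pipeline would average these per image or per class to obtain mIoU.

```python
import numpy as np

def iou(pred, gt):
    # Intersection over Union (equivalent to the Jaccard index): |A ∩ B| / |A ∪ B|
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

# Toy binary masks (1 = crack pixel, 0 = background)
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, gt))   # 2 overlap / 4 union = 0.5
print(dice(pred, gt))  # 2*2 / (3+3) ≈ 0.667
```

Note that IoU and Dice are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why papers often report both as complementary views of the same overlap.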