In this paper, we describe a multi-modal search system for old archaeological books and reports. The corpus is available digitally as scanned PDFs, but scan quality varies widely. Our pipeline extracts and indexes text, images (classified as maps, photos, layouts, and others), and tables. We evaluate several retrieval strategies: keyword-based search, embedding-based retrieval, and a hybrid approach that selects the best results from the two. We report and analyze our preliminary results and discuss future work in this vertical.
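The abstract does not specify how the hybrid approach chooses between keyword and embedding results. As an illustration only, the following minimal sketch shows one plausible reading: scores from each retriever are min-max normalized so they are comparable, and each document keeps the better of its two scores. The function names (`min_max_normalize`, `hybrid_select`), the normalization scheme, and the input format (a mapping from document ID to raw score) are all assumptions made for this example, not details taken from the described system.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so two retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_select(
    keyword_scores: Dict[str, float],
    embedding_scores: Dict[str, float],
    k: int = 5,
) -> List[Tuple[str, float, str]]:
    """Hypothetical hybrid step: keep each document's best normalized score.

    Returns up to k tuples of (doc_id, normalized_score, winning_retriever).
    """
    normalized = (
        ("keyword", min_max_normalize(keyword_scores)),
        ("embedding", min_max_normalize(embedding_scores)),
    )
    best: Dict[str, Tuple[float, str]] = {}
    for source, scores in normalized:
        for doc_id, score in scores.items():
            if doc_id not in best or score > best[doc_id][0]:
                best[doc_id] = (score, source)
    # Sort by descending score; break ties by document ID for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc_id, score, source) for doc_id, (score, source) in ranked[:k]]


if __name__ == "__main__":
    # Made-up scores purely for demonstration (e.g. BM25 vs. cosine similarity).
    keyword_hits = {"report_12_p4": 10.0, "book_3_p88": 5.0, "book_7_p2": 0.0}
    embedding_hits = {"book_3_p88": 0.9, "map_41": 0.8, "report_12_p4": 0.1}
    for doc_id, score, source in hybrid_select(keyword_hits, embedding_hits, k=3):
        print(f"{doc_id}\t{score:.3f}\t{source}")
```

Taking the maximum of the two normalized scores favors documents that either retriever ranks highly, which may suit a corpus where poor scan quality degrades keyword matching on some pages. Other fusion schemes, such as weighted sums or rank-based fusion, would be equally reasonable stand-ins here.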