Historical manuscript processing poses challenges such as limited annotated training data and the emergence of novel classes. To address these, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on a single support image. In particular, since the low-resource spotting task often suffers from example imbalance, we propose a novel loss function, called torus loss, which makes the embedding space of the distance metric more discriminative. Our approach is highly efficient, requires only a few training samples, and exhibits a remarkable ability to handle novel characters and symbols. To enhance dataset diversity, we create a new manuscript dataset containing ancient Dongba hieroglyphics (DBH). We conduct experiments on the publicly available VML-HD, TKH, and NC datasets as well as the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts.
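To make the metric-learning idea concrete, the sketch below shows a generic one-shot matching setup: an encoder maps query and support glyph images to normalized embeddings, and a margin-based loss pulls each query toward its matching support sample while pushing mismatches apart. This is only an illustrative placeholder, not the paper's torus loss or spatial alignment module; the network, margin value, and image sizes are all assumptions.

```python
# Hypothetical sketch of one-shot matching with a margin-based embedding loss.
# NOTE: margin_matching_loss is NOT the paper's torus loss; it is a generic
# triplet-style margin loss used only to illustrate how a discriminative
# distance metric can be trained from one support sample per class.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Toy CNN encoder mapping a glyph image to an L2-normalized embedding."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.fc(self.features(x).flatten(1))
        return F.normalize(z, dim=-1)  # unit-norm embeddings

def margin_matching_loss(query: torch.Tensor, support: torch.Tensor,
                         labels: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """Pull each query toward its matching support embedding and push the
    nearest non-matching support at least `margin` further away."""
    dists = torch.cdist(query, support)               # (Q, S) pairwise distances
    pos = dists.gather(1, labels.view(-1, 1))         # distance to true class
    # Mask out the true class, then take the hardest (closest) negative.
    neg = dists.scatter(1, labels.view(-1, 1), float("inf")).min(dim=1).values
    return F.relu(pos.squeeze(1) - neg + margin).mean()

# Usage: one support image per novel class (one-shot), a batch of queries.
encoder = EmbeddingNet()
support_imgs = torch.randn(5, 1, 64, 64)    # 5 novel classes, 1 sample each
query_imgs = torch.randn(8, 1, 64, 64)
query_labels = torch.randint(0, 5, (8,))    # index of each query's true class
loss = margin_matching_loss(encoder(query_imgs), encoder(support_imgs), query_labels)
loss.backward()
```

Hard-negative mining in the loss is one common way to keep the embedding space discriminative when positive examples are scarce, which is the same failure mode the abstract attributes to example imbalance.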