Historical manuscript processing poses challenges such as limited annotated training data and the emergence of novel character classes. To address these, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters from just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that locates, attends to, and learns the most discriminative spatial regions of the query image based on a single support image. In particular, since low-resource spotting often suffers from example imbalance, we propose a novel loss function, the torus loss, which makes the embedding space of the distance metric more discriminative. Our approach is highly efficient, requires only a few training samples, and handles novel characters and symbols well. To enhance dataset diversity, we create a new manuscript dataset of ancient Dongba hieroglyphs (DBH). We conduct experiments on the publicly available VML-HD, TKH, and NC datasets as well as the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms state-of-the-art methods in one-shot text spotting. Overall, our method offers promising applications for text spotting in historical manuscripts.
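The abstract does not define the torus loss, so as a rough illustration of the distance-metric learning idea it builds on, the sketch below shows a standard contrastive margin loss in PyTorch. The function name `torus_like_margin_loss`, the cosine-distance choice, and the margin value are all assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def torus_like_margin_loss(query_emb, support_emb, labels, margin=0.5):
    """Hypothetical sketch of a margin-based metric loss.

    query_emb:   (N, D) embeddings of query-image regions
    support_emb: (N, D) embeddings of the one-shot support characters
    labels:      (N,) 1 if the pair matches, 0 otherwise
    """
    # Cosine distance between each query/support pair.
    dist = 1.0 - F.cosine_similarity(query_emb, support_emb, dim=1)
    # Pull matching pairs together; push mismatches beyond the margin.
    # (A standard contrastive formulation; the paper's torus loss
    # additionally targets example imbalance, which is not modeled here.)
    pos = labels * dist.pow(2)
    neg = (1 - labels) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()
```

Any loss of this family shapes the embedding space so that one-shot matching by nearest distance becomes more reliable, which is the role the abstract attributes to the torus loss.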