This report presents our winning solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge is held in the context of the ECCV 2022 workshop on Text in Everything (TiE) and aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentation. In addition, two further models are trained specifically for long and vertical text. Finally, we combine the outputs of models trained with different layer configurations, backbones, and random seeds to produce the final results. Our solution achieves an overall word accuracy of 69.73% when considering both in-vocabulary and out-of-vocabulary words.
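The abstract does not say how the outputs of the individual models are combined. As an illustration only, the following minimal sketch assumes each model returns a predicted string together with a confidence score, and picks the string with the highest summed confidence. The function name `ensemble_vote` and the confidence-weighted voting rule are assumptions made for this example, not the authors' documented method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ensemble_vote(predictions: Iterable[Tuple[str, float]]) -> str:
    """Combine (text, confidence) pairs from several recognizers.

    Hypothetical rule: sum the confidences of identical strings and
    return the string with the largest total. Ties go to the string
    that was seen first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, confidence in predictions:
        scores[text] += confidence
    if not scores:
        raise ValueError("no predictions to combine")
    # dicts preserve insertion order, so max() returns the earliest
    # string among those sharing the top score
    return max(scores, key=lambda text: scores[text])


if __name__ == "__main__":
    # Three hypothetical models (e.g. different backbones or seeds)
    # read the same cropped word image.
    outputs = [("hello", 0.6), ("hallo", 0.9), ("hello", 0.5)]
    print(ensemble_vote(outputs))
```

Under this rule, two moderately confident models that agree can outvote a single highly confident model, which is one reason to ensemble models that differ in backbone or seed.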