This report presents our 2nd place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge is held in the context of the ECCV 2022 workshop on Text in Everything (TiE) and aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentations. In addition, two models are trained specifically for long and vertical texts. Finally, we combine the outputs of models with different layers, different backbones, and different random seeds to produce the final results. Our solution achieves a word accuracy of 59.45\% when considering out-of-vocabulary words only.
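The final combination step and the reported metric can be illustrated with a minimal sketch. The abstract does not state how the model outputs are merged, so the confidence-weighted voting below is an assumption and not necessarily the method used in the solution. The function names `ensemble_vote` and `oov_word_accuracy` are hypothetical, as is the idea that each model returns a `(text, confidence)` pair per cropped word.

```python
from collections import defaultdict


def ensemble_vote(predictions):
    """Pick one transcription from several models' outputs for a single crop.

    `predictions` is a list of (text, confidence) pairs, one per model.
    ASSUMPTION: candidates are merged by summing confidences per distinct
    string; the report does not specify the actual combination rule.
    """
    scores = defaultdict(float)
    for text, confidence in predictions:
        scores[text] += confidence
    # The string with the highest accumulated confidence wins.
    return max(scores, key=scores.get)


def oov_word_accuracy(preds, targets, vocabulary):
    """Word accuracy restricted to targets that are NOT in `vocabulary`.

    A prediction counts as correct only if it matches the target exactly.
    Returns 0.0 when there are no out-of-vocabulary targets.
    """
    oov_pairs = [(p, t) for p, t in zip(preds, targets) if t not in vocabulary]
    if not oov_pairs:
        return 0.0
    correct = sum(p == t for p, t in oov_pairs)
    return correct / len(oov_pairs)


# Three hypothetical models disagree on one cropped word.
merged = ensemble_vote([("hello", 0.6), ("hallo", 0.9), ("hello", 0.5)])

# Only "zyx" and "qwr" are out of vocabulary; one of the two is recognized.
accuracy = oov_word_accuracy(
    ["cat", "zyx", "qwe"], ["cat", "zyx", "qwr"], vocabulary={"cat"}
)
```

Summing confidences lets two moderately confident models outvote a single more confident one, which is one common reason to ensemble models trained with different seeds and backbones.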