Textual attributes such as font are core design elements of document format and page style, so automatic attribute recognition benefits a wide range of practical applications. Existing approaches already perform satisfactorily when differentiating disparate attributes, but they still struggle to distinguish similar attributes that differ only subtly. Moreover, their performance drops severely in real-world scenarios where unexpected and pronounced imaging distortions appear. In this paper, we tackle these problems by proposing TaCo, a contrastive framework for textual attribute recognition tailored toward the most common document scenes. Specifically, TaCo leverages contrastive learning to dispel the ambiguity trap arising from vague and open-ended attributes. To this end, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valuable view pairs for learning, so as to fully unlock the pre-training potential. Extensive experiments show that TaCo surpasses its supervised counterparts and remarkably advances the state of the art on multiple attribute recognition tasks. Online services of TaCo will be made available.
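To make the contrastive idea concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor view. It is not taken from TaCo: the abstract does not state the exact objective, so the function names (`l2_normalize`, `dot`, `info_nce`), the cosine-similarity scoring, and the temperature value are all assumptions chosen for illustration. The sketch only shows why views of similar attributes are harder to separate than views of disparate ones.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding (illustrative, not TaCo's loss).

    anchor, positive: embeddings of two views that share the same attribute.
    negatives: embeddings of views with a different attribute.
    temperature: assumed hyperparameter controlling how sharply scores are scaled.
    """
    a = l2_normalize(anchor)
    # The positive is placed at index 0, followed by all negatives.
    candidates = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    logits = [dot(a, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - logits[0]


# Disparate negatives are easy to reject; a near-duplicate negative is not.
easy = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
hard = info_nce([1.0, 0.0], [0.9, 0.1], [[0.95, 0.05]])
print(easy < hard)
```

In this toy setup the loss is close to zero when the negatives point in clearly different directions, and it grows when a negative is nearly identical to the anchor, which mirrors the difficulty with subtly different attributes described above.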