Recognizing irregular text has long been a challenging problem in text recognition. To encourage research on this problem, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics. COO contains many irregular texts, such as extremely curved texts, partially shrunk texts, and arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each part is a truncated text that is not meaningful by itself, so the parts must be linked to represent the intended meaning. We therefore propose a novel task that predicts the links between truncated texts. We conduct three tasks to detect the onomatopoeia region and capture its intended meaning: text detection, text recognition, and link prediction. Through extensive experiments, we analyze the characteristics of COO. Our data and code are available at \url{https://github.com/ku21fan/COO-Comic-Onomatopoeia}.