Strokes are the basic elements of Chinese characters, and stroke extraction has been an important and long-standing endeavor. Existing stroke extraction methods are often handcrafted and depend heavily on domain expertise because training data is limited. Moreover, there are no standardized benchmarks that allow a fair comparison between different stroke extraction methods, which, we believe, is a major impediment to the development of Chinese character stroke understanding and related tasks. In this work, we present the first publicly available Chinese Character Stroke Extraction (CCSE) benchmark, with two new large-scale datasets: Kaiti CCSE (CCSE-Kai) and Handwritten CCSE (CCSE-HW). With these large-scale datasets, we hope to leverage the representation power of deep models such as CNNs to solve the stroke extraction task, though how to do so remains an open question. To this end, we recast the stroke extraction problem as a stroke instance segmentation problem. Using the proposed datasets to train a stroke instance segmentation model, we surpass previous methods by a large margin. Moreover, models trained on the proposed datasets benefit the downstream font generation and handwritten aesthetic assessment tasks. We hope these benchmark results can facilitate further research. The source code and datasets are publicly available at: https://github.com/lizhaoliu-Lec/CCSE.