Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially for unseen styles and new words, and even more so when these words contain characters that are rarely encountered during training. While emulating a writer's style has recently been addressed by generative models, generalization towards rare characters has been disregarded. In this work, we devise a Transformer-based model for few-shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols rendered as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is better suited to generating characters that, despite being rarely seen during training, may share visual details with frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting dedicated pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate the effectiveness of our proposal in generating words in unseen styles and with rare characters more faithfully than existing approaches that rely on independent one-hot encodings of the characters.
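To make the content representation concrete, the following is a minimal sketch, not the paper's actual pipeline, of how a string could be mapped to a sequence of dense vectors by rasterizing each character as a GNU Unifont-style glyph bitmap and flattening it. The font path, the 16x16 glyph size, and the function names are illustrative assumptions; it requires Pillow, NumPy, and a local copy of GNU Unifont.

# Sketch: encode a string as a sequence of dense "visual archetype" vectors
# by rasterizing each character with GNU Unifont and flattening the bitmap.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

UNIFONT_PATH = "unifont.ttf"  # assumed local copy of GNU Unifont

def char_to_archetype(ch: str, size: int = 16) -> np.ndarray:
    """Rasterize one character as a size x size grayscale glyph image."""
    font = ImageFont.truetype(UNIFONT_PATH, size)
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0

def text_to_content_sequence(text: str) -> np.ndarray:
    """Map a string to a (len(text), 256) array: one flattened 16x16 glyph
    bitmap per character. Visually similar characters (e.g. accented
    variants of a frequent letter) thus receive similar encodings,
    unlike independent one-hot codes."""
    return np.stack([char_to_archetype(c).reshape(-1) for c in text])

if __name__ == "__main__":
    seq = text_to_content_sequence("naïve")
    print(seq.shape)  # (5, 256): one 16*16 = 256-dim vector per character

Under this scheme, a rare character such as "ï" is encoded by a bitmap that overlaps heavily with that of the common "i", which is the intuition behind why glyph-based content vectors can generalize where one-hot encodings cannot.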