State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify and address a lack of training data and mismatching tokenization algorithms as possible limitations of past attempts. In particular, we successfully pre-train and release ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2, and ChatGPT, while also being more parameter-efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and introspect the model's understanding of style conditions. We make our code, models, and datasets publicly available.