This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. This work was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks, together with a CRF tagger, makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and proves robust in out-of-domain tests, i.e. on texts up to 20th-century novels.