We introduce a new benchmark for assessing the quality of text-to-text models for Polish. The benchmark consists of diverse tasks and datasets: the KLEJ benchmark adapted to the text-to-text format, English-Polish (en-pl) machine translation, summarization, and question answering. In particular, since summarization and question answering lack benchmark datasets for the Polish language, we describe their construction and make them publicly available. Additionally, we present plT5, a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective. Unsupervised denoising pre-training is performed efficiently by initializing the model weights with those of its multilingual T5 (mT5) counterpart. We evaluate the performance of plT5, mT5, Polish BART (plBART), and Polish GPT-2 (papuGaPT2). plT5 scores highest on all of these tasks except summarization, where plBART performs best. In general (except for summarization), the larger the model, the better the results, and the encoder-decoder architectures prove better than the decoder-only equivalent.
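As a minimal sketch of the text-to-text setup described above, the snippet below shows how a model can be warm-started from an mT5 checkpoint and trained with a single sequence-to-sequence objective, assuming the HuggingFace transformers library; the checkpoint name google/mt5-base and the task prefix are illustrative assumptions, not the paper's actual training configuration.

```python
# Hypothetical sketch: warm-start from mT5 and cast a task as text-to-text.
# Checkpoint and prefix are illustrative, not the paper's training setup.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

# Text-to-text framing: every task is input text -> output text, so the
# same cross-entropy seq2seq loss serves as the single training objective.
inputs = tokenizer("summarize: Warszawa jest stolicą Polski.",
                   return_tensors="pt")
labels = tokenizer("Warszawa to stolica Polski.",
                   return_tensors="pt").input_ids

# The model returns the loss used for fine-tuning on any such task.
loss = model(**inputs, labels=labels).loss
loss.backward()
```

Because initialization reuses the mT5 weights rather than starting from scratch, the subsequent unsupervised denoising pre-training on Polish text can converge with far less compute.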