We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
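
Since the checkpoint is published on the Hugging Face Hub, it can be loaded with the standard `transformers` API. The following is a minimal sketch, assuming the `transformers` and `torch` packages are installed; the example sentence and the masked-LM usage are illustrative only and not taken from the paper.

```python
# Load the released RobeCzech checkpoint and run a masked-LM forward pass.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModelForMaskedLM.from_pretrained("ufal/robeczech-base")

# Illustrative Czech sentence with one masked position
# ("Praha je hlavní město <mask>." = "Prague is the capital of <mask>.").
text = f"Praha je hlavní město {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Print the model's top prediction for the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```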