This paper describes Adam Mickiewicz University's (AMU) solution for the 4th Shared Task on SlavNER. The task involves the identification, categorization, and lemmatization of named entities in Slavic languages. We explore the use of foundation models for these tasks, specifically models based on the popular BERT and T5 architectures, and use external datasets to further improve model quality. Our solution obtained promising results, achieving high metric scores in both tasks. We describe our approach and the results of our experiments in detail, showing that the method is effective for NER and lemmatization in Slavic languages. Our lemmatization models will be available at: https://huggingface.co/amu-cai.