Corpus linguistics methods and tools have become a growing priority in the study of the Baltic-Finnic languages of the Republic of Karelia. Since 2016, linguists, mathematicians, and programmers at the Karelian Research Centre have been working with the Open Corpus of the Veps and Karelian Languages (VepKar), an extension of the Veps Corpus created in 2009. VepKar comprises texts in Karelian and Veps, multifunctional dictionaries linked to them, and software with an advanced search system that filters by text criteria (language, genre, etc.) and by numerous linguistic categories; lexical and grammatical search in texts was implemented thanks to the word-form generator that we created earlier. So far, a corpus of 3,000 texts has been compiled, the texts have been uploaded and marked up, a system for classifying texts by language, dialect, type, and genre has been introduced, and the word-form generator has been created. Future plans include developing a speech module for working with audio recordings and a syntactic tagging module that uses the output of morphological analysis. Owing to continuous functional improvements to the corpus manager and the ongoing enrichment of VepKar with new material and text markup, users can handle a wide range of scientific and applied tasks. In creating the universal national VepKar corpus, its developers and managers strive to preserve and exhibit as fully as possible the state of the Veps and Karelian languages in the 19th-21st centuries.