Magahi is an Indo-Aryan language spoken mainly in the eastern parts of India. Despite its significant number of speakers, virtually no language resources (LR) or language technologies (LT) have been developed for it, mainly because of its status as a non-scheduled language. The present paper describes an attempt to develop an annotated corpus of Magahi. The data is taken mainly from a couple of Magahi blogs, some collections of Magahi stories, and recordings of Magahi conversations, and it is annotated at the part-of-speech (POS) level using the BIS tagset.