This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt. of India, and running in parallel at 10 universities across India. To manage the creation and dissemination of these large corpora efficiently, the web-based annotation tool ILCIANN (Indian Languages Corpora Initiative Annotation Tool), which also has a reduced stand-alone version, has been developed. It was developed primarily for part-of-speech (POS) annotation and for managing corpus annotation carried out by people with differing levels of competence working at physically distant locations. To maintain consistency and standards in the creation of the corpora, everyone needed to work on a common platform, which this tool provides.