This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed for evaluating spelling error detection and correction. Each misspelling is tagged as either a non-word error or a real-word error. In addition, the contextual information available in the corpus makes it useful for handling both types of spelling error.