An abugida is a phonographic writing system in which each syllable is represented by a single consonant or typographic ligature, together with a default vowel or optional diacritic(s) that denote other vowels. Typing in languages written in abugida scripts remains challenging despite the advent of devices with soft keyboards that support custom key layouts. These scripts have so many characters that a layout must spread them over multiple views, and having to switch between views several times to type a single word interrupts the natural thought process. This has prevented native keyboard layouts from becoming popular. On the other hand, supporting romanized scripts (native words transcribed using Latin characters) with language-model-based suggestions is held back by the lack of uniform romanization rules. To this end, we propose a disambiguation algorithm and showcase its usefulness in two novel, mutually non-exclusive input methods for languages that natively use the abugida writing system: (a) disambiguation of ambiguous input for abugida scripts, and (b) disambiguation of word variants in romanized scripts. We benchmark these approaches on public datasets and show that Ambiguous Input improves typing speed by 19.49%, 25.13%, and 14.89% in Hindi, Bengali, and Thai, respectively, owing to the ease with which users locate keys combined with the efficiency of our inference method. Our Word Variant Disambiguation (WDA) maps valid variants of romanized words, previously treated as out-of-vocabulary, to a vocabulary of 100k words with high accuracy, leading to an average increase of 10.03% in Error Correction F1 score and 62.50% in Next Word Prediction (NWP).
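To make the idea of ambiguous input concrete, the following is a minimal sketch, not the inference method proposed here. It assumes a hypothetical layout in which each key carries several Devanagari consonants, so that one key sequence can correspond to several words, and it swaps in a simple frequency-ranked dictionary lookup for the language-model-based inference. The key groups, the word counts, and the names `KEY_GROUPS`, `WORD_COUNTS`, `key_sequence`, `build_index`, and `disambiguate` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical ambiguous layout: each key carries several Devanagari consonants.
KEY_GROUPS = {
    "1": "कखगघ",
    "2": "चछजझ",
    "3": "तथदध",
    "4": "पफबभ",
    "5": "मनरल",
}

# Reverse map from a character to the key that carries it.
CHAR_TO_KEY = {ch: key for key, chars in KEY_GROUPS.items() for ch in chars}

# Toy vocabulary with made-up unigram counts (an assumption, not real data).
WORD_COUNTS = {"कल": 50, "घर": 80, "खल": 2, "मन": 40, "नल": 10}


def key_sequence(word):
    """Return the sequence of key labels a user would press to type `word`."""
    return "".join(CHAR_TO_KEY[ch] for ch in word)


def build_index(word_counts):
    """Group vocabulary words by key sequence, most frequent word first."""
    buckets = defaultdict(list)
    for word, count in word_counts.items():
        buckets[key_sequence(word)].append((count, word))
    return {
        keys: [word for _, word in sorted(cands, key=lambda c: -c[0])]
        for keys, cands in buckets.items()
    }


INDEX = build_index(WORD_COUNTS)


def disambiguate(keys, index=INDEX):
    """Return candidate words for an ambiguous key sequence, best first."""
    return index.get(keys, [])


if __name__ == "__main__":
    for keys in ("15", "55"):
        print(keys, disambiguate(keys))
```

In this sketch the two-key sequence `15` is shared by three words, and the user sees them ranked by count instead of having to pick each character from a separate view. A practical system would presumably rank candidates using sentence context and handle vowel signs and conjuncts, which this toy layout omits.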