As handwriting input becomes more prevalent, the large symbol inventory required to support Chinese handwriting recognition poses unique challenges. This paper describes how Apple's deep-learning recognition system can accurately handle up to 30,000 Chinese characters while running in real time across a range of mobile devices. To achieve acceptable accuracy, we paid particular attention to data collection conditions, the representativeness of writing styles, and the training regimen. We found that, with proper care, even larger inventories are within reach. Our experiments show that accuracy degrades only slowly as the inventory grows, as long as the training data is of sufficient quality and quantity.