The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup algorithm, which uses commonly available SIMD instructions to outperform the UTF-8 validation routines used in many libraries and languages by more than a factor of ten. To ensure reproducibility, our work is freely available as open source software.
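To make concrete what validation on ingestion involves, the following is a minimal sketch of a conventional byte-at-a-time validator. It is not the lookup algorithm and uses no SIMD instructions; it only illustrates the well-formedness rules (no overlong encodings, no surrogates, no code points above U+10FFFF, no truncated sequences) that any UTF-8 validator has to enforce. The function name `is_valid_utf8` is an assumption made for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Scalar UTF-8 validator (illustrative baseline, not the SIMD lookup algorithm)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # ASCII: single byte
            i += 1
            continue
        # For each lead byte: number of continuation bytes expected and the
        # allowed range of the FIRST continuation byte. The narrowed ranges
        # reject overlong forms, surrogates, and values above U+10FFFF.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # excludes code points above U+10FFFF
        else:
            # 0x80..0xBF (stray continuation), 0xC0, 0xC1, 0xF5..0xFF
            return False
        if i + need >= n:  # sequence is truncated at end of input
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):  # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

A validator of this shape branches on every byte, which is the kind of per-byte work that a SIMD approach aims to replace with operations over many bytes at once.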