Recent Intel processors include a powerful set of instructions (AVX-512) that operate on 512-bit registers with a single instruction. Some of these instructions have no equivalent in earlier instruction sets. We leverage them to efficiently transcode strings between the most common formats: UTF-8 and UTF-16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF-8 to UTF-16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open-source library.
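To make the task concrete, the following is a minimal scalar sketch of UTF-8 to UTF-16 transcoding with validation. It is not the AVX-512 algorithm described here; it is a conventional byte-at-a-time baseline that processes one character per loop iteration, the kind of approach vectorized transcoders aim to outperform. The function name `utf8_to_utf16_scalar` is hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (an illustration, not the paper's method):
// scalar UTF-8 -> UTF-16 transcoding, one character per iteration.
// Returns std::nullopt when the input is not valid UTF-8.
std::optional<std::u16string> utf8_to_utf16_scalar(std::string_view in) {
    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more code units than UTF-8 bytes
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t len = 0;    // sequence length in bytes
        std::uint32_t min = 0;  // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;     // stray continuation byte or invalid lead byte

        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, surrogates, and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // single 16-bit code unit
        } else {
            cp -= 0x10000;                             // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For instance, the three-byte UTF-8 sequence `E4 B8 AD` (the character 中) becomes the single UTF-16 code unit `0x4E2D`. Each character in this sketch costs several branches and shifts, which gives a sense of why a figure of fewer than 2 instructions per character is notable.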