With disks and networks delivering gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-precision arithmetic. However, at most 17 significant decimal digits are needed to represent any 64-bit standard (IEEE 754 binary64) floating-point number. Thus we can represent the decimal significand with a single 64-bit word. By combining the significand with precomputed tables, we can compute the nearest floating-point number using as few as one or two 64-bit multiplications. Our implementation can be several times faster than the conventional functions in standard C libraries on modern 64-bit systems (Intel, AMD, ARM and POWER9). Our work is available as open-source software used by major systems such as Apache Arrow and Yandex ClickHouse. The Go standard library has adopted a version of our approach.
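As a rough illustration of the first step, fitting the decimal significand in one 64-bit word, the Python sketch below splits a string into a significand `w` and a power of ten `q` so that the value is w × 10^q. To stay short, it then applies Clinger's classic fast path, an older and separate technique rather than the table-driven method described above. When w ≤ 2^53 and |q| ≤ 22, both w and 10^|q| are exact doubles, so a single correctly rounded IEEE multiplication or division yields the nearest double. The helper names `parse_significand` and `clinger_fast_path` are hypothetical, and the grammar is simplified (no infinities, NaN, or hexadecimal forms). Inputs outside the fast path return `None`, where a real parser would switch to its general path.

```python
import re

# Simplified decimal grammar (hypothetical): sign, digits, optional fraction,
# optional exponent. Infinities, NaN and hexadecimal floats are not handled.
_DECIMAL = re.compile(r"([+-]?)([0-9]*)(?:\.([0-9]*))?(?:[eE]([+-]?[0-9]+))?")

MAX_U64 = (1 << 64) - 1
# 10**0 .. 10**22 are exactly representable as binary64 doubles
# (5**22 < 2**53), so these conversions are exact.
EXACT_POW10 = [float(10 ** i) for i in range(23)]


def parse_significand(s):
    """Split s into (negative, w, q) with value = w * 10**q.

    Returns None for malformed input or for more than 19 significant digits,
    i.e. when w is not guaranteed to fit in one 64-bit word.
    """
    m = _DECIMAL.fullmatch(s)
    if m is None:
        return None
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    if not int_part and not frac_part:
        return None  # no digits at all, e.g. "" or "."
    digits = (int_part + frac_part).lstrip("0")  # leading zeros are not significant
    if len(digits) > 19:
        return None  # a full parser would handle this case differently
    w = int(digits) if digits else 0
    q = int(exp_part or "0") - len(frac_part)
    assert w <= MAX_U64  # any 19-digit decimal integer fits in 64 bits
    return sign == "-", w, q


def clinger_fast_path(s):
    """Return the nearest double to s, or None if the fast path does not apply."""
    parsed = parse_significand(s)
    if parsed is None:
        return None
    negative, w, q = parsed
    # Both operands must be exact doubles: w <= 2**53 and |q| <= 22.
    if w > (1 << 53) or not -22 <= q <= 22:
        return None
    x = float(w)  # exact conversion
    # One IEEE operation on exact operands is correctly rounded,
    # so this is the nearest double to w * 10**q.
    x = x * EXACT_POW10[q] if q >= 0 else x / EXACT_POW10[-q]
    return -x if negative else x
```

A caller might fall back to Python's own `float(s)` whenever the sketch returns `None`. For example, `clinger_fast_path("1e23")` returns `None` because 10^23 is not exactly representable as a double.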