In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte where only a single bit has been set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums corresponding to bit values at indexes 0, 1, 2, ..., k-1. If the k-bit words are one-hot encoded then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population-count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit position population count using less than half of a CPU cycle per 16-bit word. Our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions, for sufficiently large inputs.
翻译:在统计、机器学习和生物信息学等多个领域,绝对变量通常以单热编码矢量表示。例如,根据8个不同的值,我们将每个值都映射到仅设定了一位数的字节。我们有动力快速计算这些编码的统计。在 k-bit 单词的流中,我们试图计算与指数0、1、2、...、 k-1 的比特值相对应的 k 单数。如果 k-bit 单词是一热编码,然后数数对应频率直方图。这个多和问题是一个人口计数问题的一般化问题,我们在这里寻找所有位数的总和。因此,我们把多和问题称为位置人口计。使用最近Intel 处理器的 SIMD (单调、多数据) 指令,我们描述计算16位人口数的算法,使用每16位单词的CPU周期不到一半。我们的最佳方法使用400倍的指令,并且比基线码快50倍,仅使用正常输入(非SIMD) 。