We present faster-than-native alternatives for the full AVX512-VP2INTERSECT instruction subset using basic AVX512F instructions. These alternatives compute only one of the two output masks, which is sufficient for the typical use cases of computing the intersection of two sorted lists of integers or the size of such an intersection. While the naive implementation (comparing the first input vector against all rotations of the second) is slower than the native instructions, we show that rotating both the first and second operands at the same time significantly reduces the total number of vector rotations, making the emulations faster than the native instructions for every instruction in the VP2INTERSECT subset. Additionally, the emulations can easily be extended to other input types (e.g. packed vectors of 16-bit integers) for which no native instructions are available.
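To make the rotation-count argument concrete, the following is a minimal scalar sketch in C that models a 512-bit vector as 16 lanes of 32-bit integers; it does not use AVX512 intrinsics and is not the paper's implementation. The names `vec16`, `rot_lanes`, `cmpeq_mask`, `rotl16`, `intersect_mask_naive` and `intersect_mask_paired` are hypothetical, and the 4-by-4 split of rotation amounts is an assumption chosen for illustration: the abstract states only that both operands are rotated. Under that assumption, rotating one operand by 0, 4, 8, 12 lanes and the other by 0, 1, 2, 3 lanes still pairs every lane of the first with every lane of the second, while needing 6 vector rotations instead of 15.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* number of 32-bit lanes in one 512-bit vector */

/* Scalar stand-in for a 512-bit vector of 32-bit integers (hypothetical type). */
typedef struct { uint32_t lane[LANES]; } vec16;

/* Rotate lanes: result lane i takes source lane (i + r) mod LANES.
   Counts one vector rotation unless r == 0. */
static vec16 rot_lanes(vec16 v, unsigned r, unsigned *rotations) {
    assert(r < LANES);
    vec16 out;
    for (unsigned i = 0; i < LANES; i++)
        out.lane[i] = v.lane[(i + r) % LANES];
    if (r != 0) (*rotations)++;
    return out;
}

/* Lane-wise equality compare producing a 16-bit mask (bit i = lane i equal). */
static uint16_t cmpeq_mask(vec16 x, vec16 y) {
    uint16_t m = 0;
    for (unsigned i = 0; i < LANES; i++)
        if (x.lane[i] == y.lane[i]) m |= (uint16_t)(1u << i);
    return m;
}

/* Rotate a 16-bit mask left by r bits. */
static uint16_t rotl16(uint16_t m, unsigned r) {
    r %= 16;
    if (r == 0) return m;
    return (uint16_t)(((unsigned)m << r) | ((unsigned)m >> (16 - r)));
}

/* Naive scheme: compare a against every rotation of b (15 vector rotations). */
uint16_t intersect_mask_naive(vec16 a, vec16 b, unsigned *rotations) {
    uint16_t m = 0;
    *rotations = 0;
    for (unsigned r = 0; r < LANES; r++)
        m |= cmpeq_mask(a, rot_lanes(b, r, rotations));
    return m;
}

/* Assumed 4x4 scheme: rotate a by 0,4,8,12 and b by 0,1,2,3 (6 vector rotations).
   A mask computed against a rotated copy of a is rotated back so that
   bit p always refers to the original lane a[p]. */
uint16_t intersect_mask_paired(vec16 a, vec16 b, unsigned *rotations) {
    vec16 bj[4];
    uint16_t m = 0;
    *rotations = 0;
    for (unsigned j = 0; j < 4; j++)
        bj[j] = rot_lanes(b, j, rotations);
    for (unsigned k = 0; k < 4; k++) {
        vec16 ak = rot_lanes(a, 4 * k, rotations);
        uint16_t mk = 0;
        for (unsigned j = 0; j < 4; j++)
            mk |= cmpeq_mask(ak, bj[j]);
        m |= rotl16(mk, 4 * k);
    }
    return m;
}
```

Both functions return the mask for the first operand only, matching the single-mask setting described above. In this model the paired scheme also needs three cheap 16-bit mask rotations, which are not counted as vector rotations.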