Computing the convex hull is a fundamental problem in computational geometry, and Quickhull is a fast algorithm for solving it. In this paper, we present VQhull, a fast parallel implementation of Quickhull that exploits vector instructions and coordinates CPU cores in a way that minimizes data movement. Compared to the state of the art on the Problem Based Benchmark Suite, VQhull improves sequential runtime by 1.6--16x and parallel runtime by 1.5--11x. It achieves 85--100% of peak bandwidth on non-NUMA architectures, and 66--78% on our two-CPU NUMA system, which leaves little room for further improvement. A 4x speedup on 8 cores corresponds to a parallel efficiency of only 50%. This suggests a waste of energy, but our measurements show a more complicated picture: energy usage may even be lower in parallel than in sequential execution. Quickhull thus serves as a case study showing that runtime and energy consumption do not go hand in hand.
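For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of textbook two-dimensional Quickhull in Python. It is an illustration only and not VQhull's implementation: it uses no vector instructions and no parallelism, and the names `cross`, `_hull_side`, and `quickhull` are hypothetical helpers introduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p); positive if p lies left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _hull_side(a: Point, b: Point, pts: List[Point]) -> List[Point]:
    """Hull vertices strictly left of a->b, ordered from a to b (a and b excluded)."""
    if not pts:
        return []
    # The point farthest from the line a->b must be a hull vertex.
    far = max(pts, key=lambda p: cross(a, b, p))
    # Points inside triangle (a, far, b) are discarded; recurse on the two outer sets.
    outside_a_far = [p for p in pts if cross(a, far, p) > 0]
    outside_far_b = [p for p in pts if cross(far, b, p) > 0]
    return _hull_side(a, far, outside_a_far) + [far] + _hull_side(far, b, outside_far_b)


def quickhull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # leftmost and rightmost points are always on the hull
    above = [p for p in pts if cross(lo, hi, p) > 0]
    below = [p for p in pts if cross(hi, lo, p) > 0]
    return [lo] + _hull_side(lo, hi, above) + [hi] + _hull_side(hi, lo, below)
```

For example, `quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])` returns the four corners of the square and drops the interior point. The partitioning step, which scans all remaining points against a line, is the data-parallel part of the algorithm.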