We present robust high-performance implementations of the signal-processing tasks performed by ATLAS, a high-throughput wildlife tracking system. The system tracks radio transmitters attached to wild animals by estimating the time of arrival of radio packets at multiple receivers (base stations). Time-of-arrival estimation of wideband radio signals is computationally expensive, especially in acquisition mode (when the time of transmission is not known, not even approximately). These computations are a bottleneck that limits the throughput of the system. We developed a sequential high-performance CPU implementation of the computations a few years ago, and more recently a GPU implementation. Both strive to balance performance with simplicity, maintainability, and development effort, as most real-world codes do. The paper reports on the two implementations and carefully evaluates their performance. The evaluation indicates that the GPU implementation dramatically improves performance and power-performance relative to the sequential CPU implementation running on a desktop CPU typical of the computers in current base stations. Performance improves by more than 50X on a high-end GPU, and by more than 4X on a GPU platform that consumes almost 5 times less power than the CPU platform. Performance-per-Watt ratios also improve (by more than 16X), as do price-performance ratios.