To be able to run tasks asynchronously on NVIDIA GPUs, a programmer must explicitly implement asynchronous execution in their code using the syntax of CUDA streams. Streams allow a programmer to launch independent, concurrent tasks, providing the ability to utilise different functional units on the GPU asynchronously. For example, by placing different tasks in different CUDA streams it is possible to transfer the result of a previous computation, performed on input data n-1, over the PCIe bus whilst computing the result for input data n. The benefit of such an approach is that the time taken for the data transfer between the host and the device can be hidden behind computation. This case study deals with the implementation of CUDA streams in AstroAccelerate, a GPU-accelerated real-time signal processing pipeline for time-domain radio astronomy.
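The sketch below illustrates the general pattern described above; it is not AstroAccelerate code. Two streams are created, and each block of input is copied to the device, processed, and copied back within its own stream, so the device-to-host transfer of block n-1 can overlap with the kernel working on block n. The kernel process_block, the buffer size N and the two-stream layout are illustrative assumptions; asynchronous copies also require pinned (page-locked) host memory, hence cudaMallocHost.

```cuda
// Minimal sketch of overlapping transfer and compute with CUDA streams.
// process_block is a placeholder kernel; N is an illustrative block size.
#include <cuda_runtime.h>

__global__ void process_block(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];   // stand-in for the real computation
}

int main()
{
    const int N = 1 << 20;
    float *h_in, *h_out;                 // pinned host buffers, required for async copies
    float *d_in[2], *d_out[2];
    cudaStream_t stream[2];

    cudaMallocHost(&h_in,  2 * N * sizeof(float));
    cudaMallocHost(&h_out, 2 * N * sizeof(float));
    for (int s = 0; s < 2; ++s) {
        cudaMalloc(&d_in[s],  N * sizeof(float));
        cudaMalloc(&d_out[s], N * sizeof(float));
        cudaStreamCreate(&stream[s]);
    }

    for (int block = 0; block < 2; ++block) {
        int s = block % 2;
        // Queue copy-in, kernel and copy-out for this block in stream s.
        // Work queued in the other stream (block n-1) may still be in flight,
        // so its device-to-host transfer can overlap with this kernel.
        cudaMemcpyAsync(d_in[s], h_in + block * N, N * sizeof(float),
                        cudaMemcpyHostToDevice, stream[s]);
        process_block<<<(N + 255) / 256, 256, 0, stream[s]>>>(d_in[s], d_out[s], N);
        cudaMemcpyAsync(h_out + block * N, d_out[s], N * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[s]);
    }

    cudaDeviceSynchronize();             // wait for both streams to drain

    for (int s = 0; s < 2; ++s) {
        cudaFree(d_in[s]);
        cudaFree(d_out[s]);
        cudaStreamDestroy(stream[s]);
    }
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return 0;
}
```

In a real pipeline the loop would iterate over many input blocks, and the degree of overlap actually achieved depends on the GPU's copy engines and on how the work is scheduled, which is precisely what the case study examines for AstroAccelerate.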