Data races are egregious parallel programming bugs on CPUs. They are even worse on GPUs because of the hierarchical thread and memory structure, which makes it possible to write code that is correctly synchronized within a thread group but not across groups. Thus far, all major data-race checkers for GPUs suffer from at least one of the following problems: they do not check races in global memory, do not work on recent GPUs, scale poorly, have not been extensively tested, miss simple data races, or are not dependable without detailed knowledge of the compiler. Our new data-race detection tool, HiRace, overcomes these limitations. Its key novelty is a parallel finite-state machine that condenses an arbitrarily long access history into a constant-length state, which allows it to handle large and long-running programs. HiRace is a dynamic tool that checks for races in thread-group shared memory and in global device memory. It uses source-code instrumentation, thus avoiding driver, compiler, and hardware dependencies. We evaluate it on a modern calibrated data-race benchmark suite. On the 580 tested CUDA kernels, 346 of which contain data races, HiRace finds races that other tools miss, reports no false alarms, and is on average more than 10 times faster than the current state of the art while incurring only half the memory overhead.
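The idea of condensing an access history into a constant-length state can be illustrated with a minimal sketch. The abstract does not describe HiRace's actual states or encoding, so everything below is an assumption: a hypothetical 32-bit shadow word per memory location holding one of four states (untouched, read by one thread, read by several threads, written) plus the id of the owning thread. Each access advances the word through a compare-and-swap loop, so concurrent threads can update the machine in parallel, and the word stays the same size no matter how many accesses it summarizes. The sketch uses C++ `std::atomic` instead of CUDA so it runs on a CPU, ignores the thread-group hierarchy, and treats the hypothetical `on_barrier` as ordering all threads, which in CUDA holds only for a block's shared memory after `__syncthreads()`.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical encoding (not HiRace's): bits 0..1 hold the state,
// bits 2..31 hold the id of the thread that owns the location.
enum State : uint32_t { kUntouched = 0, kReadOne = 1, kReadMany = 2, kWritten = 3 };

constexpr uint32_t pack(uint32_t state, uint32_t tid) { return (tid << 2) | state; }
constexpr uint32_t state_of(uint32_t word) { return word & 3u; }
constexpr uint32_t tid_of(uint32_t word) { return word >> 2; }

// Pure transition function: next shadow word for one access by `tid`.
// Sets *race when the access conflicts with the condensed history.
uint32_t transition(uint32_t word, uint32_t tid, bool is_write, bool* race) {
  const uint32_t owner = tid_of(word);
  *race = false;
  switch (state_of(word)) {
    case kUntouched:
      return pack(is_write ? kWritten : kReadOne, tid);
    case kReadOne:
      if (is_write) {
        *race = (owner != tid);  // write after another thread's read
        return pack(kWritten, tid);
      }
      // A second distinct reader collapses the ids into "several readers".
      return owner == tid ? word : pack(kReadMany, 0);
    case kReadMany:
      if (is_write) {
        *race = true;  // at least one earlier reader differs from tid
        return pack(kWritten, tid);
      }
      return word;
    default:  // kWritten
      *race = (owner != tid);  // any access after another thread's write
      return is_write ? pack(kWritten, tid) : word;
  }
}

// Applies one access atomically; returns true if a race was detected.
bool on_access(std::atomic<uint32_t>& shadow, uint32_t tid, bool is_write) {
  uint32_t old = shadow.load();
  uint32_t next;
  bool race;
  do {
    next = transition(old, tid, is_write, &race);
  } while (!shadow.compare_exchange_weak(old, next));  // reloads `old` on failure
  return race;
}

// Hypothetical barrier hook: everything before it is ordered, so reset.
void on_barrier(std::atomic<uint32_t>& shadow) { shadow.store(pack(kUntouched, 0)); }
```

Collapsing several readers into `kReadMany` is what keeps the state constant-length. The price is that the sketch no longer knows which threads read the location, only that a later write by any thread must conflict with at least one of them.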