Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts, where even minor performance improvements can translate into large savings in computational resources. To aid performance analysis, developers may collect an execution trace: a chronological log of program activity during execution. Because traces record the full execution history, developers can use them to discover a wide array of performance issues, including previously unknown ones, which makes traces an important artifact for exploratory performance analysis. However, interactive trace visualization is difficult because of the size of the data and the complexity of interpreting it. Traces record nanosecond-level events across many parallel processes, so the collected data is often large and difficult to explore. Moreover, the rise of asynchronous task parallel programming paradigms complicates the relationship between events and their probable causes. To address these challenges, we conduct an ongoing design study in collaboration with high performance computing researchers, developing diverse and hierarchical ways to navigate and represent execution trace data in support of their trace analysis tasks. Through an iterative design process, we developed Traveler, an integrated visualization platform for task parallel traces. Traveler provides multiple linked interfaces that help users navigate trace data from multiple contexts. We evaluate the utility of Traveler through user feedback and a case study, finding that integrating multiple modes of navigation in our design supported performance analysis tasks and led to the discovery of previously unknown behavior in a distributed array library.