Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data and mathematical approximation methods to build prediction models. Such approximation models can be biased, particularly when extrapolation is needed. Space-filling designs (SFDs) and surrogate models such as the Gaussian process (GP) are popular for data collection and predictive modeling, but their applicability to HPC variability needs investigation. We investigate it in terms of design efficiency, prediction accuracy, and scalability. We first customize existing SFDs so that they can be applied in the HPC setting. We then conduct a comprehensive comparison of design strategies and approximation methods, using both synthetic data simulated from three test functions and real data from the HPC setting. In both the synthetic and the real data analyses, GP with SFDs outperforms the alternatives in most scenarios. Among approximation models, GP is recommended if the data are collected by SFDs; if the data are collected using GBDs, both GP and Delaunay can be considered. With the best choice of approximation method, the relative performance of SFDs and GBDs depends on the properties of the underlying surface. In the cases where SFDs perform better, SFDs need about half as many design points as GBDs, or fewer, to achieve the same prediction accuracy. SFDs that can be tailored to high dimensions and non-smooth surfaces are recommended, especially when a large number of input factors must be considered in the model.
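To make the contrast between the two design families concrete, the following is a minimal sketch, not the customized SFDs studied here. It assumes a generic random Latin hypercube as a stand-in SFD and a full-factorial grid as the GBD, both on the unit square; the function names `grid_design`, `latin_hypercube`, and `distinct_levels` are hypothetical and introduced only for illustration. With the same budget of 16 runs, the grid probes each factor at only 4 distinct levels, while the Latin hypercube probes each factor at 16.

```python
import itertools
import random


def grid_design(levels_per_factor, n_factors):
    """Full-factorial grid on [0, 1]^d with evenly spaced levels (a GBD)."""
    levels = [i / (levels_per_factor - 1) for i in range(levels_per_factor)]
    return list(itertools.product(levels, repeat=n_factors))


def latin_hypercube(n_points, n_factors, seed=0):
    """Random Latin hypercube on [0, 1]^d (an assumed, generic SFD).

    Each factor's range is cut into n_points equal strata, and every
    stratum holds exactly one design point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_points))
        rng.shuffle(strata)
        # Place each point uniformly at random inside its stratum.
        columns.append([(s + rng.random()) / n_points for s in strata])
    return list(zip(*columns))


def distinct_levels(design, factor):
    """Number of distinct values the design takes along one factor."""
    return len({point[factor] for point in design})


grid = grid_design(levels_per_factor=4, n_factors=2)  # 4 x 4 = 16 runs
lhd = latin_hypercube(n_points=16, n_factors=2)       # 16 runs

print(len(grid), distinct_levels(grid, 0))  # 16 runs, 4 levels per factor
print(len(lhd), distinct_levels(lhd, 0))    # 16 runs, 16 levels per factor
```

The gap widens with dimension: a grid with k levels per factor needs k^d runs in d factors, whereas the Latin hypercube run count is chosen independently of d. This illustrates only the coverage property of the designs; it does not by itself reproduce the prediction accuracy comparison reported above.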