Configuration tuning for large software systems is challenging because the configuration space is complex and performance evaluation is expensive. Most existing approaches follow a two-phase process: they first learn a regression-based performance prediction model from the available samples and then use the learned model to search for configurations with satisfactory performance. Such regression-based models often suffer from a scarcity of samples, since running a large software system under a specific configuration takes substantial time and resources. Moreover, previous studies have shown that even a highly accurate regression-based model may fail to discern the relative merit of two configurations, although performance comparison is a fundamental strategy for configuration tuning. To address these issues, this paper proposes CM-CASL, a Comparison-based performance Modeling approach for software systems via Collaborative Active and Semisupervised Learning. CM-CASL learns a classification model that compares the performance of two given configurations, and it augments the sample set through a collaborative labeling process in which both human experts and classifiers provide labels, integrating active and semisupervised learning. Experimental results demonstrate that CM-CASL outperforms two state-of-the-art performance modeling approaches in both classification accuracy and rank accuracy, and thus provides a better performance model for subsequent configuration tuning.
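To make the comparison-based formulation concrete, the following minimal sketch shows one way a set of measured configurations could be turned into pairwise classification samples, and how a pairwise comparator could then be used to rank configurations. It is an illustration under stated assumptions, not the CM-CASL implementation: the names `make_pairs` and `rank_by_wins`, the concatenation of the two configuration vectors as features, and the convention that a lower measured value is better are all hypothetical choices made here. The active and semisupervised labeling steps are not shown.

```python
from itertools import permutations
from typing import Callable, Sequence

Config = tuple[float, ...]


def make_pairs(configs: Sequence[Config], perf: Sequence[float]):
    """Turn n measured configurations into n*(n-1) ordered comparison samples.

    Label 1 means the first configuration of the pair performs better.
    Assumption: a lower measured value is better (e.g. latency).
    Ties are labeled 0 in both directions.
    """
    pairs = []
    for i, j in permutations(range(len(configs)), 2):
        features = configs[i] + configs[j]  # plain concatenation (assumption)
        label = 1 if perf[i] < perf[j] else 0
        pairs.append((features, label))
    return pairs


def rank_by_wins(configs: Sequence[Config],
                 better: Callable[[Config, Config], bool]):
    """Rank configurations by the number of pairwise comparisons they win.

    `better(a, b)` stands in for a trained comparison classifier.
    """
    wins = {c: 0 for c in configs}
    for a, b in permutations(configs, 2):
        if better(a, b):
            wins[a] += 1
    return sorted(configs, key=lambda c: wins[c], reverse=True)
```

A brief usage example with made-up measurements: three configurations yield six ordered pairs, which illustrates how pairwise modeling multiplies a small number of expensive measurements into a larger training set. Here the comparator is an oracle built from the measurements; in practice it would be a learned classifier.

```python
configs = [(1.0, 0.0), (2.0, 1.0), (4.0, 0.0)]
perf = [30.0, 10.0, 20.0]  # hypothetical latencies, lower is better

pairs = make_pairs(configs, perf)
lookup = dict(zip(configs, perf))
ranking = rank_by_wins(configs, lambda a, b: lookup[a] < lookup[b])
```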