Fair algorithm evaluation is conditioned on the existence of high-quality benchmark datasets that are non-redundant and representative of typical optimization scenarios. In this paper, we evaluate three heuristics for selecting diverse problem instances to be included in comparisons of optimization algorithms in order to ensure robust statistical performance analysis. The first approach employs clustering to identify groups of similar problem instances and then samples from each cluster to construct new benchmarks, while the other two approaches use graph algorithms to identify dominating sets and maximal independent sets of nodes. We demonstrate the applicability of the proposed heuristics by performing a statistical performance analysis of five algorithm portfolios, each consisting of three optimization algorithms, on five of the most commonly used optimization benchmarks. The results indicate that statistical analyses of the algorithms' performance conducted on each benchmark separately produce conflicting outcomes, which can falsely indicate the superiority of one algorithm over another. On the other hand, when the analysis is conducted on the problem instances selected with the proposed heuristics, which uniformly cover the problem landscape, the statistical outcomes are robust and consistent.
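The three selection heuristics named in the abstract (cluster-and-sample, dominating set, maximal independent set) can be illustrated with a minimal sketch. The sketch below assumes each problem instance is described by a numeric feature vector (for example, landscape features) and that two instances are "similar" when their Euclidean feature distance falls below a threshold; the function names, the threshold, and the random data are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch (not the paper's exact procedure): selecting diverse benchmark
# instances, assuming each instance is represented by a feature vector.
import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans


def similarity_graph(features: np.ndarray, threshold: float) -> nx.Graph:
    """Connect two instances when their Euclidean feature distance is below the threshold."""
    dist = squareform(pdist(features, metric="euclidean"))
    g = nx.Graph()
    g.add_nodes_from(range(len(features)))
    rows, cols = np.where((dist < threshold) & ~np.eye(len(features), dtype=bool))
    g.add_edges_from((int(i), int(j)) for i, j in zip(rows, cols) if i < j)
    return g


def select_by_clustering(features, n_clusters, seed=0):
    """Heuristic 1: cluster similar instances and sample one representative per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(features)
    rng = np.random.default_rng(seed)
    return sorted(int(rng.choice(np.where(labels == c)[0])) for c in range(n_clusters))


def select_by_dominating_set(features, threshold):
    """Heuristic 2: a dominating set covers every instance by itself or a similar neighbour."""
    g = similarity_graph(features, threshold)
    return sorted(nx.dominating_set(g))


def select_by_mis(features, threshold, seed=0):
    """Heuristic 3: a maximal independent set keeps only mutually dissimilar instances."""
    g = similarity_graph(features, threshold)
    return sorted(nx.maximal_independent_set(g, seed=seed))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 5))  # 60 hypothetical instances, 5 features each
    print(select_by_clustering(X, n_clusters=10))
    print(select_by_dominating_set(X, threshold=2.0))
    print(select_by_mis(X, threshold=2.0))
```

All three heuristics return indices of the selected instances; the statistical comparison of the algorithm portfolios would then be restricted to this reduced, more uniformly covering subset of the benchmark.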