The number one criticism of average-case analysis is that we do not actually know the probability distribution of real-world inputs. Thus, analyzing an algorithm on some random model has no implications for practical performance. At its core, this criticism doubts the existence of external validity, i.e., it assumes that algorithmic behavior on the somewhat simple and clean models does not translate beyond the models to practical performance real-world input. With this paper, we provide a first step towards studying the question of external validity systematically. To this end, we evaluate the performance of six graph algorithms on a collection of 2751 sparse real-world networks depending on two properties; the heterogeneity (variance in the degree distribution) and locality (tendency of edges to connect vertices that are already close). We compare this with the performance on generated networks with varying locality and heterogeneity. We find that the performance in the idealized setting of network models translates surprisingly well to real-world networks. Moreover, heterogeneity and locality appear to be the core properties impacting the performance of many graph algorithms.
翻译:平均分析中最重要的批评是,我们实际上并不知道真实世界投入的概率分布。 因此, 分析某种随机模型的算法对实际表现没有影响。 就其核心而言, 这一批评质疑外部有效性的存在, 即, 假设简单而干净模型的算法行为不会超越模型而转化为实际表现真实世界投入。 有了本文件, 我们为系统研究外部有效性问题迈出了第一步。 为此, 我们评估了根据两个属性收集的2751个稀疏真实世界网络的六种图表算法的性能; 异质性( 程度分布的变异性) 和地点( 连接已经接近的脊椎的边缘性) 。 我们把它与生成的网络的性能进行比较, 其位置和异质性。 我们发现, 理想化网络模型设置的性能与真实世界网络网络的性能是惊人的。 此外, 异性和地点性似乎是影响许多图表算法性的核心特性。