How does SSD Cluster Perform for Distributed File Systems: An Empirical Study

As the capacity of Solid-State Drives (SSDs) continues to grow and their cost gradually falls, SSD clusters are now widely deployed as part of hybrid storage systems in scenarios such as cloud computing and big data processing. Despite this rapid development, the performance of SSD clusters remains largely under-investigated, which leads to sub-optimal deployments in practice. To address this issue, we conduct extensive empirical studies to build a comprehensive understanding of SSD cluster performance in diverse settings. We configure a real SSD cluster, gather the trace data generated by several commonly used benchmarks, and then apply analytical methods to analyse the cluster's performance under different configurations. In particular, we build regression models to improve performance predictability across a broader range of configurations, and we investigate the correlations between influential factors and performance metrics for different numbers of nodes, which reveal the high scalability of the SSD cluster. Additionally, we inspect the cluster's network bandwidth to explain the performance bottleneck. Finally, we summarise the knowledge gained to benefit SSD cluster deployment in practice.

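
The abstract mentions regression models and correlations between influential factors and performance metrics but gives no detail on how they were built. As a rough illustration only, the sketch below fits a simple least-squares line and computes a Pearson correlation between node count and throughput. The function names, the choice of a single-variable linear model, and all numbers are assumptions made up for this example; they are not the paper's method or its measurements.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only, NOT data from the study.
nodes = [2, 4, 6, 8]
throughput_mb_s = [210.0, 400.0, 610.0, 790.0]

slope, intercept = fit_line(nodes, throughput_mb_s)
r = pearson(nodes, throughput_mb_s)

# Extrapolate the fitted line to an unmeasured configuration (10 nodes).
predicted_10_nodes = slope * 10 + intercept

print(f"slope={slope:.2f} MB/s per node, intercept={intercept:.2f} MB/s, r={r:.4f}")
print(f"predicted throughput at 10 nodes: {predicted_10_nodes:.1f} MB/s")
```

A correlation close to 1 together with a roughly constant slope is the kind of pattern that would suggest near-linear scaling with node count; a flattening curve would instead point towards a bottleneck such as the network bandwidth the abstract mentions.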