As the capacity of Solid-State Drives (SSDs) continues to grow and their cost gradually falls, SSD clusters are now widely deployed as part of hybrid storage systems in scenarios such as cloud computing and big data processing. However, despite this rapid development, the performance of SSD clusters remains largely under-investigated, which leads to sub-optimal deployments in practice. To address this issue, we conduct extensive empirical studies to build a comprehensive understanding of SSD cluster performance in diverse settings. Specifically, we configure a real SSD cluster, gather the trace data generated by several commonly used benchmarks, and apply analytical methods to analyse the cluster's performance under different configurations. In particular, we build regression models to improve performance predictability across a broader range of configurations, and we investigate the correlations between influential factors and performance metrics for different numbers of nodes; this analysis reveals the high scalability of the SSD cluster. Additionally, we inspect the cluster's network bandwidth to explain the performance bottleneck. Finally, we summarise the knowledge gained to benefit SSD cluster deployment in practice.