Advances in the high-performance computing (HPC) and big data analytics (BDA) ecosystems demand a better understanding of storage systems in order to plan effective solutions. To let applications access data more efficiently for computation, the HPC and BDA ecosystems adopt different storage systems, each with its own strengths and weaknesses. It is therefore worthwhile to explore the storage systems used in HPC and BDA respectively, and to understand how such systems handle data consistency and fault tolerance at massive scale. In this paper, we survey four storage systems: Lustre, Ceph, HDFS, and CockroachDB. Lustre and HDFS are among the most prominent file systems in the HPC and BDA ecosystems. Ceph is an emerging file system that is already being used by supercomputers. CockroachDB is based on NewSQL, an approach used in industry for BDA applications. The study helps us understand the underlying architecture of these storage systems and the building blocks used to create them. We also give an overview of the protocols and mechanisms used for data storage, data access, data consistency, fault tolerance, and recovery after failover. The comparative study will help system designers understand the key features and architectural goals of these storage systems and select a storage solution better suited to their needs.