Advances in the high-performance computing (HPC) and big data analytics (BDA) ecosystems demand a better understanding of storage systems in order to plan effective solutions. To let applications access data more efficiently for computation, the HPC and BDA ecosystems adopt different storage systems, each with its own pros and cons. It is therefore worthwhile to explore the storage systems used in HPC and BDA respectively, and to understand how such systems handle data consistency and fault tolerance at massive scale. In this paper, we survey four storage systems: Lustre, Ceph, HDFS, and CockroachDB. Lustre and HDFS are among the most prominent file systems in the HPC and BDA ecosystems. Ceph is an emerging file system that is being used by supercomputers. CockroachDB is based on NewSQL, an approach that is being used in industry for BDA applications. The study helps us understand the underlying architecture of these storage systems and the building blocks used to create them. It also gives an overview of the protocols and mechanisms used for data storage, data access, data consistency, fault tolerance, and failover recovery. The comparative study will help system designers understand the key features and architectural goals of these storage systems so that they can select better storage solutions.