In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the rigidity of data structures in relational databases by introducing semi-structured, flexible schema design. Because data generated by different sources and devices, especially IoT sensors and actuators, is encoded in either XML or JSON depending on the application, database technologies that store and query semi-structured data in XML format are still needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized query languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Database Systems. Currently, the majority of these solutions have been replaced by more modern JSON-based Database Management Systems. However, we believe that XML-based solutions can still perform well when executing complex queries on heterogeneous collections. Unfortunately, current research lacks a clear comparison of the scalability and performance of database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison of selected Document-Oriented Database Systems that encode documents either in XML, i.e., BaseX, eXist-db, and Sedna, or in JSON, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences, we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus.