Scientific experiments, simulations, and modern applications generate large amounts of data. This data is often stored in raw format to avoid the long loading times of traditional database management systems (DBMS). Researchers have proposed many techniques to improve query execution time over raw data and to reduce data loading time for traditional systems. At the core of all these techniques is efficient resource utilization, achieved by processing only the required data or by reducing the number of operations performed on it. Caching processed data in main memory or on disk avoids repeated processing of the same data. However, resource limitations such as main memory capacity, storage I/O speed, and the additional disk space required must be considered to provide reliable and scalable solutions for cloud or in-house deployments. This paper presents improvements to the raw data query processing framework by integrating a resource monitoring module. The experiments were performed using a scientific dataset, the Sloan Digital Sky Survey (SDSS). Analysis of the monitored resources revealed that sampling queries had the lowest resource utilization. PostgresRAW answers simple 0-JOIN queries faster than PostgreSQL, whereas complex queries with one or more JOINs should be answered using PostgreSQL to reduce workload execution time (WET). The results section discusses the resource requirements of simple, complex, and sampling queries. The analysis of query types and resource utilization patterns informed our proposal of Query Complexity Aware (QCA) and Resource Utilization Aware (RUA) data partitioning techniques for raw engines and DBMSs to reduce cost or data-to-result time.
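The routing rule implied by these results (0-JOIN queries to PostgresRAW, queries with one or more JOINs to PostgreSQL) can be illustrated with a minimal sketch. It is not the paper's QCA implementation, which is a data partitioning technique. The helper names `count_joins` and `route_query`, the `RAW_ENGINE`/`DBMS_ENGINE` labels, and the regex-based join counting are all hypothetical assumptions. The counting is a rough heuristic rather than a SQL parser: it counts explicit `JOIN` keywords plus comma-separated tables in the first `FROM` clause, and it does not handle subqueries.

```python
import re

# Hypothetical engine labels; the abstract names PostgresRAW and PostgreSQL.
RAW_ENGINE = "PostgresRAW"
DBMS_ENGINE = "PostgreSQL"

# Explicit JOIN keywords anywhere in the query (INNER JOIN, LEFT JOIN, ...).
_JOIN_RE = re.compile(r"\bJOIN\b", re.IGNORECASE)

# Text of the first FROM clause, up to the next clause keyword or end of query.
_FROM_RE = re.compile(
    r"\bFROM\b(.*?)(?:\bWHERE\b|\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)


def count_joins(sql: str) -> int:
    """Rough join count (assumption: heuristic, not a real SQL parser).

    Adds explicit JOIN keywords to the number of commas in the first
    FROM clause, which approximates implicit joins such as
    "FROM PhotoObj p, SpecObj s".
    """
    explicit = len(_JOIN_RE.findall(sql))
    match = _FROM_RE.search(sql)
    implicit = match.group(1).count(",") if match else 0
    return explicit + implicit


def route_query(sql: str) -> str:
    """Send simple 0-JOIN queries to the raw engine and
    queries with one or more JOINs to the loaded DBMS."""
    return RAW_ENGINE if count_joins(sql) == 0 else DBMS_ENGINE


if __name__ == "__main__":
    # Illustrative SDSS-style queries (table and column names for illustration only).
    queries = [
        "SELECT objID, ra, dec FROM PhotoObj WHERE ra BETWEEN 180 AND 181",
        "SELECT p.objID, s.z FROM PhotoObj p JOIN SpecObj s ON p.objID = s.bestObjID",
        "SELECT p.objID FROM PhotoObj p, SpecObj s WHERE p.objID = s.bestObjID",
    ]
    for q in queries:
        print(route_query(q), "<-", q)
```

A real deployment would more likely inspect the query plan or a parsed syntax tree instead of raw SQL text. The regex approach is used here only to keep the sketch self-contained.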