In high-energy physics (HEP), query languages in general and SQL in particular have found limited acceptance. This is surprising since HEP data analysis matches the SQL model well: the data is fully structured and queried using mostly standard operators. To gain insight into why this is the case, we perform a comprehensive analysis of six diverse, general-purpose data processing platforms using an HEP benchmark. The evaluation yields an interesting and rather complex picture of existing solutions: their query languages vary greatly in how naturally and concisely HEP query patterns can be expressed. Furthermore, most of them are between one and two orders of magnitude slower than the domain-specific system used by particle physicists today. These observations suggest that, while database systems and their query languages are in principle viable tools for HEP, significant work remains to make them relevant to HEP researchers.