Hybrid complex analytics workloads typically include (i) data management tasks (joins, selections, etc.), easily expressed in relational algebra (RA)-based languages, and (ii) complex analytics tasks (regressions, matrix decompositions, etc.), mostly expressed as linear algebra (LA) expressions. Such workloads are common in many application areas, including scientific computing, web analytics, and business recommendation. Existing solutions for evaluating hybrid analytical tasks, ranging from LA-oriented systems, to relational systems extended to handle LA operations, to hybrid systems, each have at least one of three limitations: they optimize data management and complex analytics tasks separately; they exploit RA properties only, leaving LA-specific optimization opportunities unexploited; or they focus heavily on physical optimization, leaving semantic query optimization opportunities unexplored. Additionally, they cannot exploit precomputed (materialized) results to avoid recomputing all or part of a given mixed (RA and/or LA) computation. In this paper, we take a major step towards filling this gap by proposing HADAD, an extensible, lightweight approach for optimizing hybrid complex analytics queries, based on a common abstraction that facilitates unified reasoning: a relational model endowed with integrity constraints. Our solution can be applied naturally and portably on top of pure LA and hybrid RA-LA platforms without modifying their internals. An extensive empirical evaluation shows that HADAD yields significant performance gains on diverse workloads, ranging from LA-centered to hybrid.
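To make the notion of an LA-specific optimization opportunity concrete, the following minimal Python sketch compares two algebraically equivalent evaluation orders of a matrix chain and then reuses a stored intermediate result. It is only an illustration of the kind of rewrite a semantic optimizer could exploit, not HADAD's actual mechanism; the helper `matmul`, the matrices `M`, `N`, `v`, and the `materialized` dictionary are all hypothetical names introduced here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    Returns the product and the number of scalar multiplications used
    by the textbook algorithm (rows * inner * cols).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must agree"
    product = [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]
    return product, rows * inner * cols


# Hypothetical inputs: two 3x3 matrices and a 3x1 column vector.
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
v = [[1], [2], [3]]

# Plan 1: (M N) v builds a 3x3 intermediate before touching the vector.
mn, c1 = matmul(M, N)
plan1, c2 = matmul(mn, v)
cost_plan1 = c1 + c2

# Plan 2: M (N v) uses associativity to keep every intermediate a vector.
nv, c3 = matmul(N, v)
plan2, c4 = matmul(M, nv)
cost_plan2 = c3 + c4

# Plan 3: if N v was already computed and stored (an assumed materialized
# result), only the final product remains to be evaluated.
materialized = {"N v": nv}
plan3, cost_plan3 = matmul(M, materialized["N v"])

print(plan1 == plan2 == plan3, cost_plan1, cost_plan2, cost_plan3)
```

All three plans return the same vector, but the scalar-multiplication counts differ. An optimizer that reasons only about RA properties treats the LA expression as opaque and cannot choose between these plans.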