We present a comprehensive set of conditions and rules for controlling the correctness of aggregation queries within an interactive data analysis session. The goal is to extend self-service data preparation and BI tools so that they automatically detect semantically incorrect aggregate queries on analytic tables and views built with common analytic operations, including filter, project, join, aggregate, union, difference, and pivot. We introduce aggregable properties, which describe, for any attribute of an analytic table, which aggregation functions correctly aggregate that attribute along which sets of dimension attributes. These properties can also be used to formally identify attributes that are summarizable with respect to a given aggregation function along a given set of dimension attributes. This is particularly helpful for detecting incorrect aggregations of measures obtained through non-distributive aggregation functions like average and count. We extend the notion of summarizability by introducing a new generalized summarizability condition that controls the aggregation of attributes after any analytic operation. Finally, we define propagation rules that transform the aggregable properties of the query input tables into new aggregable properties for the result tables, preserving summarizability and generalized summarizability.
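As an informal illustration of the kind of error these conditions are meant to catch, the following minimal sketch contrasts re-aggregating an average with re-aggregating a sum. The table, the attribute names (`region`, `store`, `amount`), and the helper functions are hypothetical and chosen only for this example; the sketch does not implement the aggregable properties or propagation rules themselves.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, store, amount). Illustrative data only.
rows = [
    ("north", "s1", 10.0),
    ("north", "s1", 20.0),
    ("north", "s2", 60.0),
    ("south", "s3", 30.0),
]

def avg(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def group_by(items, key):
    """Group items into lists keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups

def aggregate(items, key, func):
    """Apply func to the last field of each group of items."""
    return {k: func(r[-1] for r in grp) for k, grp in group_by(items, key).items()}

# Step 1: aggregate amount per (region, store), once with avg and once with sum.
avg_per_store = [(r, s, v) for (r, s), v in aggregate(rows, lambda x: (x[0], x[1]), avg).items()]
sum_per_store = [(r, s, v) for (r, s), v in aggregate(rows, lambda x: (x[0], x[1]), sum).items()]

# Step 2: re-aggregate the per-store results along the store dimension.
avg_of_avgs = aggregate(avg_per_store, lambda x: x[0], avg)
sum_of_sums = aggregate(sum_per_store, lambda x: x[0], sum)

# Reference values computed directly from the base rows.
true_avg = aggregate(rows, lambda x: x[0], avg)
true_sum = aggregate(rows, lambda x: x[0], sum)

print("avg of avgs:", avg_of_avgs, "direct avg:", true_avg)
print("sum of sums:", sum_of_sums, "direct sum:", true_sum)
```

In this toy table the per-store sums can be safely summed again along the store dimension, whereas averaging the per-store averages for `north` differs from the average computed over the base rows, because the two stores contribute different numbers of rows.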