Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data; such queries are challenging to explain because an explanation may need to be derived from the source, training, or inference data in an ML pipeline. In this paper, we model an objective function as a black-box function and propose BOExplain, a novel framework for explaining inference queries using Bayesian optimization (BO). An explanation is a predicate defining the input tuples that should be removed so that the query result of interest is significantly affected. BO, a technique for finding the global optimum of a black-box function, is used to find the best predicate. We develop two new techniques (individual contribution encoding and warm start) to handle categorical variables. We perform experiments showing that the predicates found by BOExplain have a higher degree of explanation than those found by state-of-the-art query explanation engines. We also show that BOExplain is effective at deriving explanations for inference queries from source and training data on a variety of real-world datasets. BOExplain is open-sourced as a Python package at https://github.com/sfu-db/BOExplain.
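To make the idea concrete, here is a minimal, self-contained toy sketch of the approach described above: treat the quality of a candidate predicate as a black-box objective, and use Bayesian optimization (a Gaussian-process surrogate with expected improvement) to search for the threshold of a predicate `feature > t` whose removal most affects a suspicious aggregate. The dataset, the specific objective (effect on the aggregate minus a penalty for removing too many tuples), and all names here are illustrative assumptions, not BOExplain's actual API or objective.

```python
import numpy as np
from math import erf, sqrt, exp, pi

rng = np.random.default_rng(0)

# Toy "inference data": tuples with feature > 6 are corrupted and
# inflate the aggregate AVG(prediction) the analyst is investigating.
feature = rng.uniform(0, 10, size=500)
prediction = np.where(feature > 6, 0.9, 0.3) + rng.normal(0, 0.05, 500)

def objective(t):
    """Black-box score of the candidate predicate `feature > t`:
    reward the drop in the suspicious aggregate after removing the
    matching tuples, penalize removing too many tuples (toy choice)."""
    kept = prediction[feature <= t]
    if kept.size == 0:
        return -1.0
    effect = prediction.mean() - kept.mean()
    removed_frac = 1 - kept.size / feature.size
    return effect - 0.1 * removed_frac

def rbf(a, b, ls=1.5):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean/stddev at grid Xs given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, best):
    out = np.zeros_like(mu)
    for i, (m, s) in enumerate(zip(mu, sd)):
        if s < 1e-9:
            continue
        z = (m - best) / s
        out[i] = (m - best) * 0.5 * (1 + erf(z / sqrt(2))) \
                 + s * exp(-z * z / 2) / sqrt(2 * pi)
    return out

# Bayesian optimization loop over the predicate threshold t.
grid = np.linspace(0.5, 9.5, 91)
X = list(rng.choice(grid, size=3, replace=False))
y = [objective(t) for t in X]
for _ in range(20):
    ya = np.array(y)
    ys = (ya - ya.mean()) / (ya.std() + 1e-9)   # standardize targets
    mu, sd = gp_posterior(np.array(X), ys, grid)
    t_next = grid[np.argmax(expected_improvement(mu, sd, ys.max()))]
    X.append(t_next)
    y.append(objective(t_next))

best_t = X[int(np.argmax(y))]
print(f"explanation predicate: feature > {best_t:.1f}")
```

The loop should converge toward a threshold near 6, i.e. the predicate covering the corrupted tuples. BOExplain itself additionally handles categorical attributes via the individual-contribution-encoding and warm-start techniques mentioned above, which this numeric-only sketch does not illustrate.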