Stochastic Package Queries in Probabilistic Databases

We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a package (a bag of tuples in a relational database) that jointly satisfies a set of constraints while minimizing some overall cost function; in most real-world problems, the data is uncertain. We provide methods for specifying, via a SQL extension, and processing stochastic package queries (SPQs) in order to solve optimization problems over uncertain data right where the data resides. Prior work in stochastic programming uses Monte Carlo methods in which the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many scenarios, i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel SummarySearch algorithm that, instead of trying to solve a large deterministic problem, approximates it via a sequence of smaller problems defined over carefully crafted summaries of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that SummarySearch can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.
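To make the scenario-based approximation concrete, the following is a minimal sketch of the general Monte Carlo idea described above, not the SummarySearch algorithm and not the authors' implementation. It assumes each tuple has one uncertain attribute drawn from an independent normal distribution, a single probabilistic constraint of the form "the package total reaches a threshold with at least a given probability", and a toy brute-force search in place of a real solver. All function and parameter names (`sample_scenarios`, `satisfied_fraction`, `best_package`, `max_copies`) are hypothetical.

```python
import itertools
import random


def sample_scenarios(means, stddevs, num_scenarios, seed=0):
    """Draw scenarios: each one is a full realization of the uncertain column.

    Assumption for illustration: values are independent normals per tuple.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(means, stddevs)]
        for _ in range(num_scenarios)
    ]


def satisfied_fraction(package, scenarios, threshold):
    """Fraction of scenarios in which the package total meets the threshold.

    `package` holds one multiplicity per tuple (a bag, so counts may exceed 1).
    """
    hits = 0
    for scenario in scenarios:
        total = sum(count * value for count, value in zip(package, scenario))
        if total >= threshold:
            hits += 1
    return hits / len(scenarios)


def best_package(costs, scenarios, threshold, min_prob, max_copies=2):
    """Cheapest package whose constraint holds in at least `min_prob` of scenarios.

    Exhaustive enumeration stands in for a solver; it is only viable for
    a handful of tuples and is used here purely to keep the sketch runnable.
    """
    best, best_cost = None, float("inf")
    for package in itertools.product(range(max_copies + 1), repeat=len(costs)):
        if satisfied_fraction(package, scenarios, threshold) < min_prob:
            continue
        cost = sum(c * k for c, k in zip(costs, package))
        if cost < best_cost:
            best, best_cost = package, cost
    return best, best_cost


if __name__ == "__main__":
    costs = [1.0, 1.5, 4.0]
    scenarios = sample_scenarios([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], 200)
    print(best_package(costs, scenarios, threshold=4.0, min_prob=0.9))
```

In this sketch the deterministic problem grows with both the number of tuples and the number of scenarios, which is the scaling issue the abstract attributes to scenario-based methods on large tables. A package found this way is only feasible with respect to the sampled scenarios, so in practice it would be re-checked against a separate, independently drawn set.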
