Inductive Program Synthesis over Noisy Datasets using Abstraction Refinement Based Optimization

We present a new synthesis algorithm to solve program synthesis over noisy datasets, i.e., data that may contain incorrect or corrupted input-output examples. Our algorithm uses an abstraction refinement based optimization process to synthesize programs that optimize the tradeoff between the loss over the noisy dataset and the complexity of the synthesized program. The algorithm uses abstractions to divide the search space of programs into subspaces by computing an abstract value that represents the outputs of all programs in a subspace. The abstract value allows our algorithm to compute, for each subspace, a sound approximate lower bound of the loss over all programs in the subspace. The algorithm iteratively refines these abstractions to further subdivide the space into smaller subspaces, prune subspaces that do not contain an optimal program, and eventually synthesize an optimal program. We implemented this algorithm in a tool called Rose. We compare Rose to a current state-of-the-art noisy program synthesis system using the SyGuS 2018 benchmark suite. Our evaluation demonstrates that Rose significantly outperforms this previous system: on two noisy benchmark program synthesis problem sets drawn from the SyGuS 2018 benchmark suite, Rose delivers speedups of up to 1587× and 81.7×, with median speedups of 20.5× and 81.7×. Rose also terminates on 20 (out of 54) and 4 (out of 11) more benchmark problems than the previous system. Both Rose and the previous system synthesize programs that are optimal over the provided noisy datasets. For the majority of the problems in the benchmark sets (272 out of 286), the synthesized programs also produce correct outputs for all inputs in the original (unseen) noise-free dataset. These results highlight the benefits that Rose can deliver for effective noisy program synthesis.
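The bound-and-refine loop described above can be illustrated with a small sketch. This is not Rose's implementation: the program space (linear functions `a*x + b` with integer coefficients), the interval abstraction, the 0/1 loss, the complexity measure (number of nonzero coefficients), the objective `2 * loss + size`, and all function names are assumptions chosen only to keep the example short. Each subspace is a box of coefficient ranges, its abstract value on an input is an interval covering every output in the box, and an example whose expected output falls outside that interval is necessarily missed by every program in the box, which yields a sound lower bound.

```python
import heapq

def output_interval(box, x):
    """Abstract value: an interval containing a*x + b for every (a, b) in the box."""
    (a_lo, a_hi), (b_lo, b_hi) = box
    lo = min(a_lo * x, a_hi * x) + b_lo
    hi = max(a_lo * x, a_hi * x) + b_hi
    return lo, hi

def min_size(box):
    """Fewest nonzero coefficients any program in the box can have."""
    return sum(0 if lo <= 0 <= hi else 1 for lo, hi in box)

def lower_bound(box, examples):
    """Sound lower bound on the assumed objective 2 * loss + size over the box."""
    missed = 0
    for x, y in examples:
        lo, hi = output_interval(box, x)
        if not lo <= y <= hi:
            # No program in this subspace can produce y on input x.
            missed += 1
    return 2 * missed + min_size(box)

def split(box):
    """Refinement: halve the widest coefficient range."""
    widths = [hi - lo for lo, hi in box]
    i = widths.index(max(widths))
    lo, hi = box[i]
    mid = (lo + hi) // 2
    left, right = list(box), list(box)
    left[i] = (lo, mid)
    right[i] = (mid + 1, hi)
    return tuple(left), tuple(right)

def synthesize(examples, coeff_range=(-8, 8)):
    """Best-first search over subspaces, ordered by lower bound.

    When a box holds a single program its bound is exact, so the first
    such box popped from the heap is optimal; boxes with larger bounds
    are never refined further, which is the pruning step.
    """
    root = (coeff_range, coeff_range)
    heap = [(lower_bound(root, examples), root)]
    while heap:
        bound, box = heapq.heappop(heap)
        if all(lo == hi for lo, hi in box):
            (a, _), (b, _) = box
            return a, b, bound
        for child in split(box):
            heapq.heappush(heap, (lower_bound(child, examples), child))
```

For instance, on the examples `[(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]`, where the last pair is a corrupted sample of `2*x + 1`, `synthesize` returns `(2, 1, 4)`: the program `2*x + 1` with one missed example and two nonzero coefficients. The interval bound is coarse for large boxes and tightens as boxes shrink, which is why refinement is needed before a subspace can be ruled out.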
