This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain-specific language (DSL). The algorithm builds abstractions directly from the initial DSL primitives, using syntactic pattern matching of intermediate abstractions to prune the search space and guide the search towards abstractions that maximally capture shared structure in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch's scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches. Finally, we show empirically that Stitch is robust to terminating the search procedure early, which further allows it to scale to challenging datasets by means of early stopping.
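To make the idea of matching a candidate abstraction against a corpus concrete, the following is a minimal Python sketch. It is not Stitch's implementation or its utility function. The S-expression encoding, the `?`-prefixed hole convention, and the names `match`, `find_matches`, and `savings` are all assumptions introduced here for illustration. The sketch counts where a pattern with holes matches in a toy corpus and estimates how many symbols would be saved by rewriting each match as a call to the abstraction.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Hypothetical encoding: a program is a symbol (str) or a tuple of sub-expressions.
Expr = Union[str, Tuple["Expr", ...]]
Bindings = Dict[str, Expr]


def is_hole(e: Expr) -> bool:
    """Holes are symbols that start with '?', e.g. '?0' (a convention of this sketch)."""
    return isinstance(e, str) and e.startswith("?")


def match(pattern: Expr, expr: Expr, bindings: Optional[Bindings] = None) -> Optional[Bindings]:
    """Syntactically match `pattern` against `expr`; return hole bindings or None."""
    if bindings is None:
        bindings = {}
    if is_hole(pattern):
        if pattern in bindings:
            # A repeated hole must bind to the same sub-expression every time.
            return bindings if bindings[pattern] == expr else None
        extended = dict(bindings)
        extended[pattern] = expr
        return extended
    if isinstance(pattern, str) or isinstance(expr, str):
        return bindings if pattern == expr else None
    if len(pattern) != len(expr):
        return None
    for p, e in zip(pattern, expr):
        bindings = match(p, e, bindings)
        if bindings is None:
            return None
    return bindings


def size(expr: Expr) -> int:
    """Size of an expression, measured as its number of symbols."""
    return 1 if isinstance(expr, str) else sum(size(child) for child in expr)


def subtrees(expr: Expr) -> Iterator[Expr]:
    """Yield `expr` and every sub-expression nested inside it."""
    yield expr
    if not isinstance(expr, str):
        for child in expr:
            yield from subtrees(child)


def find_matches(pattern: Expr, corpus: List[Expr]) -> List[Tuple[Expr, Bindings]]:
    """Collect every (subtree, bindings) pair in the corpus that the pattern matches."""
    found = []
    for program in corpus:
        for sub in subtrees(program):
            b = match(pattern, sub)
            if b is not None:
                found.append((sub, b))
    return found


def savings(pattern: Expr, corpus: List[Expr]) -> int:
    """Toy compression estimate: symbols saved by rewriting each match as a call.

    A rewritten match costs one symbol for the abstraction's name plus the size
    of each argument. The cost of storing the abstraction itself is ignored.
    """
    total = 0
    for sub, b in find_matches(pattern, corpus):
        rewritten = 1 + sum(size(arg) for arg in b.values())
        total += size(sub) - rewritten
    return total


corpus: List[Expr] = [
    ("+", ("*", "x", "2"), "1"),
    ("+", ("*", "y", "2"), "1"),
    ("-", ("*", "z", "2"), "3"),
]

double = ("*", "?0", "2")                      # matches in all three programs
double_plus_one = ("+", ("*", "?0", "2"), "1")  # larger, but matches only two
generic_add = ("+", "?0", "?1")                # too general to save anything

for name, pat in [("double", double),
                  ("double_plus_one", double_plus_one),
                  ("generic_add", generic_add)]:
    print(name, len(find_matches(pat, corpus)), savings(pat, corpus))
```

In this toy corpus the larger pattern saves more symbols overall even though it matches in fewer places, while the fully general pattern saves nothing. This is the kind of trade-off between abstraction size and number of matches that a compression-based measure of library quality has to weigh.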