Inductive programming frequently relies on some form of search to identify candidate solutions. However, the size of the search space limits inductive programming to the production of relatively small programs. If we could correctly predict the subset of instructions required for a given problem, inductive programming would be more tractable. We show that this can be achieved in a high percentage of cases. This paper presents a novel model of programming language instruction co-occurrence that was built to support search space partitioning in the Zoea distributed inductive programming system. The model consists of a collection of intersecting instruction subsets derived from a large sample of open source code. Using this approach, different parts of the search space can be explored in parallel. The number of subsets required does not grow linearly with the quantity of code used to produce them, and a manageable number of subsets is sufficient to cover a high percentage of unseen code. This approach also significantly reduces the overall size of the search space, often by many orders of magnitude.
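To make the idea of subset coverage concrete, the following is a minimal sketch and not the Zoea implementation. It assumes that a program can be summarised by the set of instructions it uses, and that a program counts as covered when at least one subset contains all of them. The instruction names, the helper functions, and the naive size model (counting instruction sequences up to a fixed length) are all hypothetical choices made for illustration.

```python
from typing import FrozenSet, List

InstructionSet = FrozenSet[str]


def is_covered(required: InstructionSet, subsets: List[InstructionSet]) -> bool:
    """Return True if some subset contains every required instruction."""
    return any(required <= subset for subset in subsets)


def coverage(programs: List[InstructionSet], subsets: List[InstructionSet]) -> float:
    """Fraction of programs whose instructions fit inside at least one subset."""
    if not programs:
        return 0.0
    covered = sum(1 for program in programs if is_covered(program, subsets))
    return covered / len(programs)


def sequence_count(num_instructions: int, max_length: int) -> int:
    """Naive search space size (an assumption for illustration only):
    the number of instruction sequences of length 1..max_length."""
    return sum(num_instructions ** n for n in range(1, max_length + 1))


# Hypothetical intersecting subsets; both contain "mul".
subsets = [
    frozenset({"add", "sub", "mul"}),
    frozenset({"mul", "concat", "len"}),
]

# Hypothetical unseen programs, each reduced to the instructions it uses.
programs = [
    frozenset({"add", "mul"}),      # fits in the first subset
    frozenset({"concat", "len"}),   # fits in the second subset
    frozenset({"add", "concat"}),   # spans both subsets, so not covered
]

print(f"coverage: {coverage(programs, subsets):.2f}")

# Compare searching over a full instruction set with searching one subset.
full_size = sequence_count(num_instructions=200, max_length=5)
subset_size = sequence_count(num_instructions=10, max_length=5)
print(f"reduction factor: {full_size / subset_size:.1f}")
```

Under this simplified model each subset defines an independent, much smaller search that could be handed to a separate worker, which is the sense in which partitioning allows parallel exploration.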