We propose a new approach to automated theorem proving and deductive program synthesis in which an AlphaZero-style agent trains itself to refine a high-level expert strategy expressed as a nondeterministic program. An analogous teacher agent trains itself to generate tasks of suitable relevance and difficulty for the learner. This makes it possible to leverage minimal amounts of domain knowledge to tackle problems for which training data is unavailable or hard to synthesize. We illustrate our approach on the problem of loop invariant synthesis for imperative programs, using neural networks to refine both the teacher and solver strategies.
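
To make the target problem concrete, the sketch below shows what a solver for loop invariant synthesis has to produce. It is not the method proposed here: the toy program, the function names `check_invariant`, `good`, and `weak`, and the bounded brute-force check are all assumptions chosen for illustration, and a real system would discharge these conditions with a theorem prover rather than by enumeration.

```python
from itertools import product


def check_invariant(inv, bound=20):
    """Bounded check of the three standard loop-invariant conditions.

    Hypothetical toy program (an illustration, not taken from the source):
        assume n >= 0; x = 0; y = n
        while y > 0: x += 1; y -= 1
        assert x == n

    This only tests states with |x|, |y|, |n| <= bound, so a True result
    is evidence, not a proof.
    """
    # Initiation: the invariant must hold when the loop is first reached.
    for n in range(0, bound + 1):
        if not inv(0, n, n):
            return False

    rng = range(-bound, bound + 1)
    for x, y, n in product(rng, rng, rng):
        if not inv(x, y, n):
            continue  # only states satisfying the invariant matter
        if y > 0:
            # Consecution: one loop iteration must preserve the invariant.
            if not inv(x + 1, y - 1, n):
                return False
        else:
            # Exit: invariant plus negated guard must imply the assertion.
            if x != n:
                return False
    return True


def good(x, y, n):
    # Candidate invariant that is strong enough to prove the assertion.
    return x + y == n and y >= 0


def weak(x, y, n):
    # Inductive, but too weak: it admits exit states with y < 0.
    return x + y == n


if __name__ == "__main__":
    print(check_invariant(good))
    print(check_invariant(weak))
```

The second candidate is preserved by the loop body yet fails the exit condition, which is the kind of near miss that makes invariant synthesis a search problem: many plausible formulas must be proposed and rejected before one satisfies all three conditions.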