We propose a new approach to automated theorem proving in which an AlphaZero-style agent trains itself to refine a generic high-level expert strategy expressed as a nondeterministic program. An analogous teacher agent trains itself to generate tasks of suitable relevance and difficulty for the learner. This makes it possible to leverage minimal domain knowledge to tackle problems for which training data is unavailable or hard to synthesize. As a specific illustration, we consider loop invariant synthesis for imperative programs and use neural networks to refine both the teacher and solver strategies.
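To make the loop invariant synthesis task concrete, the following is a minimal sketch and not the method described above. It assumes a hypothetical toy loop, a hand-written candidate set, and a plain scoring function standing in for the learned policy that resolves the strategy's single choice point. The invariant conditions are checked only by bounded enumeration over a small state grid, which is a test and not a proof. All names (`CANDIDATES`, `check_invariant`, `synthesize`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy loop: x = 0; y = 0; while x < n: x += 1; y += 2
# Goal: on exit, y == 2 * n (assuming n >= 0).

def guard(x, y, n):
    return x < n

def body(x, y, n):
    return x + 1, y + 2, n

def post(x, y, n):
    return y == 2 * n

# Hand-written candidate invariants (an assumption made for illustration).
CANDIDATES = {
    "y == 2*x": lambda x, y, n: y == 2 * x,
    "x <= n": lambda x, y, n: x <= n,
    "y == 2*x and x <= n": lambda x, y, n: y == 2 * x and x <= n,
}

def check_invariant(inv, bound=6):
    """Bounded test of the three invariant conditions (not a proof)."""
    # Initiation: the invariant holds on loop entry (x = 0, y = 0).
    if not all(inv(0, 0, n) for n in range(bound)):
        return False
    for x, y, n in product(range(bound), range(2 * bound), range(bound)):
        if not inv(x, y, n):
            continue
        if guard(x, y, n):
            # Preservation: one loop iteration keeps the invariant.
            if not inv(*body(x, y, n)):
                return False
        elif not post(x, y, n):
            # Exit: invariant plus negated guard must imply the postcondition.
            return False
    return True

def synthesize(score):
    """Resolve the choice point by trying candidates in decreasing score.

    `score` stands in for a learned policy; here it is any function
    from a candidate's name to a number.
    """
    ranked = sorted(CANDIDATES, key=score, reverse=True)
    for attempts, name in enumerate(ranked, start=1):
        if check_invariant(CANDIDATES[name]):
            return name, attempts
    return None, len(ranked)

if __name__ == "__main__":
    print(synthesize(lambda name: len(name)))
    print(synthesize(lambda name: -len(name)))
```

In this sketch, a scoring function that ranks the conjunction first finds a valid invariant on the first attempt, while a poor ranking needs three attempts. Reducing the number of failed attempts is the kind of signal a self-trained policy could exploit.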