Programming or scripting languages used in real-world systems are seldom designed with a formal semantics in mind from the outset. Therefore, developing well-founded analysis tools for these systems requires reverse-engineering a formal semantics as a first step. This can take months or years of effort. Can we (at least partially) automate this process? Though desirable, automatically reverse-engineering semantics rules from an implementation is very challenging, as found by Krishnamurthi et al. [2019]. In this paper, we highlight that scaling methods with the size of the language is very difficult due to state space explosion, so we propose to learn semantics incrementally. We give a formalisation of Krishnamurthi et al.'s desugaring learning framework in order to clarify the assumptions necessary for an incremental learning algorithm to be feasible. We show that this reformulation allows us to extend the search space and express rules that Krishnamurthi et al. described as challenging, while still retaining feasibility. We evaluate enumerative synthesis as a baseline algorithm, and demonstrate that, with our reformulation of the problem, it is possible to learn correct desugaring rules for the example source and core languages proposed by Krishnamurthi et al., in most cases identical to the intended rules. In addition, with user guidance, our system was able to synthesize rules for desugaring list comprehensions and try/catch/finally constructs.
翻译:在现实世界系统中使用的编程或脚本语言从一开始就很少设计成正式的语义学。 因此,为这些系统开发有充分根据的分析工具需要从反向设计正式的语义学作为第一步。 这可能需要数月或数年的努力。 我们能否(至少部分地)将这一过程自动化? 执行过程中自动反向工程语义学规则虽然是可取的,但执行过程中自动反向工程语义学规则非常具有挑战性, Krishnammurthi 等人([2019] 认为这仍然具有挑战性。 在本文件中,我们强调,由于国家空间爆炸,使用语言大小的缩放方法非常困难,因此我们建议逐步学习语义学。 我们给出了克里什纳穆西等人(Krishamurmurthi et al. 等人) 的解析框架,以便澄清渐进学习算法学框架的必要假设。 我们的改写法允许我们扩展搜索空间,表达克里什曼穆尔提等人(Krishinemurturthi et al.) 认为具有挑战性的规则,同时保留可行性。我们评价用比喻的合成合成方法作为基线算法, 并表明,随着我们的问题的重新校正重校正重研究,我们有可能在最终规则中学习规则中学习规则和核心的翻校略法则, 。