For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, SAGE, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the spec author, SAGE can generate protocol code automatically. Using SAGE, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after clarification, SAGE is able to automatically generate code that interoperates perfectly with Linux implementations. We show that SAGE generalizes to BFD, IGMP, and NTP. We also find that SAGE supports many of the conceptual components found in key protocols, suggesting that, with some additional machinery, SAGE may be able to generalize to TCP and BGP.
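To make concrete the kind of specification sentence such a system has to turn into code, consider the ICMP checksum, which RFC 792 defines as the 16-bit one's complement of the one's complement sum of the ICMP message starting with the ICMP Type, computed with the checksum field set to zero. The following is a minimal hand-written sketch of that sentence and of an echo reply built from an echo request. It is not SAGE output, and the function names (`internet_checksum`, `build_echo`, `echo_reply_for`) are hypothetical names chosen for illustration.

```python
import struct

ECHO_REQUEST = 8  # ICMP type values from RFC 792
ECHO_REPLY = 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte for summing
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(icmp_type: int, identifier: int, sequence: int,
               payload: bytes = b"") -> bytes:
    """Build an ICMP echo or echo reply message with a valid checksum."""
    # The checksum field is zero while the checksum is being computed.
    header = struct.pack("!BBHHH", icmp_type, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", icmp_type, 0, checksum, identifier, sequence)
    return header + payload


def echo_reply_for(request: bytes) -> bytes:
    """Return the echo reply for an echo request, keeping id, seq and data."""
    icmp_type, code, _, identifier, sequence = struct.unpack("!BBHHH", request[:8])
    if icmp_type != ECHO_REQUEST or code != 0:
        raise ValueError("not an ICMP echo request")
    return build_echo(ECHO_REPLY, identifier, sequence, request[8:])
```

A receiver can validate a message by running `internet_checksum` over the whole message, checksum field included: a correct message yields zero. Even this short sentence leaves choices open to the implementer, such as how much of the message the sum covers and how odd-length data is padded, which is the sort of gap the abstract refers to as ambiguity or under-specification.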