For decades, Internet protocols have been specified in natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have exhibited bugs and interoperability failures over the years. In this paper, we explore to what extent natural language processing (NLP), an area that has made impressive strides in recent years, can be used to generate protocol implementations. We advocate a semi-automated protocol generation approach, Sage, that uncovers ambiguous or under-specified sentences in specifications. A human then fixes these sentences iteratively until Sage can generate protocol code automatically. Using an implementation of Sage, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC. Once these are fixed, Sage automatically generates code that interoperates perfectly with Linux implementations. We also show that Sage generalizes to parts of IGMP and NTP. Finally, we find that Sage supports half of the conceptual components found in major standard protocols, which suggests that, with some additional machinery, Sage may be able to generalize to TCP and BGP.
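The abstract describes the human-in-the-loop workflow only at a high level. The following is a minimal sketch under stated assumptions: a parser is assumed to return every candidate interpretation (logical form) of a specification sentence. A sentence with zero interpretations is treated as under-specified, and one with more than one interpretation is treated as ambiguous. Flagged sentences are rewritten by a human and re-checked until each has exactly one interpretation. The names `classify`, `refine`, `toy_parse`, and `toy_fix`, and all toy sentences and logical forms, are hypothetical. They are not Sage's actual code or data, nor text from the ICMP RFC.

```python
from typing import Callable, Dict, List

# Hypothetical parser interface: returns every candidate logical form
# (machine-readable interpretation) for one specification sentence.
Parser = Callable[[str], List[str]]
# Hypothetical human-in-the-loop step: given a flagged sentence and its
# label, return a rewritten sentence.
Fixer = Callable[[str, str], str]


def classify(sentence: str, parse: Parser) -> str:
    """Label a sentence by how many interpretations the parser finds."""
    forms = parse(sentence)
    if not forms:
        return "under-specified"  # no usable interpretation
    if len(forms) > 1:
        return "ambiguous"        # more than one plausible interpretation
    return "ok"                   # exactly one: ready for code generation


def refine(spec: List[str], parse: Parser, fix: Fixer,
           max_rounds: int = 10) -> List[str]:
    """Flag problem sentences, let a human rewrite them, and re-check."""
    for _ in range(max_rounds):
        labels = {s: classify(s, parse) for s in spec}
        if all(label == "ok" for label in labels.values()):
            return spec  # every sentence now has a unique interpretation
        spec = [s if labels[s] == "ok" else fix(s, labels[s]) for s in spec]
    raise RuntimeError("specification still not fully disambiguated")


# Toy data (hypothetical, for illustration only).
TOY_PARSES: Dict[str, List[str]] = {
    "Compute the checksum over the message.":
        ["checksum(icmp_header + payload)", "checksum(payload)"],
    "Reverse the addresses.": [],
    "Set the type to 0.": ["set(type, 0)"],
    "Compute the checksum over the ICMP header and payload.":
        ["checksum(icmp_header + payload)"],
    "Swap the source and destination IP addresses.":
        ["swap(ip.src, ip.dst)"],
}

TOY_FIXES: Dict[str, str] = {
    "Compute the checksum over the message.":
        "Compute the checksum over the ICMP header and payload.",
    "Reverse the addresses.":
        "Swap the source and destination IP addresses.",
}


def toy_parse(sentence: str) -> List[str]:
    # Unknown sentences get no interpretation, i.e. are under-specified.
    return TOY_PARSES.get(sentence, [])


def toy_fix(sentence: str, label: str) -> str:
    # A real editor would act on the label; the toy just looks up a rewrite.
    return TOY_FIXES[sentence]


if __name__ == "__main__":
    spec = ["Compute the checksum over the message.",
            "Reverse the addresses.",
            "Set the type to 0."]
    for s in spec:
        print(f"{classify(s, toy_parse):16} {s}")
    print(refine(spec, toy_parse, toy_fix))
```

The loop terminates either when every sentence maps to a single interpretation, which is the point at which code generation could proceed, or after a bounded number of rounds. This mirrors the iterative fix-and-recheck process described above without claiming to reproduce Sage's parser or its disambiguation checks.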