There has been growing interest in automatically predicting missing type annotations in programs written in Python and JavaScript. While prior methods have achieved impressive accuracy when predicting the most common types, they often perform poorly on rare or complex types. In this paper, we present a new type inference method that treats type prediction as a code infilling task by leveraging CodeT5, a state-of-the-art seq2seq pre-trained language model for code. Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model. We also propose an iterative decoding scheme that incorporates previous type predictions into the model's input context, allowing information exchange between related code elements. Our evaluation shows that the proposed approach, TypeT5, not only achieves higher overall accuracy (particularly on rare and complex types) but also produces more coherent results with fewer type errors, while enabling easy user intervention.
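The iterative decoding idea can be illustrated with a toy sketch. This is not the TypeT5 implementation. It assumes a fixed element order, a hand-written `related` map standing in for the static-analysis context, and a stub `toy_model` in place of CodeT5. The names `build_context`, `iterative_decode`, `toy_model`, and the `fixed` parameter are hypothetical. `<extra_id_0>` is the T5-style sentinel token used to mark a slot to infill. The sketch shows two things: each prediction is fed back into the contexts of later elements, and user-supplied annotations are kept and propagated rather than overwritten.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the seq2seq model: takes a context string and
# returns the infilled type for the masked slot.
Predictor = Callable[[str], str]

MASK = "<extra_id_0>"  # T5-style sentinel marking the slot to infill


def build_context(element: str, related: List[str], known: Dict[str, str]) -> str:
    """Toy context: the target element with its type masked, followed by
    related elements, annotated only if a type is already known."""
    lines = [f"{element}: {MASK}"]
    for other in related:
        ann = known.get(other)
        lines.append(f"{other}: {ann}" if ann is not None else other)
    return "\n".join(lines)


def iterative_decode(
    order: List[str],
    related: Dict[str, List[str]],
    model: Predictor,
    fixed: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Predict one element at a time; each prediction is fed back into the
    contexts of later elements. `fixed` holds user-supplied annotations."""
    known: Dict[str, str] = dict(fixed or {})
    for element in order:
        if element in known:  # never overwrite a user-provided type
            continue
        ctx = build_context(element, related.get(element, []), known)
        known[element] = model(ctx)
    return known


def toy_model(ctx: str) -> str:
    """Stub model: copy the first annotation seen among related elements,
    falling back to "Any" when none is available."""
    for line in ctx.splitlines()[1:]:
        if ": " in line:
            return line.split(": ", 1)[1]
    return "Any"


related = {"y": ["x"], "z": ["y"]}
print(iterative_decode(["y", "z"], related, toy_model, fixed={"x": "list[str]"}))
```

With the user fixing `x` to `list[str]`, the stub propagates that type to `y` and then to `z`. Without `fixed`, both fall back to `Any`. A real model would of course use the full code context rather than copying annotations.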