Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and productivity. The lack of static typing can cause run-time exceptions and is a major factor in weak IDE support. To alleviate these issues, PEP 484 introduced optional type annotations for Python. As retrofitting types to existing codebases is error-prone and laborious, machine learning (ML)-based approaches have been proposed to enable automatic type inference based on existing, partially annotated codebases. However, previous ML-based approaches are trained and evaluated on human-provided type annotations, which are not always sound, and this may limit their practicality in real-world use. In this paper, we present Type4Py, a deep similarity learning-based hierarchical neural network model. It learns to discriminate between similar and dissimilar types in a high-dimensional space, which results in clusters of types. Likely types for arguments, variables, and return values can then be inferred through nearest neighbor search. Unlike previous work, we trained and evaluated our model on a type-checked dataset and used mean reciprocal rank (MRR) to reflect the performance perceived by users. The obtained results show that Type4Py achieves an MRR of 77.1%, which is a substantial improvement of 8.1% and 16.7% over the state-of-the-art approaches Typilus and TypeWriter, respectively. Finally, to aid developers with retrofitting types, we released a Visual Studio Code extension, which uses Type4Py to provide ML-based type auto-completion for Python.
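To make the two central ideas of the abstract concrete (inferring a type by nearest neighbor search over type clusters, and scoring ranked predictions with MRR), the following is a minimal sketch and not Type4Py's implementation. The 2-D vectors are hand-made stand-ins for learned embeddings, the names `predict_types` and `mean_reciprocal_rank` are hypothetical, and the inverse-distance weighting of neighbor votes is an assumption made for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_types(query, labeled_vectors, k=3):
    """Rank candidate types for `query` using its k nearest labeled vectors.

    Each neighbor votes for its type with weight 1/distance (an assumed
    weighting scheme); types are returned from most to least likely.
    """
    neighbors = sorted(labeled_vectors, key=lambda item: euclidean(query, item[0]))[:k]
    scores = defaultdict(float)
    for vector, type_name in neighbors:
        scores[type_name] += 1.0 / (euclidean(query, vector) + 1e-10)
    return sorted(scores, key=scores.get, reverse=True)


def mean_reciprocal_rank(ranked_predictions, true_types):
    """Average of 1/rank of the correct type; a miss contributes 0."""
    total = 0.0
    for ranked, truth in zip(ranked_predictions, true_types):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(true_types)


# Toy "type clusters": made-up 2-D points, not real model output.
labeled = [
    ((0.0, 0.0), "int"), ((0.1, 0.0), "int"),
    ((1.0, 1.0), "str"), ((1.1, 0.9), "str"),
    ((5.0, 5.0), "List[int]"),
]

queries = [(0.05, 0.1), (0.9, 1.0), (0.4, 0.4)]
true_types = ["int", "str", "str"]

ranked = [predict_types(q, labeled, k=3) for q in queries]
mrr = mean_reciprocal_rank(ranked, true_types)
print(ranked)
print(f"MRR = {mrr:.3f}")
```

On this toy data the first two queries rank the correct type first, while the third ranks `str` second behind `int`, so the MRR is (1 + 1 + 0.5) / 3 ≈ 0.833. This illustrates why MRR reflects what a user sees in a ranked suggestion list: a correct type in second place still earns partial credit, whereas top-1 accuracy would count it as a miss.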