Graph representations of programs are often central to machine learning for code research. We introduce python_graphs, an open-source Python library that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. The library supports the construction of control-flow graphs, data-flow graphs, and composite ``program graphs'' that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying it to millions of competitive programming submissions, and showcase its utility for machine learning research.
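To make the idea of a graph representation concrete, the following minimal sketch builds only the syntactic component of such a graph using Python's standard `ast` module. It does not use the python_graphs API; the function name `build_syntax_graph` and the `(nodes, edges)` output format are assumptions chosen for illustration, and control-flow and data-flow edges are deliberately left out.

```python
import ast


def build_syntax_graph(source):
    """Build a syntax-only graph from Python source (illustrative sketch).

    Returns:
      nodes: list of AST node type names, indexed by node id.
      edges: list of (parent_id, child_id, field_name) tuples.

    This is a hypothetical helper, not part of python_graphs.
    """
    nodes = []
    edges = []

    def visit(node):
        # Assign a fresh id on every visit, so AST objects that the parser
        # may share between parents still become distinct graph nodes.
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    child_id = visit(child)
                    edges.append((node_id, child_id, field))
        return node_id

    visit(ast.parse(source))
    return nodes, edges


source = "def double(x):\n    return 2 * x\n"
nodes, edges = build_syntax_graph(source)
for parent, child, field in edges:
    print(f"{nodes[parent]} --{field}--> {nodes[child]}")
```

Because each node is reached from exactly one parent, the result is a tree; a composite program graph in the sense described above would add further edge types (control flow, data flow, lexical) over the same node set.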