Style is a significant component of natural language text: changing it alters the tone of a text while keeping the underlying information the same. Programming languages, despite their strict syntax rules, also have style, since code with the same functionality can be written using different language features. Programming style is difficult to quantify, however, so as part of this work we define style attributes specifically for Python. To build this definition, we use hierarchical clustering, which captures style without requiring us to specify transformations. In addition to defining style, we explore how well pre-trained code language models capture information about code style. To do this, we fine-tune pre-trained code language models and evaluate their performance on code style transfer tasks.
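As an illustration of what "same functionality, different language features" can mean in Python, consider the minimal sketch below. It is not taken from this work: the function names, the example snippets, and the node-count feature are hypothetical, chosen only to show two stylistic variants of one computation and one crude way the difference could be made measurable.

```python
import ast


def squares_of_evens_loop(numbers):
    """Explicit-loop style: build the list with a for loop and append."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_comprehension(numbers):
    """Comprehension style: the same computation as a single expression."""
    return [n * n for n in numbers if n % 2 == 0]


def count_node_types(source):
    """Count AST node types in a snippet.

    This is a hypothetical stand-in for a style feature vector,
    not the style attributes defined in this work.
    """
    counts = {}
    for node in ast.walk(ast.parse(source)):
        name = type(node).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


# The two styles as source text, so their syntax trees can be compared.
LOOP_SRC = """
result = []
for n in numbers:
    if n % 2 == 0:
        result.append(n * n)
"""

COMP_SRC = "result = [n * n for n in numbers if n % 2 == 0]\n"

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Both variants compute the same result.
    print(squares_of_evens_loop(data) == squares_of_evens_comprehension(data))
    # Their syntax trees differ in which node types appear.
    print(sorted(count_node_types(LOOP_SRC)))
    print(sorted(count_node_types(COMP_SRC)))
```

Both functions return identical results, yet their syntax trees differ: the first contains a `For` statement and the second a `ListComp` expression. Counts of this kind are one plausible input to a clustering step, although the features actually used may differ.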