Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers who use it are required to write a Dockerfile for their software. Writing Dockerfiles is far from trivial, especially when the system has unusual requirements for its execution environment. Although several tools exist to support developers in writing Dockerfiles, none of them is able to generate entire Dockerfiles from scratch given a high-level specification of the requirements of the execution environment. In this paper, we present a study aimed at understanding to what extent Deep Learning (DL), which has proven successful for other coding tasks, can be used for this specific task. As a preliminary step, we defined a structured natural language specification for Dockerfile requirements and a methodology to automatically infer such requirements from the largest dataset of Dockerfiles currently available. We used the resulting dataset of 670,982 instances to train and test a Text-to-Text Transfer Transformer (T5) model, following the current state-of-the-art procedure for coding tasks, to automatically generate Dockerfiles from the structured specifications. The results of our evaluation show that T5 performs similarly to the simpler IR-based baselines we considered. We also report the open challenges associated with the application of deep learning to Dockerfile generation.
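To make the task concrete, the following is a minimal sketch of how a T5 model fine-tuned on (specification, Dockerfile) pairs could be queried to generate a Dockerfile from a structured specification, using the Hugging Face transformers library. The specification format and the checkpoint shown here are illustrative assumptions, not the exact schema or artifacts produced in this study.

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    # Placeholder checkpoint: in practice this would be a T5 model fine-tuned on
    # (structured specification, Dockerfile) pairs; "t5-small" is used only so the
    # snippet loads and runs.
    model_name = "t5-small"
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # Illustrative structured specification of the execution environment
    # (not the paper's exact format).
    spec = (
        "base image: python 3.8; "
        "copy: requirements.txt, app.py; "
        "run: pip install -r requirements.txt; "
        "expose: 8080; "
        "cmd: python app.py"
    )

    # Encode the specification and generate a candidate Dockerfile with beam search.
    inputs = tokenizer(spec, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=256, num_beams=5, early_stopping=True)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

With a suitably fine-tuned checkpoint, the decoded output would be a complete Dockerfile (FROM, COPY, RUN, EXPOSE, CMD instructions) matching the requirements expressed in the specification.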