Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers who use it are required to write a Dockerfile for their software. Writing Dockerfiles is far from trivial, especially when the system has unusual requirements for its execution environment. Although several tools exist to support developers in writing Dockerfiles, none of them is able to generate entire Dockerfiles from scratch given a high-level specification of the requirements of the execution environment. In this paper, we present a study aimed at understanding to what extent Deep Learning (DL), which has proven successful for other coding tasks, can be used for this specific task. As a preliminary step, we defined a structured natural language specification for Dockerfile requirements and a methodology to automatically infer such requirements from the largest dataset of Dockerfiles currently available. We used the resulting dataset of 670,982 instances to train and test a Text-to-Text Transfer Transformer (T5) model, following the current state-of-the-art procedure for coding tasks, to automatically generate Dockerfiles from the structured specifications. The results of our evaluation show that T5 performs similarly to the simpler IR-based baselines we considered. We also report the open challenges associated with the application of deep learning to Dockerfile generation.
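To make the task concrete, the following is a minimal sketch of how a T5 model fine-tuned on (specification, Dockerfile) pairs could be queried to generate a Dockerfile from a structured specification, using the Hugging Face transformers library. The specification format and the checkpoint shown here are illustrative assumptions, not the exact schema or artifacts produced in this study.

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    # Placeholder checkpoint: in practice this would be a T5 model fine-tuned on
    # (structured specification, Dockerfile) pairs; "t5-small" is used only so the
    # snippet loads and runs.
    model_name = "t5-small"
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # Illustrative structured specification of the execution environment
    # (not the paper's exact format).
    spec = (
        "base image: python 3.8; "
        "copy: requirements.txt, app.py; "
        "run: pip install -r requirements.txt; "
        "expose: 8080; "
        "cmd: python app.py"
    )

    # Encode the specification and generate a candidate Dockerfile with beam search.
    inputs = tokenizer(spec, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=256, num_beams=5, early_stopping=True)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

With a suitably fine-tuned checkpoint, the decoded output would be a complete Dockerfile (FROM, COPY, RUN, EXPOSE, CMD instructions) matching the requirements expressed in the specification.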