Safe-DS: A Domain Specific Language to Make Data Science Safe - Zhuanzhi Papers

April 7, 2023

Safe-DS: A Domain Specific Language to Make Data Science Safe

Lars Reimann, Günter Kniesel-Wünsche

From arXiv. Accepted for the NIER Track of the 45th International Conference on Software Engineering (ICSE 2023).

Due to the long runtime of Data Science (DS) pipelines, even small programming mistakes can be very costly, if they are not detected statically. However, even basic static type checking of DS pipelines is difficult because most are written in Python. Static typing is available in Python only via external linters. These require static type annotations for parameters or results of functions, which many DS libraries do not provide. In this paper, we show how the wealth of Python DS libraries can be used in a statically safe way via Safe-DS, a domain specific language (DSL) for DS. Safe-DS catches conventional type errors plus errors related to range restrictions, data manipulation, and call order of functions, going well beyond the abilities of current Python linters. Python libraries are integrated into Safe-DS via a stub language for specifying the interface of its declarations, and an API-Editor that is able to extract type information from the code and documentation of Python libraries, and automatically generate suitable stubs. Moreover, Safe-DS complements textual DS pipelines with a graphical representation that eases safe development by preventing syntax errors. The seamless synchronization of textual and graphic view lets developers always choose the one best suited for their skills and current task. We think that Safe-DS can make DS development easier, faster, and more reliable, significantly reducing development costs.
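To make the error classes named in the abstract concrete, here is a minimal Python sketch. It is not Safe-DS code and is not taken from the paper; `ToyClassifier`, `NotFittedError`, and the `learning_rate` bound are hypothetical names and values chosen only for illustration. It shows a range restriction and a call-order constraint that plain type annotations cannot express, so in Python they can only be enforced by runtime checks, possibly late in a long pipeline run.

```python
from dataclasses import dataclass, field


class NotFittedError(RuntimeError):
    """Raised when predict() is called before fit() (a call-order error)."""


@dataclass
class ToyClassifier:
    """Hypothetical model used only for illustration; not a real library class."""

    # Range restriction (assumed for this sketch): must lie in (0.0, 1.0].
    # The annotation `float` cannot express the bound, so a type checker
    # would accept learning_rate=-1.0 without complaint.
    learning_rate: float = 0.1
    _fitted: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # The bound can only be enforced when the object is created at runtime.
        if not 0.0 < self.learning_rate <= 1.0:
            raise ValueError("learning_rate must be in (0.0, 1.0]")

    def fit(self, rows: list[list[float]], labels: list[int]) -> "ToyClassifier":
        # Data-manipulation constraint: inputs must line up row by row.
        if len(rows) != len(labels):
            raise ValueError("rows and labels must have the same length")
        self._fitted = True
        return self

    def predict(self, rows: list[list[float]]) -> list[int]:
        # Call-order constraint: fit() must have been called first.
        if not self._fitted:
            raise NotFittedError("call fit() before predict()")
        # Placeholder prediction: always class 0.
        return [0 for _ in rows]
```

A call such as `ToyClassifier(learning_rate=-1.0)` or `ToyClassifier().predict(rows)` is well-typed, yet fails only once the program runs. These are the kinds of mistakes the abstract says Safe-DS is designed to report statically.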
