改进数据发现和收集工作,采用以流动为基础的方案编制 (Towards better data discovery and collection with flow-based programming) - 专知论文

会员服务 ·

0

Better · ML · CARS · Processing（编程语言） · MoDELS ·

2021 年 10 月 25 日

Towards better data discovery and collection with flow-based programming

翻译：改进数据发现和收集工作,采用以流动为基础的方案编制

Andrei Paleyes,Christian Cabrera,Neil D. Lawrence

from arxiv, Extended version. Short version is accepted to Data-Centric AI Workshop, NeurIPS 2021

Despite huge successes reported by the field of machine learning, such as voice assistants or self-driving cars, businesses still observe very high failure rate when it comes to deployment of ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data discovery and collection in software systems. We compare FBP with the currently prevalent service-oriented paradigm to assess characteristics of each paradigm in the context of ML deployment. We develop a data processing application, formulate a subsequent ML deployment task, and measure the impact of the task implementation within both programming paradigms. Our main conclusion is that FBP shows great potential for providing data-centric infrastructural benefits for deployment of ML. Additionally, we provide an insight into the current trend that prioritizes model development over data quality management.

翻译：尽管在机器学习领域,如语音助理或自行驾驶汽车,据报告取得了巨大成功,但企业在生产中部署ML时仍然发现非常高的失败率。我们争辩说,部分原因是基础设施不是为面向数据的活动设计的。本文探讨了基于流动的编程(FBP)在简化软件系统中的数据发现和收集方面的潜力。我们将FBP与目前流行的面向服务的模式相比较,以评估ML部署中每个模式的特点。我们开发了一个数据处理应用程序,制定了随后的ML部署任务,并在两个方案拟订模式中衡量任务执行的影响。我们的主要结论是,FBP在为ML的部署提供以数据为中心的基础设施惠益方面有很大的潜力。此外,我们深入了解目前将模式开发置于数据质量管理之上的趋势。

0

相关内容

Better

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

【KDD2020】图模型信息融合

专知会员服务

39+阅读 · 2020年10月15日

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

专知会员服务

90+阅读 · 2020年7月9日

【ECML-PKDD 2019】带歧义的分类变量编码（Encoding Categorical Variables with Ambiguity）

【ECML-PKDD 2019】带歧义的分类变量编码（Encoding Categorical Variables with Ambiguity）

专知会员服务

5+阅读 · 2019年12月1日

【ICSI Lecture 2019】多媒体数据机器学习的实验设计（Experimental Design for Machine Learning on Multimedia Data）

【ICSI Lecture 2019】多媒体数据机器学习的实验设计（Experimental Design for Machine Learning on Multimedia Data）

专知会员服务

8+阅读 · 2019年11月18日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

人工智能 | CCF推荐期刊专刊约稿信息6条

人工智能 | CCF推荐期刊专刊约稿信息6条

Call4Papers

5+阅读 · 2019年2月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

人工智能 | COLT 2019等国际会议信息9条

人工智能 | COLT 2019等国际会议信息9条

Call4Papers

6+阅读 · 2018年9月21日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

Interval Privacy: A Framework for Privacy-Preserving Data Collection

Arxiv

0+阅读 · 2021年12月23日

A Variational Inference Framework for Inverse Problems

Arxiv

0+阅读 · 2021年12月23日

ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data

Arxiv

0+阅读 · 2021年12月22日

Long-Term Productivity Based on Science, not Preference

Arxiv

0+阅读 · 2021年12月15日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Arxiv

3+阅读 · 2019年9月3日

DP-ADMM: ADMM-based Distributed Learning with Differential Privacy

Arxiv

3+阅读 · 2019年3月25日

Neural Approaches to Conversational AI

Neural Approaches to Conversational AI

Arxiv

8+阅读 · 2018年12月13日

Dynamic and Static Topic Model for Analyzing Time-Series Document Collections

Arxiv

8+阅读 · 2018年5月6日

Translating Pro-Drop Languages with Reconstruction Models

Arxiv

3+阅读 · 2018年1月10日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

【KDD2020】图模型信息融合

专知会员服务

39+阅读 · 2020年10月15日

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

专知会员服务

90+阅读 · 2020年7月9日

【ECML-PKDD 2019】带歧义的分类变量编码（Encoding Categorical Variables with Ambiguity）

【ECML-PKDD 2019】带歧义的分类变量编码（Encoding Categorical Variables with Ambiguity）

专知会员服务

5+阅读 · 2019年12月1日

【ICSI Lecture 2019】多媒体数据机器学习的实验设计（Experimental Design for Machine Learning on Multimedia Data）

【ICSI Lecture 2019】多媒体数据机器学习的实验设计（Experimental Design for Machine Learning on Multimedia Data）

专知会员服务

8+阅读 · 2019年11月18日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型幻觉：系统综述

《分析与预测陆军战斗体能测试表现：统计与机器学习方法》2025最新137页

【博士论文】数据与任务的物理学：深度学习中的局部性与组合性理论

代理式人工智能时代的决策优势

相关资讯

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

人工智能 | CCF推荐期刊专刊约稿信息6条

人工智能 | CCF推荐期刊专刊约稿信息6条

Call4Papers

5+阅读 · 2019年2月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

人工智能 | COLT 2019等国际会议信息9条

人工智能 | COLT 2019等国际会议信息9条

Call4Papers

6+阅读 · 2018年9月21日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

相关论文

Interval Privacy: A Framework for Privacy-Preserving Data Collection

Arxiv

0+阅读 · 2021年12月23日

A Variational Inference Framework for Inverse Problems

Arxiv

0+阅读 · 2021年12月23日

ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data

Arxiv

0+阅读 · 2021年12月22日

Long-Term Productivity Based on Science, not Preference

Arxiv

0+阅读 · 2021年12月15日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Arxiv

3+阅读 · 2019年9月3日

DP-ADMM: ADMM-based Distributed Learning with Differential Privacy

Arxiv

3+阅读 · 2019年3月25日

Neural Approaches to Conversational AI

Neural Approaches to Conversational AI

Arxiv

8+阅读 · 2018年12月13日

Dynamic and Static Topic Model for Analyzing Time-Series Document Collections

Arxiv

8+阅读 · 2018年5月6日

Translating Pro-Drop Languages with Reconstruction Models

Arxiv

3+阅读 · 2018年1月10日

微信扫码咨询专知VIP会员