VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data - 专知论文

会员服务 ·

0

ML · 端到端 · Machine Learning · MoDELS · 测试数据 ·

2023 年 4 月 25 日

VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data

翻译：暂无翻译

An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional dataset. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. Our system helps to detect this mismatch without getting labeled data from testing data and rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.

翻译：暂无翻译

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

专知会员服务

138+阅读 · 2022年2月6日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

介孔材料受限空间中的AGET ATRP和ARGET ATRP聚合反应

国家自然科学基金

0+阅读 · 2016年12月31日

3维Lorentz空间中的伪圆纹Willmore曲面与4维球面中的共形曲面论

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

新型吡唑酰胺类化合物的设计合成及其生物活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

汽车有级自动变速器多性能综合最优个性化智能换挡规律

国家自然科学基金

0+阅读 · 2012年12月31日

面向航空结构的多主体协作分布式光纤智能集成监测系统及其自诊断/自修复研究

国家自然科学基金

0+阅读 · 2012年12月31日

广西茉莉花茎、叶黄酮类化合物抗氧化活性及其构效关系和作用机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于分布式麦克风阵列的说话人定位与跟踪方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

Correcting for Selection Bias and Missing Response in Regression using Privileged Information

Arxiv

0+阅读 · 2023年6月12日

The EAS approach to variable selection for multivariate response data in high-dimensional settings

Arxiv

0+阅读 · 2023年6月12日

The Single Robot Line Coverage Problem: Theory, Algorithms, and Experiments

Arxiv

0+阅读 · 2023年6月11日

Probabilistic partition of unity networks for high-dimensional regression problems

Arxiv

0+阅读 · 2023年6月11日

Optimal Variable Clustering for High-Dimensional Matrix Valued Data

Arxiv

0+阅读 · 2023年6月9日

Octree-based hierarchical sampling optimization for the volumetric super-resolution of scientific data

Arxiv

0+阅读 · 2023年6月8日

Responsible Design Patterns for Machine Learning Pipelines

Arxiv

0+阅读 · 2023年6月7日

An Overview on Machine Translation Evaluation

An Overview on Machine Translation Evaluation

Arxiv

14+阅读 · 2022年2月22日

Affective Image Content Analysis: Two Decades Review and New Perspectives

Arxiv

16+阅读 · 2021年6月30日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

专知会员服务

138+阅读 · 2022年2月6日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Correcting for Selection Bias and Missing Response in Regression using Privileged Information

Arxiv

0+阅读 · 2023年6月12日

The EAS approach to variable selection for multivariate response data in high-dimensional settings

Arxiv

0+阅读 · 2023年6月12日

The Single Robot Line Coverage Problem: Theory, Algorithms, and Experiments

Arxiv

0+阅读 · 2023年6月11日

Probabilistic partition of unity networks for high-dimensional regression problems

Arxiv

0+阅读 · 2023年6月11日

Optimal Variable Clustering for High-Dimensional Matrix Valued Data

Arxiv

0+阅读 · 2023年6月9日

Octree-based hierarchical sampling optimization for the volumetric super-resolution of scientific data

Arxiv

0+阅读 · 2023年6月8日

Responsible Design Patterns for Machine Learning Pipelines

Arxiv

0+阅读 · 2023年6月7日

An Overview on Machine Translation Evaluation

An Overview on Machine Translation Evaluation

Arxiv

14+阅读 · 2022年2月22日

Affective Image Content Analysis: Two Decades Review and New Perspectives

Arxiv

16+阅读 · 2021年6月30日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

相关基金

介孔材料受限空间中的AGET ATRP和ARGET ATRP聚合反应

国家自然科学基金

0+阅读 · 2016年12月31日

3维Lorentz空间中的伪圆纹Willmore曲面与4维球面中的共形曲面论

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

新型吡唑酰胺类化合物的设计合成及其生物活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

汽车有级自动变速器多性能综合最优个性化智能换挡规律

国家自然科学基金

0+阅读 · 2012年12月31日

面向航空结构的多主体协作分布式光纤智能集成监测系统及其自诊断/自修复研究

国家自然科学基金

0+阅读 · 2012年12月31日

广西茉莉花茎、叶黄酮类化合物抗氧化活性及其构效关系和作用机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于分布式麦克风阵列的说话人定位与跟踪方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员