tsrobprep -- -- R 套件,用于对时间序列数据进行稳健的预处理 (tsrobprep -- an R package for robust preprocessing of time series data) - 专知论文

会员服务 ·

0

稳健性 · 异常点 · 数据填补 · MoDELS · tuning ·

2021 年 4 月 26 日

tsrobprep -- an R package for robust preprocessing of time series data

翻译：tsrobprep -- -- R 套件,用于对时间序列数据进行稳健的预处理

Michał Narajewski,Jens Kley-Holsteg,Florian Ziel

Data cleaning is a crucial part of every data analysis exercise. Yet, the currently available R packages do not provide fast and robust methods for cleaning and preparation of time series data. The open source package tsrobprep introduces efficient methods for handling missing values and outliers using model based approaches. For data imputation a probabilistic replacement model is proposed, which may consist of autoregressive components and external inputs. For outlier detection a clustering algorithm based on finite mixture modelling is introduced, which considers typical time series related properties as features. By assigning to each observation a probability of being an outlying data point, the degree of outlyingness can be determined. The methods work robust and are fully tunable. Moreover, by providing the auto_data_cleaning function the data preprocessing can be carried out in one cast, without manual tuning and providing suitable results. The primary motivation of the package is the preprocessing of energy system data, however, the package is also suited for other moderate and large sized time series data set. We present application for electricity load, wind and solar power data.

翻译：数据清理是每项数据分析工作的一个关键部分。然而,目前可用的 R 包并不提供快速和稳健的清理和时间序列数据编制方法。开放源代码包 tsrobprep 采用基于模型的方法,采用高效的方法处理缺失的值和外部值。对于数据估算,提出了一种概率替代模型,该模型可能由自动递增组件和外部输入组成。对于基于有限混合物模型的群集算法,引入了超强检测算法,该算法将典型的时间序列特性视为特性。通过给每观察点分配一个外围数据点的概率,可以确定外围值的程度。该方法非常健全,完全可以捕捉。此外,通过提供自动数据清理功能,数据预处理可以一次性完成,无需手动调整和提供适当结果。该包的主要动机是先处理能源系统数据,然而,该包也适合其他中大型的时间序列数据集。我们介绍了电荷载、风能和太阳能数据的应用。

0

相关内容

稳健性

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

异构混合并行计算综述

专知会员服务

41+阅读 · 2020年8月14日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

126+阅读 · 2020年8月2日

经济学中的数据科学，Data Science in Economics，附22页pdf

经济学中的数据科学，Data Science in Economics，附22页pdf

专知会员服务

36+阅读 · 2020年4月1日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

金融领域自然语言处理研究资源大列表

金融领域自然语言处理研究资源大列表

专知

13+阅读 · 2020年2月27日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

已删除

将门创投

7+阅读 · 2019年3月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

338页新书《Deep Learning in Natural Language Processing》

338页新书《Deep Learning in Natural Language Processing》

机器学习算法与Python学习

9+阅读 · 2018年11月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Comparison of Outlier Detection Techniques for Structured Data

Arxiv

0+阅读 · 2021年6月16日

RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests

Arxiv

0+阅读 · 2021年6月15日

Post-Processing of MCMC

Arxiv

0+阅读 · 2021年6月13日

Distributionally Robust Optimization with Markovian Data

Arxiv

0+阅读 · 2021年6月12日

Calibration and Uncertainty Quantification of Convective Parameters in an Idealized GCM

Arxiv

0+阅读 · 2021年6月10日

Implicit Representations of Meaning in Neural Language Models

Arxiv

0+阅读 · 2021年6月1日

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Arxiv

88+阅读 · 2019年3月27日

Anomaly DetectionWith Multiple-Hypotheses Predictions

Arxiv

6+阅读 · 2019年1月28日

Learning to Importance Sample in Primary Sample Space

Learning to Importance Sample in Primary Sample Space

Arxiv

5+阅读 · 2018年8月23日

Image Segmentation Using Subspace Representation and Sparse Decomposition

Arxiv

6+阅读 · 2018年4月6日

VIP会员

文章信息

相关主题

相关VIP内容

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

异构混合并行计算综述

专知会员服务

41+阅读 · 2020年8月14日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

126+阅读 · 2020年8月2日

经济学中的数据科学，Data Science in Economics，附22页pdf

经济学中的数据科学，Data Science in Economics，附22页pdf

专知会员服务

36+阅读 · 2020年4月1日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

金融领域自然语言处理研究资源大列表

金融领域自然语言处理研究资源大列表

专知

13+阅读 · 2020年2月27日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

已删除

将门创投

7+阅读 · 2019年3月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

338页新书《Deep Learning in Natural Language Processing》

338页新书《Deep Learning in Natural Language Processing》

机器学习算法与Python学习

9+阅读 · 2018年11月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Comparison of Outlier Detection Techniques for Structured Data

Arxiv

0+阅读 · 2021年6月16日

RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests

Arxiv

0+阅读 · 2021年6月15日

Post-Processing of MCMC

Arxiv

0+阅读 · 2021年6月13日

Distributionally Robust Optimization with Markovian Data

Arxiv

0+阅读 · 2021年6月12日

Calibration and Uncertainty Quantification of Convective Parameters in an Idealized GCM

Arxiv

0+阅读 · 2021年6月10日

Implicit Representations of Meaning in Neural Language Models

Arxiv

0+阅读 · 2021年6月1日

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Arxiv

88+阅读 · 2019年3月27日

Anomaly DetectionWith Multiple-Hypotheses Predictions

Arxiv

6+阅读 · 2019年1月28日

Learning to Importance Sample in Primary Sample Space

Learning to Importance Sample in Primary Sample Space

Arxiv

5+阅读 · 2018年8月23日

Image Segmentation Using Subspace Representation and Sparse Decomposition

Arxiv

6+阅读 · 2018年4月6日

微信扫码咨询专知VIP会员