Dimensionality reduction (DR) is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for DR, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, existing termination conditions for stochastic PCA either run for a predetermined number of iterations or until the solution converges, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to terminate efficiently after operating over small (e.g., 1%) subsamples of input data, reducing whole-workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over singular-value-decomposition-based PCA techniques, and exceeds conventional approaches such as FFT and PAA by up to 16x in end-to-end workloads.
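The core idea, terminating a sample-based PCA as soon as the low-dimensional representation is good enough for the rest of the workload, can be sketched as follows. This is a minimal illustration under stated assumptions, not DROP's implementation: the function name `progressive_pca`, the `sample_frac` and `quality` parameters, and the explained-variance stopping proxy are all hypothetical stand-ins for DROP's actual cost-based termination rule, which weighs marginal DR quality against downstream runtime.

```python
import numpy as np

def progressive_pca(X, k, sample_frac=0.01, quality=0.98, seed=0):
    """Fit PCA on a growing row subsample, stopping as soon as the top-k
    components capture enough of the subsample's variance. Sketch only:
    DROP's real stopping rule also accounts for downstream analytics
    runtime, which is omitted here."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    batch = max(1, int(sample_frac * n))
    idx = rng.choice(n, size=batch, replace=False)
    while True:
        S = X[idx] - X[idx].mean(axis=0)   # center the current subsample
        # PCA via SVD on the subsample
        _, s, Vt = np.linalg.svd(S, full_matrices=False)
        explained = (s[:k] ** 2).sum() / (s ** 2).sum()
        if explained >= quality or len(idx) >= n:
            return Vt[:k]                  # top-k principal directions
        # Not good enough yet: draw another batch of unseen rows and refit.
        remaining = np.setdiff1d(np.arange(n), idx)
        extra = rng.choice(remaining, size=min(batch, remaining.size),
                           replace=False)
        idx = np.concatenate([idx, extra])

# Usage: learn the subspace from a small subsample, then project all rows.
# X = np.random.randn(100_000, 128)
# Vk = progressive_pca(X, k=16)
# X_low = X @ Vk.T
```

Refitting via a full SVD on each enlarged subsample keeps the sketch short; an incremental or power-iteration update would avoid redundant work in practice.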