Efficient Subspace Search in Data Streams - Zhuanzhi Papers


Subspace · Streams · MINE · Outliers · Bandits

January 7, 2021

Efficient Subspace Search in Data Streams


Edouard Fouché, Florian Kalinke, Klemens Böhm

From arXiv. Accepted manuscript in Information Systems, Volume 97 (Elsevier). Final authenticated version: https://doi.org/10.1016/j.is.2020.101705

In the real world, data streams are ubiquitous: think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the specifics of the streaming setting. For static data, a common approach to deal with high dimensionality, known as subspace search, extracts low-dimensional, 'interesting' projections (subspaces), in which patterns are easier to find. In this paper, we address both Challenge (1) and (2) by generalising subspace search to data streams. Our approach, Streaming Greedy Maximum Random Deviation (SGMRD), monitors interesting subspaces in high-dimensional data streams. It leverages novel multivariate dependency estimators and monitoring techniques based on bandit theory. We show that the benefits of SGMRD are twofold: (i) it monitors subspaces efficiently, and (ii) this improves the results of downstream data mining tasks, such as outlier detection. Our experiments, performed against synthetic and real-world data, demonstrate that SGMRD outperforms its competitors by a large margin.
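The abstract names two ingredients, a multivariate dependency estimator that scores subspaces and a bandit-based monitor that decides which subspaces to re-examine as the stream evolves, but gives no details of either. The sketch below is a toy illustration of how those two pieces can fit together; it is not SGMRD. Assumptions, all labeled hypothetical: the candidate subspaces are fixed in advance (no search step), dependency is approximated by the mean absolute pairwise Pearson correlation on a sliding window (a stand-in for the paper's estimators), and the standard UCB1 policy is swapped in to choose which `budget` subspaces to re-score per step, with reward equal to how much a subspace's score changed. The names `dependency`, `UCBSubspaceMonitor`, `window_size`, and `budget` are illustrative only.

```python
import math
import random
from collections import deque
from itertools import combinations


def pearson(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def dependency(window, subspace):
    """Hypothetical toy dependency score (not the paper's estimator):
    mean absolute pairwise Pearson correlation of the subspace's dimensions."""
    cols = [[row[d] for row in window] for d in subspace]
    pairs = list(combinations(range(len(cols)), 2))
    return sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)


class UCBSubspaceMonitor:
    """Hypothetical monitor: keeps a sliding window and, at each step, re-scores
    only `budget` subspaces, chosen by UCB1 with reward = |change in score|."""

    def __init__(self, subspaces, window_size=100, budget=1):
        if any(len(s) < 2 for s in subspaces):
            raise ValueError("each subspace needs at least two dimensions")
        self.subspaces = list(subspaces)
        self.window = deque(maxlen=window_size)  # oldest points drop out automatically
        self.budget = budget
        k = len(self.subspaces)
        self.scores = [0.0] * k       # latest dependency estimate per subspace
        self.pulls = [0] * k          # how often each subspace was re-scored
        self.reward_sums = [0.0] * k  # accumulated score drift per subspace
        self.t = 0

    def _ucb(self, i):
        if self.pulls[i] == 0:
            return math.inf  # re-score every subspace at least once
        mean = self.reward_sums[i] / self.pulls[i]
        return mean + math.sqrt(2 * math.log(self.t) / self.pulls[i])

    def update(self, point):
        """Ingest one point; return the indices of the subspaces re-scored."""
        self.window.append(point)
        if len(self.window) < 3:  # too few points for a meaningful correlation
            return []
        self.t += 1
        ranked = sorted(range(len(self.subspaces)), key=self._ucb, reverse=True)
        chosen = ranked[: self.budget]
        for i in chosen:
            new_score = dependency(self.window, self.subspaces[i])
            self.reward_sums[i] += abs(new_score - self.scores[i])
            self.scores[i] = new_score
            self.pulls[i] += 1
        return chosen


# Usage: dimensions 0 and 1 are strongly dependent, dimension 2 is independent noise.
random.seed(0)
monitor = UCBSubspaceMonitor([(0, 1), (1, 2), (0, 2)], window_size=50, budget=1)
for _ in range(200):
    a = random.gauss(0, 1)
    monitor.update((a, a + random.gauss(0, 0.1), random.gauss(0, 1)))
print([round(s, 2) for s in monitor.scores], monitor.pulls)
```

The design choice being illustrated is the budget: instead of recomputing every subspace's score on every arrival, the bandit spends a fixed amount of work per step and concentrates it on subspaces whose scores have been drifting, while the UCB1 exploration term still revisits stable ones from time to time. The abstract does not say which bandit policy SGMRD actually uses, so UCB1 here is only a placeholder.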


