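Data quality issues are widely recognized in IoT data, and they hold back downstream applications. Improving the quality of IoT data is particularly challenging because of its distinctive characteristics, such as ubiquitous noise, unaligned timestamps, consecutive errors, misplaced columns, and correlated errors. In this tutorial, we review state-of-the-art techniques for IoT data quality management. In particular, we discuss how dedicated approaches improve various data quality dimensions, including validity, completeness, and consistency. We further highlight recent advances in applying deep learning techniques to IoT data quality. Finally, we point out open problems in IoT data quality management, such as benchmarks and the interpretation of data quality issues.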
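Validity is one of the dimensions listed above. As a minimal illustration only (not the tutorial's own method), a common validity rule for sensor streams is a speed constraint: the value may not change faster than a given bound between two readings. The sketch below assumes a hypothetical helper `speed_violations` and user-chosen bounds `s_min`/`s_max`, and it checks only adjacent pairs. Speed-constraint cleaning methods such as SCREEN instead consider all pairs within a time window and also repair the violating values.

```python
from typing import List, Tuple


def speed_violations(points: List[Tuple[float, float]],
                     s_min: float, s_max: float) -> List[int]:
    """Return indices i where the speed from points[i-1] to points[i]
    falls outside [s_min, s_max].

    points: (timestamp, value) pairs with strictly increasing timestamps.
    Hypothetical helper for illustration; only adjacent pairs are checked.
    """
    violations = []
    for i in range(1, len(points)):
        t_prev, v_prev = points[i - 1]
        t_cur, v_cur = points[i]
        if t_cur <= t_prev:
            raise ValueError("timestamps must be strictly increasing")
        # Rate of change in value per time unit between the two readings.
        speed = (v_cur - v_prev) / (t_cur - t_prev)
        if not (s_min <= speed <= s_max):
            violations.append(i)
    return violations


readings = [(1, 10.0), (2, 10.5), (3, 30.0), (4, 11.0), (5, 11.4)]
print(speed_violations(readings, s_min=-5.0, s_max=5.0))  # [2, 3]
```

The single spike at timestamp 3 violates the constraint on both sides, so indices 2 and 3 are reported. Detection alone does not say which reading is wrong; deciding what to repair, and how, is where dedicated cleaning methods come in.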
https://sxsong.github.io/tutorial-iotdq/
Reply "iot155" to the Zhuanzhi (专知) WeChat official account to get the download link for the slides "[CIKM 2020, Tsinghua] IoT Data Quality (155 slides)".