Event data are now prevalent in diverse domains such as financial trading, business workflows, and the industrial IoT. An event is often characterized by several attributes that describe the meaning of the occurrence, together with its occurrence time or duration. From traditional operational systems in enterprises to online systems for Web services, event data are continuously generated from the physical world. However, owing to the variety and veracity characteristics of big data, event data collected from heterogeneous and dirty sources may use very different event representations and suffer from data quality issues. In this work, we survey several representative studies on the data quality issues of event data, covering: (1) event matching, (2) event error detection, (3) event data repair, and (4) approximate pattern matching.