Aggregated time series are generated effortlessly everywhere, e.g., "total confirmed COVID-19 cases since 2019" and "total liquor sales over time." Understanding "how" and "why" these key performance indicators (KPIs) evolve over time is critical to making data-informed decisions. Existing explanation engines focus on explaining one aggregated value or the difference between two relations. However, this falls short of explaining KPIs' continuous changes over time. Motivated by this, we propose TSEXPLAIN, a system that explains aggregated time series by surfacing the underlying evolving top contributors. Under the hood, we leverage prior work on two-relation diffs as a building block and formulate a K-Segmentation problem: segment the time series so that each resulting segment shares consistent explanations, i.e., contributors. To quantify consistency within each segment, we propose a novel explanation-aware within-segment variance; to derive the optimal K-Segmentation scheme, we develop an efficient dynamic programming algorithm. Experiments on synthetic and real-world datasets show that our explanation-aware segmentation effectively identifies evolving explanations for aggregated time series and outperforms explanation-agnostic segmentation. Further, we propose a strategy for selecting the optimal K and several optimizations that speed up TSEXPLAIN for an interactive user experience, achieving up to a 13X efficiency improvement.
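To illustrate the kind of dynamic program that K-Segmentation calls for, here is a minimal sketch of the textbook optimal K-segmentation recurrence. It assumes the within-segment cost is supplied as a function `cost(i, j)` over the half-open range `series[i:j]`. The explanation-aware variance that TSEXPLAIN proposes is not reproduced here. The hypothetical helper `sse_cost` is an explanation-agnostic stand-in (squared deviation from the segment mean), included only so the example runs. Both `k_segmentation` and `sse_cost` are illustrative names, not part of TSEXPLAIN.

```python
from typing import Callable, List, Sequence, Tuple


def k_segmentation(
    series: Sequence[float],
    k: int,
    cost: Callable[[int, int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split series[0:n] into k non-empty contiguous segments that minimize the
    total of cost(i, j), where cost(i, j) scores the segment series[i:j]."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(series)")
    inf = float("inf")
    # best[m][j]: minimal total cost of splitting series[0:j] into m segments
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[m][j]: start index of the last segment in that optimal split
    split = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # the last segment is series[i:j]; the first m-1 segments cover series[0:i]
            for i in range(m - 1, j):
                if best[m - 1][i] == inf:
                    continue
                total = best[m - 1][i] + cost(i, j)
                if total < best[m][j]:
                    best[m][j] = total
                    split[m][j] = i
    # Backtrack from (k, n) to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[k][n], segments


def sse_cost(series: Sequence[float]) -> Callable[[int, int], float]:
    """Hypothetical explanation-agnostic cost: sum of squared deviations from the
    segment mean, computed in O(1) per segment via prefix sums."""
    prefix, prefix_sq = [0.0], [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i: int, j: int) -> float:
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return sq - s * s / (j - i)

    return cost


series = [1, 1, 1, 5, 5, 5, 9, 9]
total, segments = k_segmentation(series, 3, sse_cost(series))
print(total, segments)
```

In a TSEXPLAIN-style setting, one would swap in a cost that measures how consistent the top contributors are within `series[i:j]`; the recurrence itself is unchanged. This naive version evaluates O(K·n²) segment costs, which is the baseline that speed-up optimizations would target.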