Interpretable Time-series Representation Learning With Multi-Level Disentanglement

Time-series representation learning is a fundamental task in time-series analysis. While significant progress has been made toward accurate representations for downstream applications, the learned representations often lack interpretability and do not expose semantic meaning. Unlike previous efforts that operate on an entangled feature space, we aim to extract semantically rich temporal correlations in a latent, interpretable, factorized representation of the data. Motivated by the success of disentangled representation learning in computer vision, we study the possibility of learning semantically rich time-series representations, which remains unexplored because of three main challenges: 1) the sequential data structure introduces complex temporal correlations and makes the latent representations hard to interpret, 2) sequential models suffer from the KL vanishing problem, and 3) interpretable semantic concepts for time-series often rely on multiple factors rather than on individual ones. To bridge the gap, we propose Disentangle Time Series (DTS), a novel disentanglement enhancement framework for sequential data. Specifically, to generate hierarchical semantic concepts as the interpretable and disentangled representation of time-series, DTS introduces multi-level disentanglement strategies that cover both individual latent factors and group semantic segments. We further show theoretically how to alleviate the KL vanishing problem: DTS introduces a mutual information maximization term while preserving a heavier penalty on the total correlation and the dimension-wise KL to keep the disentanglement property. Experimental results on various real-world benchmark datasets demonstrate that the representations learned by DTS achieve superior performance in downstream applications, with high interpretability of semantic concepts.
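The three quantities named in the abstract (mutual information, total correlation, dimension-wise KL) correspond to a well-known decomposition of the aggregate KL term in the variational autoencoder objective. The identity below is a sketch for orientation only. The abstract does not state the exact DTS objective, so the notation here ($q(z \mid x)$ for the encoder, $q(z)$ for the aggregate posterior, $p(z)$ for the prior, $z_j$ for the $j$-th latent dimension) is an assumption rather than the authors' own.

```latex
\mathbb{E}_{p(x)}\!\left[ \mathrm{KL}\big( q(z \mid x) \,\|\, p(z) \big) \right]
  = \underbrace{I_q(x; z)}_{\text{mutual information}}
  + \underbrace{\mathrm{KL}\Big( q(z) \,\Big\|\, \prod_j q(z_j) \Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big( q(z_j) \,\|\, p(z_j) \big)}_{\text{dimension-wise KL}}
```

Read this way, penalizing the whole KL term heavily also penalizes the mutual information between the input and the latent code, which is one route to KL vanishing. Rewarding the first term while weighting the last two more heavily is consistent with the strategy the abstract describes.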

Related content

Representation learning

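Representation learning uses training data to learn vector representations, which overcomes the limitations of hand-crafted features. It is usually divided into two categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (such as a denoising autoencoder or a sparse autoencoder) as the representation. The more recent variational autoencoder is more tolerant of noise and outliers. However, exactly inferring the latent structure of the given data is almost impossible, so approximate inference strategies are used instead. In addition, some unsupervised methods aim to approximate a specific similarity measure. One proposed framework for unsupervised, similarity-preserving representation learning uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another approach learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original series. Supervised representation learning methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two currently popular models; their goal is to maximize the distance between classes while minimizing the distance within each class.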
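For reference, the DTW distance that these similarity-preserving methods try to approximate can be computed with a short dynamic program. The following is a minimal sketch in plain Python, assuming one-dimensional series and an absolute-difference local cost. The function name `dtw_distance` is illustrative and is not taken from any of the methods mentioned above.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Uses the classic O(len(a) * len(b)) dynamic program with
    |a_i - b_j| as the local cost and no warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1] after a[i-2]
                cost[i][j - 1],      # b[j-1] matched again with a[i-1] after b[j-2]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Because DTW allows one point to align with several points of the other series, a series and a time-stretched copy of it (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) are at distance zero, whereas a point-wise Euclidean distance is not even defined for series of different lengths. The quadratic cost of this computation is one motivation for learning an embedding in which a cheap Euclidean distance approximates it.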