Modeling the hierarchical latent dynamics behind time series data is critical for capturing temporal dependencies across multiple levels of abstraction in real-world tasks. However, existing temporal causal representation learning methods cannot capture such dynamics, because they cannot recover the joint distribution of hierarchical latent variables from \textit{single-timestep observed variables}. Interestingly, we find that the joint distribution of hierarchical latent variables can be uniquely determined from three conditionally independent observations. Building on this insight, we propose a Causally Hierarchical Latent Dynamic (CHiLD) identification framework. Our approach first uses temporally contextual observed variables to identify the joint distribution of the multi-layer latent variables. Subsequently, we exploit the natural sparsity of the hierarchical structure among latent variables to identify the latent variables within each layer. Guided by these theoretical results, we develop a time series generative model grounded in variational inference. The model incorporates a contextual encoder to reconstruct the multi-layer latent variables and normalizing flow-based hierarchical prior networks to impose the independent noise condition of the hierarchical latent dynamics. Empirical evaluations on both synthetic and real-world datasets validate our theoretical claims and demonstrate the effectiveness of CHiLD in modeling hierarchical latent dynamics.