High-Dimensional Knockoffs Inference for Time Series Data

The framework of model-X knockoffs provides a flexible tool for exact finite-sample false discovery rate (FDR) control in variable selection. It also completely bypasses the use of conventional p-values, which makes it especially appealing in high-dimensional nonlinear models. Existing works have focused on the setting of independent and identically distributed observations, yet time series data are prevalent in practical applications. This motivates the study of model-X knockoffs inference for time series data. In this paper, we make an initial attempt to establish the theoretical and methodological foundation for model-X knockoffs inference for time series data. We suggest the method of time series knockoffs inference (TSKI), which exploits the idea of subsampling to alleviate the difficulty caused by serial dependence. We establish sufficient conditions under which the original model-X knockoffs inference combined with subsampling still achieves asymptotic FDR control. Our technical analysis reveals the exact effect of serial dependence on the FDR control. To alleviate the practical concern about the power loss from the reduced sample size caused by subsampling, we exploit the ideas of knockoffs with copies and multiple knockoffs. Under fairly general time series model settings, we show that the FDR remains controlled asymptotically. To theoretically justify the power of TSKI, we further suggest a new knockoff statistic, the backward elimination ranking (BE) statistic, and show that it enjoys both the sure screening property and controlled FDR in the linear time series model setting. The theoretical results and the appealing finite-sample performance of the suggested TSKI method coupled with the BE statistic are illustrated with several simulation examples and an economic inflation forecasting application.
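
The two generic ingredients named above, subsampling and knockoff-based selection, can be illustrated with a minimal sketch. It is not the paper's TSKI implementation. It assumes that subsampling means taking observations a fixed gap apart so that retained points are only weakly dependent, and it uses the standard knockoff+ selection threshold on precomputed knockoff statistics `W`. The names `subsample_indices`, `knockoff_plus_threshold`, `select`, and the parameter `gap` are hypothetical, and the step that combines results across subsamples is not shown.

```python
import numpy as np

def subsample_indices(n, gap):
    """Split indices 0..n-1 into `gap` interleaved subsamples.

    Within each subsample, consecutive indices are `gap` time steps apart
    (an assumed, simplified subsampling scheme).
    """
    return [np.arange(start, n, gap) for start in range(gap)]

def knockoff_plus_threshold(W, q):
    """Standard knockoff+ threshold for target FDR level q.

    Returns the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # sorted ascending
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

def select(W, q):
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))
```

In this sketch, large positive entries of `W` indicate evidence for an original variable over its knockoff copy. One would compute `W` separately on each subsample returned by `subsample_indices` before applying `select`.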
