Based on technological advances in sensing modalities, randomized trials with primary outcomes represented as high-dimensional vectors have become increasingly prevalent. For example, these outcomes could be week-long time-series data from wearable devices or high-dimensional neuroimaging data, such as from functional magnetic resonance imaging. This paper focuses on randomized treatment studies with such high-dimensional outcomes characterized by sparse treatment effects, where interventions may influence a small number of dimensions, e.g., small temporal windows or specific brain regions. Conventional practices, such as using fixed, low-dimensional summaries of the outcomes, result in significantly reduced power for detecting treatment effects. To address this limitation, we propose a procedure that involves subset selection followed by inference. Specifically, given a potentially large set of outcome summaries, we identify the subset that captures treatment effects, which requires only one call to the Lasso, and subsequently conduct inference on the selected subset. Via theoretical analysis as well as simulations, we demonstrate that our method asymptotically selects the correct subset and increases statistical power.
翻译:暂无翻译