Phase-space entropy at acquisition reflects downstream learnability

Modern learning systems work with data that vary widely across domains, yet all of them ultimately depend on how much structure is already present in the measurements before any model is trained. This raises a basic question: is there a general, modality-agnostic way to quantify how acquisition itself preserves or destroys the information that downstream learners could use? Here we propose an acquisition-level scalar $ΔS_{\mathcal B}$ based on instrument-resolved phase space. Unlike pixelwise distortion or purely spectral error metrics, which often saturate under aggressive undersampling, $ΔS_{\mathcal B}$ directly quantifies how acquisition mixes or removes joint space--frequency structure at the instrument scale. We show theoretically that $ΔS_{\mathcal B}$ correctly identifies the phase-space coherence of periodic sampling as the physical source of aliasing, thereby recovering the classical consequences of the sampling theorem. Empirically, across masked image classification, accelerated MRI, and massive MIMO (including over-the-air measurements), $|ΔS_{\mathcal B}|$ consistently ranks sampling geometries and predicts downstream reconstruction/recognition difficulty \emph{without training}. In particular, minimizing $|ΔS_{\mathcal B}|$ enables zero-training selection of variable-density MRI mask parameters, yielding designs that match those tuned by conventional pre-reconstruction criteria. These results suggest that phase-space entropy at acquisition reflects downstream learnability, enabling candidate sampling policies to be selected before training and offering a shared notion of information preservation across modalities.
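
The abstract does not spell out how $ΔS_{\mathcal B}$ is computed, so the following is only an illustrative toy, not the authors' definition. It is a minimal sketch under stated assumptions: "instrument-resolved phase space" is approximated by a Hann-windowed short-time Fourier energy map of a 1D signal (the window length stands in for the instrument scale), phase-space entropy is taken as the Shannon entropy of that normalized map, acquisition is modeled as zero-filling with a 0/1 mask, and the entropy change is the difference between the acquired and fully sampled maps. All function names (`stft_power`, `phase_space_entropy`, `delta_s`, `select_mask`) and default parameters are hypothetical. The last function mirrors the zero-training selection idea by choosing the candidate mask with the smallest $|ΔS|$.

```python
import cmath
import math
import random


def hann(n: int) -> list[float]:
    # Periodic Hann window; its length is a hypothetical stand-in for instrument resolution.
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_power(x: list[float], win_len: int, hop: int) -> list[list[float]]:
    """Squared-magnitude short-time DFT: one row of win_len bins per window position."""
    w = hann(win_len)
    rows = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + k] * w[k] for k in range(win_len)]
        row = []
        for f in range(win_len):
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / win_len)
                      for k in range(win_len))
            row.append(abs(acc) ** 2)
        rows.append(row)
    return rows


def phase_space_entropy(x: list[float], win_len: int = 16, hop: int = 8) -> float:
    """Shannon entropy (nats) of the normalized time-frequency energy map (assumed proxy)."""
    cells = [p for row in stft_power(x, win_len, hop) for p in row]
    total = sum(cells)
    if total == 0.0:
        return 0.0  # an all-zero map carries no structure; avoid dividing by zero
    return -sum((p / total) * math.log(p / total) for p in cells if p > 0.0)


def delta_s(x: list[float], mask: list[int], win_len: int = 16, hop: int = 8) -> float:
    """Entropy change caused by zero-filled acquisition with a 0/1 sampling mask."""
    acquired = [xi * mi for xi, mi in zip(x, mask)]
    return (phase_space_entropy(acquired, win_len, hop)
            - phase_space_entropy(x, win_len, hop))


def select_mask(x: list[float], masks: list[list[int]], win_len: int = 16, hop: int = 8) -> int:
    """Index of the candidate mask with the smallest |delta_s| (first one on ties)."""
    return min(range(len(masks)),
               key=lambda i: abs(delta_s(x, masks[i], win_len, hop)))


if __name__ == "__main__":
    n = 64
    # Toy two-tone signal; real use would substitute measured data.
    x = [math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
         for t in range(n)]
    periodic = [1 if t % 2 == 0 else 0 for t in range(n)]
    rng = random.Random(0)
    rand = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    for name, m in [("periodic", periodic), ("random", rand)]:
        print(name, round(delta_s(x, m), 3))
    print("selected index:", select_mask(x, [periodic, rand]))
```

Because only the magnitude $|ΔS|$ is used for ranking, the sign convention of `delta_s` does not affect which mask is selected. The toy makes no claim about which mask wins for this signal; that depends on the signal and on the assumed window, and the paper's actual $ΔS_{\mathcal B}$ may differ substantially.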