Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, the ordering of feature values matters far more than their raw magnitudes, yet the structural conditions that a rank-based normalization operator must satisfy to remain stable under these invariances have never been formally pinned down. We show that widely used differentiable sorting and ranking operators fundamentally fail these criteria: because they rely on value gaps and batch-level pairwise interactions, they are intrinsically unstable under strictly monotone transformations, shifts in mini-batch composition, and even tiny input perturbations. Crucially, these failures stem from the operators' structural design, not from incidental implementation choices. To address this, we propose three axioms that formalize the minimal invariance and stability properties required of rank-based input normalization. We prove that any operator satisfying these axioms must factor into (i) a feature-wise rank representation and (ii) a monotone, Lipschitz-continuous scalarization map. We then construct a minimal operator meeting these criteria and empirically show that the resulting constraints are non-trivial in realistic setups. Together, our results sharply delineate the design space of valid rank-based normalization operators and formally separate them from existing continuous-relaxation-based sorting methods.
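To make the factorization concrete, below is a minimal sketch of one plausible instantiation, not the paper's construction: the rank representation is taken to be the per-feature empirical CDF of a fixed reference sample (so the output is independent of mini-batch composition), and the scalarization is an affine map, which is monotone and Lipschitz. The function name `rank_normalize` and the reference-sample design are illustrative assumptions.

```python
import numpy as np

def rank_normalize(X, reference, scalarize=None):
    """Illustrative rank-based normalization (a sketch, not the paper's operator).

    Factorizes into (i) a feature-wise rank representation -- here the
    empirical CDF of a fixed per-feature reference sample, so the output
    does not depend on which other points share the mini-batch -- and
    (ii) a monotone, Lipschitz scalarization applied elementwise.
    """
    if scalarize is None:
        # Affine map [0, 1] -> [-1, 1]: monotone and 2-Lipschitz.
        scalarize = lambda r: 2.0 * r - 1.0
    X = np.asarray(X, dtype=float)
    ref = np.sort(np.asarray(reference, dtype=float), axis=0)  # (m, d), sorted per feature
    m = ref.shape[0]
    # Empirical CDF per feature: fraction of reference values <= x.
    ranks = np.stack(
        [np.searchsorted(ref[:, j], X[:, j], side="right") / m
         for j in range(X.shape[1])],
        axis=1,
    )
    return scalarize(ranks)

# Usage: the output is invariant when any strictly increasing map g is
# applied jointly to the batch and the reference, since only order is used.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 3))
batch = rng.normal(size=(8, 3))
Z = rank_normalize(batch, ref)
assert np.allclose(Z, rank_normalize(np.exp(batch), np.exp(ref)))
```

Note the design choice in this sketch: anchoring ranks to a fixed reference rather than to the current mini-batch is what removes the batch-level pairwise interactions that the abstract identifies as the source of instability in continuous-relaxation-based sorting operators.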