Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, the ordering of feature values matters far more than their raw magnitudes, yet the structural conditions that a rank-based normalization operator must satisfy to remain stable under these invariances have never been formally pinned down. We show that widely used differentiable sorting and ranking operators fundamentally fail these criteria: because they rely on value gaps and batch-level pairwise interactions, they are intrinsically unstable under strictly monotone transformations, shifts in mini-batch composition, and even tiny input perturbations. Crucially, these failures stem from the operators' structural design, not from incidental implementation choices. To address this, we propose three axioms that formalize the minimal invariance and stability properties required of rank-based input normalization. We prove that any operator satisfying these axioms must factor into (i) a feature-wise rank representation and (ii) a monotone, Lipschitz-continuous scalarization map. We then construct a minimal operator meeting these criteria and empirically show that the resulting constraints are non-trivial in realistic setups. Together, our results sharply delineate the design space of valid rank-based normalization operators and formally separate them from existing continuous-relaxation-based sorting methods.
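To make the factorization concrete, below is a minimal sketch of one plausible instantiation, not the paper's construction: the rank representation is taken to be the per-feature empirical CDF of a fixed reference sample (so the output is independent of mini-batch composition), and the scalarization is an affine map, which is monotone and Lipschitz. The function name `rank_normalize` and the reference-sample design are illustrative assumptions.

```python
import numpy as np

def rank_normalize(X, reference, scalarize=None):
    """Illustrative rank-based normalization (a sketch, not the paper's operator).

    Factorizes into (i) a feature-wise rank representation -- here the
    empirical CDF of a fixed per-feature reference sample, so the output
    does not depend on which other points share the mini-batch -- and
    (ii) a monotone, Lipschitz scalarization applied elementwise.
    """
    if scalarize is None:
        # Affine map [0, 1] -> [-1, 1]: monotone and 2-Lipschitz.
        scalarize = lambda r: 2.0 * r - 1.0
    X = np.asarray(X, dtype=float)
    ref = np.sort(np.asarray(reference, dtype=float), axis=0)  # (m, d), sorted per feature
    m = ref.shape[0]
    # Empirical CDF per feature: fraction of reference values <= x.
    ranks = np.stack(
        [np.searchsorted(ref[:, j], X[:, j], side="right") / m
         for j in range(X.shape[1])],
        axis=1,
    )
    return scalarize(ranks)

# Usage: the output is invariant when any strictly increasing map g is
# applied jointly to the batch and the reference, since only order is used.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 3))
batch = rng.normal(size=(8, 3))
Z = rank_normalize(batch, ref)
assert np.allclose(Z, rank_normalize(np.exp(batch), np.exp(ref)))
```

Note the design choice in this sketch: anchoring ranks to a fixed reference rather than to the current mini-batch is what removes the batch-level pairwise interactions that the abstract identifies as the source of instability in continuous-relaxation-based sorting operators.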