Machine unlearning seeks to efficiently remove the influence of selected data while preserving generalization. Significant progress has been made in low dimensions ($p \ll n$), but high dimensions pose serious theoretical challenges: the standard optimization assumptions of $\Omega(1)$ strong convexity and $O(1)$ smoothness of the per-example loss $f$ rarely hold simultaneously in the proportional regime ($p \sim n$). In this work, we introduce $\varepsilon$-Gaussian certifiability, a canonical and robust notion well suited to high-dimensional regimes that optimally captures a broad class of noise-adding mechanisms. We then theoretically analyze the performance of a widely used unlearning algorithm based on a single step of the Newton method in this high-dimensional setting. Our analysis shows that one Newton step, followed by well-calibrated Gaussian noise, is sufficient to achieve both privacy and accuracy. This result stands in sharp contrast to \citet{zou2025certified}, the only prior work that analyzes machine unlearning in high dimensions, which relaxes some of the standard optimization assumptions for high-dimensional applicability but operates under the notion of $\varepsilon$-certifiability. That work concludes that a single Newton step is insufficient even for removing a single data point, and that at least two steps are required to ensure both privacy and accuracy. Our result leads us to conclude that the discrepancy in the number of steps arises from the suboptimality of the notion of $\varepsilon$-certifiability and its incompatibility with noise-adding mechanisms, a limitation that $\varepsilon$-Gaussian certifiability overcomes optimally.
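For concreteness, a minimal sketch of the mechanism described above, in notation assumed here purely for illustration (empirical risk $F(\theta)=\sum_{j=1}^{n} f(\theta; z_j)$, full-data minimizer $\hat\theta$, removed point $z_i$, leave-one-out objective $F_{-i}(\theta)=F(\theta)-f(\theta;z_i)$, and noise scale $\sigma$): one Newton step from $\hat\theta$ toward $F_{-i}$, followed by a Gaussian perturbation, takes the form
\begin{align*}
\hat\theta^{\,-} &= \hat\theta - \bigl(\nabla^2 F_{-i}(\hat\theta)\bigr)^{-1}\nabla F_{-i}(\hat\theta)
= \hat\theta + \bigl(\nabla^2 F_{-i}(\hat\theta)\bigr)^{-1}\nabla f(\hat\theta; z_i),\\
\tilde\theta &= \hat\theta^{\,-} + b, \qquad b \sim \mathcal{N}(0, \sigma^2 I_p),
\end{align*}
where the second equality uses the first-order optimality condition $\nabla F(\hat\theta)=0$, and $\sigma$ is calibrated so that the released parameter $\tilde\theta$ meets the certifiability guarantee.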