Influence diagnostics, such as influence functions and approximate maximum influence perturbations, are widely used in machine learning and in AI applications. They are statistical tools for identifying influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations computed with efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
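To make the central computation concrete, the following is a minimal sketch of an influence function evaluated through an inverse-Hessian-vector product, for one generalized linear model (L2-regularized logistic regression). It is an illustration under stated assumptions, not the implementation analyzed in this work: the conjugate gradient solver, the regularization strength `lam`, the sign convention (influence of upweighting a training point on a test loss), and all function names are choices made here for the example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, lam, iters=50):
    """Newton's method for the L2-regularized mean logistic loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def hvp(X, theta, lam, v):
    """Hessian-vector product of the regularized mean loss, without forming H."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1 - p) * (X @ v)) / X.shape[0] + lam * v

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A, using only matvec(v) = A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0 initially
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def influence_on_test_loss(X, y, theta, lam, x_test, y_test):
    """Assumed convention: influence_i = -grad_test^T H^{-1} grad_i for each training point i."""
    g_test = (sigmoid(x_test @ theta) - y_test) * x_test
    # Inverse-Hessian-vector product H^{-1} g_test, computed from HVPs only.
    s = conjugate_gradient(lambda v: hvp(X, theta, lam, v), g_test)
    per_example_grads = (sigmoid(X @ theta) - y)[:, None] * X
    return -per_example_grads @ s
```

The solver touches the Hessian only through `hvp`, so the d-by-d matrix is never stored; this is the property that makes inverse-Hessian-vector products usable for large models, where conjugate gradient may be replaced by other iterative or stochastic solvers.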