With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial task in artificial intelligence. Shapley values have been recognized as an effective method for data valuation, enabling efficient training-set summarization, data acquisition, and outlier removal. In this paper, we introduce STI-KNN, an algorithm that computes the exact pair-interaction Shapley values for KNN models in O(t n^2) time, a significant improvement over the O(2^n) time complexity of baseline methods. Using STI-KNN, we can evaluate the value of individual data points efficiently and accurately. This can improve training outcomes and, ultimately, make artificial intelligence applications more effective.
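The sketch below makes the O(2^n) baseline concrete. It computes one pairwise interaction value for a single test point by brute force, enumerating every subset of the remaining training points. It is not the STI-KNN algorithm. It rests on two assumptions that the abstract does not state. First, the KNN utility v(S) of a subset S is taken to be the number of the k nearest points in S that share the test label, divided by k, which is the utility commonly used in KNN data valuation. Second, the pairwise index is taken to be the order-2 Shapley-Taylor interaction, (2/n) Σ_{S ⊆ N∖{i,j}} δ_ij(S) / C(n−1, |S|), where δ_ij(S) = v(S∪{i,j}) − v(S∪{i}) − v(S∪{j}) + v(S). The function names `knn_utility` and `brute_force_pair_interaction` are hypothetical.

```python
from itertools import combinations
from math import comb


def knn_utility(subset, dists, labels, test_label, k):
    # Assumed KNN utility: count of the k nearest points in `subset`
    # (by distance to the test point) whose label matches, divided by k.
    nearest = sorted(subset, key=lambda p: dists[p])[:k]
    return sum(labels[p] == test_label for p in nearest) / k


def brute_force_pair_interaction(i, j, dists, labels, test_label, k):
    """Order-2 Shapley-Taylor interaction of training points i and j
    (assumed index definition), computed by enumerating all subsets of
    the other n - 2 points, i.e. exponentially many utility calls."""
    n = len(dists)
    rest = [p for p in range(n) if p not in (i, j)]

    def v(s):
        return knn_utility(s, dists, labels, test_label, k)

    total = 0.0
    for size in range(len(rest) + 1):
        for combo in combinations(rest, size):
            subset = list(combo)
            # Discrete second-order difference of v at S with respect to i and j.
            delta = v(subset + [i, j]) - v(subset + [i]) - v(subset + [j]) + v(subset)
            total += delta / comb(n - 1, size)
    return 2 / n * total
```

For example, with distances [0.1, 0.2, 0.3], labels [1, 0, 1], test label 1 and k = 1, `brute_force_pair_interaction(0, 1, ...)` returns 1/3. The inner loops visit all 2^(n−2) subsets for every pair, so the cost grows exponentially with the number of training points. This is the kind of baseline cost the abstract contrasts with O(t n^2).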