Locally Weighted Regression with Different Kernel Smoothers for Software Effort Estimation

Estimating software effort has been a largely unsolved problem for decades. One of the main obstacles to building accurate estimation models is the heterogeneous nature of software data, which often has a complex structure. Effort estimation models built from local data tend to be more accurate than those built from the entire dataset. Previous studies have used clustering techniques and decision trees to generate local, coherent subsets of data from which local prediction models can be built. However, these approaches may fall short in some respects because of limitations in finding optimal clusters and in handling noisy data. In this paper we use a more sophisticated locality approach that can mitigate these shortcomings: Locally Weighted Regression (LWR). This method provides an efficient way to learn from local data by building an estimation model that combines multiple local regression models within a k-nearest-neighbor framework. The main factor affecting the accuracy of this method is the choice of the kernel function used to derive the weights for the local regression models. This paper investigates the effect of different kernel choices on the performance of LWR for software effort estimation. After comprehensive experiments with 7 datasets, 10 kernels, 3 polynomial degrees and 4 bandwidth values, a total of 840 (7 × 10 × 3 × 4) Locally Weighted Regression variants, we found that: (1) uniform kernel functions do not outperform non-uniform kernel functions, and (2) kernel type, polynomial degree and bandwidth have no specific effect on estimation accuracy.
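To make the moving parts of LWR concrete (k-nearest-neighbor selection, kernel weighting, polynomial degree and bandwidth), here is a minimal sketch, not the authors' implementation. At a query project x0 it fits a local polynomial by minimizing the weighted squared error sum_i K(d_i / h) * (y_i - p(x_i - x0))^2 over the k nearest projects and returns the fitted value at x0. Several choices are assumptions made for illustration: the names `KERNELS`, `lwr_predict` and `bandwidth_scale` are hypothetical; the kernel set is a handful of standard kernels, not necessarily the ten used in the study; the bandwidth h is taken as `bandwidth_scale` times the distance to the k-th neighbor; and degree 2 uses per-feature squares without cross terms.

```python
import numpy as np

# A few standard kernel functions K(u) on scaled distances u = d / h.
# Illustrative only: not necessarily the ten kernels evaluated in the study.
KERNELS = {
    "uniform": lambda u: np.where(np.abs(u) <= 1, 0.5, 0.0),
    "triangular": lambda u: np.clip(1 - np.abs(u), 0, None),
    "epanechnikov": lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0),
    "biweight": lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0),
    "tricube": lambda u: np.where(np.abs(u) <= 1, 70 / 81 * (1 - np.abs(u) ** 3) ** 3, 0.0),
    "gaussian": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
}


def lwr_predict(X, y, x0, k=5, kernel="epanechnikov", degree=1, bandwidth_scale=1.5):
    """Predict effort at query x0 with a kernel-weighted local polynomial fit.

    Hypothetical interface: k, degree (0, 1 or 2) and bandwidth_scale are
    illustrative; the bandwidth h is assumed to be bandwidth_scale times
    the distance to the k-th nearest neighbor.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)

    # 1) k-nearest-neighbor step: keep only the k closest projects.
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    Xk, yk, dk = X[idx], y[idx], dist[idx]

    # 2) Kernel step: convert distances into weights. The scale factor > 1
    #    keeps compact-support kernels from giving the k-th neighbor zero weight.
    h = bandwidth_scale * dk.max() + 1e-12
    w = KERNELS[kernel](dk / h)

    # 3) Local polynomial design matrix centred at x0, so the fitted
    #    intercept equals the prediction at x0.
    D = Xk - x0
    cols = [np.ones(len(Xk))]
    if degree >= 1:
        cols.append(D)
    if degree >= 2:
        cols.append(D**2)  # simplification: no cross terms
    A = np.column_stack(cols)

    # 4) Weighted least squares via sqrt-weight rescaling:
    #    minimise sum_i w_i * (y_i - A_i @ beta)^2.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], yk * sw, rcond=None)
    return float(beta[0])
```

On exactly linear toy data such as X = [[1], [2], [3], [4], [5], [6]] with y = 2x + 1, a local linear fit reproduces the line for any of these kernels, so `lwr_predict(X, y, [3.5], k=4)` returns approximately 8.0. With `degree=0` the fit reduces to a kernel-weighted mean of the k neighbors, and with the uniform kernel it becomes the plain k-NN average.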
