Nonparametric supervised learning algorithms form a class of supervised learning algorithms whose learning parameters are highly flexible and whose values depend directly on the size of the training data. In this paper, we compare the properties of four nonparametric algorithms: k-nearest neighbours (KNNs), support vector machines (SVMs), decision trees and random forests. The supervised learning task is a regression estimate of the time lapse in medical insurance reimbursement. Our study is concerned specifically with how well each of the nonparametric regression models fits the training data. We quantify the goodness of fit using the R-squared metric. The results are presented with a focus on the effects of the size of the training data, the dimension of the feature space and hyperparameter optimization.
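As a rough illustration of the evaluation described in the abstract, the sketch below computes the R-squared of a nonparametric regressor on its own training data and shows how that value changes with a hyperparameter. It is a minimal sketch under stated assumptions, not the paper's implementation: the data are synthetic, the k-nearest-neighbour regressor is a hand-written one-dimensional version, and the names `r_squared`, `knn_predict` and `training_r2` are hypothetical helpers introduced only for this example.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query` (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def training_r2(train_x, train_y, k):
    """R-squared of the k-NN regressor evaluated on the data it was fitted to."""
    preds = [knn_predict(train_x, train_y, x, k) for x in train_x]
    return r_squared(train_y, preds)


# Synthetic data: an assumption for illustration, not the insurance data from the paper.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Training-set goodness of fit for several values of the hyperparameter k.
r2_by_k = {k: training_r2(xs, ys, k) for k in (1, 5, 25)}
for k, r2 in r2_by_k.items():
    print(f"k={k:2d}  training R^2 = {r2:.3f}")
```

With k = 1 every training point is its own nearest neighbour, so the model reproduces the training targets and the training R-squared is 1; larger k smooths the predictions and lowers it. This is one reason why a goodness-of-fit comparison on training data is sensitive to hyperparameter choices.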