We consider nonparametric prediction with multiple covariates, in particular categorical predictors, functional predictors, or a mixture of both. The proposed method is based on an extension of the Nadaraya-Watson estimator in which a kernel function is applied to a linear combination of distance measures, each computed on a single covariate, with the weights estimated from the training data. The dependent variable can be categorical (binary or multi-class) or continuous, so both classification and regression problems are covered. The methodology is illustrated and evaluated on artificial and real-world data. In particular, we observe that prediction accuracy can be increased, and that irrelevant noise variables can be identified or removed by "downgrading" the corresponding distance measures in a completely data-driven way.
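The idea of a kernel applied to a weighted sum of per-covariate distances can be sketched in a few lines of Python for the regression case. This is only an illustration under stated assumptions, not the authors' implementation: the exponential kernel, the Hamming and discretized L2 distances, the leave-one-out squared-error criterion, the coarse grid search over weights, the synthetic data, and all function names (`nw_predict`, `loo_loss`, `fit_weights`) are choices made here for the example.

```python
import math
import random


def hamming(a, b):
    # Distance between two category labels (assumed choice).
    return 0.0 if a == b else 1.0


def l2_curve(f, g):
    # Discretized L2 distance between two curves on a common grid (assumed choice).
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(f, g)) / len(f))


def pairwise(values, dist):
    # Full matrix of distances between all training observations.
    return [[dist(u, v) for v in values] for u in values]


def nw_predict(weights, dists, y_train, skip=None):
    # dists[j][i]: distance in covariate j between the query and training point i.
    num = den = 0.0
    for i, y_i in enumerate(y_train):
        if i == skip:  # used for leave-one-out prediction
            continue
        d = sum(w * dj[i] for w, dj in zip(weights, dists))
        k = math.exp(-d)  # kernel on the combined distance (assumed kernel)
        num += k * y_i
        den += k
    return num / den if den > 0 else sum(y_train) / len(y_train)


def loo_loss(weights, D, y):
    # Leave-one-out mean squared prediction error on the training data.
    err = 0.0
    for a in range(len(y)):
        pred = nw_predict(weights, [Dj[a] for Dj in D], y, skip=a)
        err += (y[a] - pred) ** 2
    return err / len(y)


def fit_weights(D, y, grid=(0.0, 0.5, 1.0, 2.0, 5.0, 10.0), sweeps=2):
    # Coordinate-wise grid search; a crude stand-in for a proper numerical optimizer.
    weights = [1.0] * len(D)
    best = loo_loss(weights, D, y)
    for _ in range(sweeps):
        for j in range(len(D)):
            for cand in grid:
                trial = weights[:j] + [cand] + weights[j + 1:]
                loss = loo_loss(trial, D, y)
                if loss < best:  # accept only strict improvements
                    best, weights = loss, trial
    return weights, best


# Synthetic data: one relevant categorical predictor, one relevant functional
# predictor, and one categorical predictor that is unrelated to the response.
random.seed(0)
n = 40
t_grid = [k / 9 for k in range(10)]
cat = [random.choice("abc") for _ in range(n)]
amp = [random.uniform(0, 2) for _ in range(n)]
curves = [[a * math.sin(2 * math.pi * t) for t in t_grid] for a in amp]
noise = [random.choice("xy") for _ in range(n)]
level = {"a": 0.0, "b": 1.0, "c": 2.0}
y = [level[c] + a + random.gauss(0, 0.1) for c, a in zip(cat, amp)]

D = [pairwise(cat, hamming), pairwise(curves, l2_curve), pairwise(noise, hamming)]
weights, best = fit_weights(D, y)
print("estimated weights (categorical, functional, noise):", weights)
print("leave-one-out MSE:", best)
```

A weight of zero removes a covariate from the combined distance altogether, which is one way to read the "downgrading" of irrelevant predictors described above. For classification, the same kernel weights could be used to average class indicators instead of a continuous response.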