Artificial intelligence is applied across a range of sectors and is relied upon for decisions requiring a high level of trust. For regression methods, trust is increased if they approximate the true input-output relationship and perform accurately outside the bounds of the training data. However, performance outside these bounds is often poor, especially when the data are sparse. This is because the conditional average, which in many scenarios is a good approximation of the `ground truth', is only modelled by conventional Minkowski-r error measures when the data set satisfies restrictive assumptions, and many real data sets violate these. Several existing methods combat this by using prior knowledge to approximate the `ground truth', but prior knowledge is not always available. This paper investigates how error measures affect the ability of a regression method to model the `ground truth' in such scenarios. Current error measures are shown to introduce an unhelpful bias, and a new error measure is derived that does not exhibit this behaviour. The new measure is tested on 36 representative data sets with different characteristics, and is shown to be more consistent in determining the `ground truth' and to give improved predictions in regions beyond the range of the training data.
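For reference, the Minkowski-r family of error measures mentioned above is conventionally written as below. This is a standard textbook formulation rather than a definition taken from this paper, with $N$ training targets $y_i$, model predictions $\hat{y}_i$, and a normalisation convention that varies between authors; $r=2$ recovers the familiar sum-of-squares error, whose minimiser approximates the conditional mean, while $r=1$ gives the sum of absolute errors, whose minimiser approximates the conditional median.
% Standard Minkowski-r error over N target/prediction pairs (normalisation conventions differ).
\begin{equation}
  E_r = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|^{r}
\end{equation}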