Most work in statistics and AI draws insight from modelling discord or variance between sources of information (i.e., inter-source uncertainty). Increasingly, however, research is focusing on uncertainty that arises at the level of individual measurements (i.e., within-source or intra-source uncertainty), such as a given sensor output or human response. Here, adopting intervals rather than numbers as the fundamental data type offers an efficient and powerful, yet challenging, way forward: it allows uncertainty-at-source to be captured systematically, increases informational capacity, and ultimately increases the potential for insight. Following recent progress in capturing interval-valued data, including from human participants, the crucial next step is to conduct machine learning directly on intervals. This paper focuses on linear regression for interval-valued data, a recent growth area that provides an essential foundation for the broader use of intervals in AI. We conduct an in-depth analysis of state-of-the-art methods, elucidating their behaviour, advantages, and pitfalls when applied to datasets with different properties. Specific emphasis is given to the challenge of preserving mathematical coherence, i.e., ensuring that models maintain the fundamental mathematical properties of intervals throughout (e.g., that an interval's lower bound never exceeds its upper bound). The paper puts forward extensions to an existing approach that guarantee this. Carefully designed experiments using both synthetic and real-world data are conducted, and the findings are presented alongside novel visualisations for interval-valued regression outputs, designed to maximise model interpretability. Finally, the paper makes recommendations on method suitability for datasets with specific properties, and highlights remaining challenges and important next steps for developing AI with the capacity to handle uncertainty-at-source.
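The requirement of mathematical coherence can be made concrete with a small example. The Python sketch below is not the method proposed in this paper. It is a generic illustration, loosely in the spirit of the center-and-range method (CRM) and its constrained variant from the interval-regression literature. It assumes a single interval-valued predictor and input intervals that are already coherent.

The sketch fits one least-squares line to interval midpoints and another to half-widths. If the half-width line is unconstrained, extrapolation can yield a negative half-width, which is a predicted "interval" whose lower bound exceeds its upper bound. Constraining the half-width intercept and slope to be non-negative rules this out. All function names (`ols`, `nonneg_ols`, `fit`, `predict`) and the toy data are hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch only (hypothetical names); not the paper's method.
Interval = Tuple[float, float]  # (lower, upper); inputs assumed coherent: lower <= upper
Line = Tuple[float, float]      # (intercept, slope) of y = a + b * x


def ols(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def sse(line: Line, xs: List[float], ys: List[float]) -> float:
    """Sum of squared residuals of a fitted line."""
    a, b = line
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def nonneg_ols(xs: List[float], ys: List[float]) -> Line:
    """Least squares for y = a + b * x subject to a >= 0 and b >= 0."""
    a, b = ols(xs, ys)
    if a >= 0 and b >= 0:
        return a, b  # unconstrained optimum is already feasible
    # The objective is convex, so the constrained optimum lies on a boundary
    # ray (a = 0 or b = 0); minimise along each ray and keep the better one.
    sxx0 = sum(x * x for x in xs)
    b_ray = sum(x * y for x, y in zip(xs, ys)) / sxx0 if sxx0 > 0 else 0.0
    on_a0 = (0.0, max(0.0, b_ray))
    on_b0 = (max(0.0, sum(ys) / len(ys)), 0.0)
    return min((on_a0, on_b0), key=lambda line: sse(line, xs, ys))


def centre_and_half_width(ivs: List[Interval]) -> Tuple[List[float], List[float]]:
    """Split intervals into midpoints and half-widths."""
    return [(lo + hi) / 2 for lo, hi in ivs], [(hi - lo) / 2 for lo, hi in ivs]


def fit(X: List[Interval], Y: List[Interval], constrained: bool = True) -> Tuple[Line, Line]:
    """Fit one line to the midpoints and one to the half-widths."""
    xc, xr = centre_and_half_width(X)
    yc, yr = centre_and_half_width(Y)
    half_width_fit = nonneg_ols if constrained else ols
    return ols(xc, yc), half_width_fit(xr, yr)


def predict(model: Tuple[Line, Line], x: Interval) -> Interval:
    """Predict an output interval from midpoint and half-width lines."""
    (ac, bc), (ar, br) = model
    c = ac + bc * (x[0] + x[1]) / 2
    r = ar + br * (x[1] - x[0]) / 2  # >= 0 whenever ar, br >= 0 and x is coherent
    return (c - r, c + r)


# Toy data (invented for illustration): output width shrinks as input width grows.
X = [(0, 2), (1, 5), (2, 8)]
Y = [(1, 7), (4, 8), (7, 9)]
print(predict(fit(X, Y, constrained=False), (10, 22)))  # (21.0, 17.0): lower > upper
print(predict(fit(X, Y), (10, 22)))                      # (17.0, 21.0): coherent
```

For the wide input `(10, 22)`, the unconstrained fit extrapolates to a negative half-width. The constrained fit gives up some in-sample accuracy in exchange for a guaranteed-valid interval. With several predictors, a general non-negative least-squares solver would take the place of the two-ray search used here.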