The continuous growth of data complexity requires methods and models that adequately account for non-trivial structures, as any simplification may induce loss of information. Many analytical tools have been introduced to work with complex data objects in their original form, but such tools can typically deal with single-type variables only. In this work, we propose Energy Trees as a model for regression and classification tasks where covariates are potentially both structured and of different types. Energy Trees incorporate Energy Statistics to generalize Conditional Trees, from which they inherit statistically sound foundations, interpretability, scale invariance, and lack of distributional assumptions. We focus on functions and graphs as structured covariates and we show how the model can be easily adapted to work with almost any other type of variable. Through an extensive simulation study, we highlight the good performance of our proposal in terms of variable selection and robustness to overfitting. Finally, we validate the model's predictive ability through two empirical analyses with human biological data.
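As an illustration of why Energy Statistics suit covariates of different types, the sketch below tests the association between a response and a covariate using only pairwise distances. It is a minimal sketch, not the implementation of Energy Trees: using a distance covariance permutation test as the association measure is an assumption made here, and the function names (`double_center`, `dcov2`, `dcov_perm_test`, `euclidean_distances`) are hypothetical.

```python
import numpy as np


def double_center(d):
    """Double-center a pairwise distance matrix:
    subtract row and column means, then add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov2(dx, dy):
    """Squared sample distance covariance (V-statistic) computed from
    two n x n distance matrices over the same n observations."""
    return float((double_center(dx) * double_center(dy)).mean())


def dcov_perm_test(dx, dy, n_perm=199, seed=0):
    """Permutation test of independence based on distance covariance.

    Assumption: this stands in for an energy-based association test;
    it needs only distance matrices, so the covariate type is irrelevant
    once a distance has been chosen for it.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    observed = dcov2(dx, dy)
    n = dx.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute the observations of y by reordering rows and columns jointly.
        if dcov2(dx, dy[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def euclidean_distances(x):
    """Pairwise Euclidean distances between the rows of x.
    For curves sampled on a common grid, this approximates an L2
    distance between functions up to a constant factor."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

For a functional covariate, `dx` could be built from curves evaluated on a common grid, as in `euclidean_distances`. For a graph covariate, any graph distance would take its place, and the test itself would stay unchanged.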