Multidimensional data analysis has become increasingly important in many fields, mainly because of the vast amount of data now available and the growing demand to extract knowledge from it. In most applications, the end user plays a crucial role in building suitable machine learning models and in explaining the patterns found in the data. In this paper, we present an open, unified approach for generating, evaluating, and applying regression models on high-dimensional data sets within a user-guided process. The approach exposes a broad panorama of correlations among attributes, from which the user can select relevant attributes to build and evaluate prediction models for one or more contexts. We name the approach UCReg (User-Centered Regression). We demonstrate the effectiveness and efficiency of UCReg by applying our framework to the analysis of Covid-19 data and other synthetic and real health record data.