While the general focus is on predicting values, real data often allow only conditional probability distributions to be predicted, with predictive capability bounded by the conditional entropy $H(Y|X)$. If uncertainty is estimated as well, the predicted value can be treated as the center of a Gaussian or Laplace distribution, an idealization that can be far from the complex conditional distributions of real data. This article applies the Hierarchical Correlation Reconstruction (HCR) approach to inexpensively predict quite complex (e.g. multimodal) conditional probability distributions: multiple moment-like parameters are estimated independently by MSE minimization, and the conditional distribution is then reconstructed from them. Using linear regression for this purpose gives interpretable models, with coefficients describing the contributions of features to the conditional moments. This article extends the original approach, especially by using Canonical Correlation Analysis (CCA) for feature optimization and l1 "lasso" regularization, and focuses on the practical problem of predicting the redshift of Active Galactic Nuclei (AGN) based on the Fourth Fermi-LAT Data Release 2 (4LAC) dataset.
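The core idea above (independent least-squares estimation of moment-like parameters, followed by reconstruction of the conditional density) can be illustrated with a minimal sketch. It is not the article's implementation and rests on several assumptions: the target $y$ is already normalized to $[0, 1]$, the moment-like parameters are coefficients in an orthonormal shifted Legendre basis, the data are synthetic, and the function names and the `floor` parameter are hypothetical. CCA feature optimization and l1 regularization are left out.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def legendre_basis(y, degree):
    """Orthonormal shifted Legendre polynomials f_0..f_degree on [0, 1]."""
    y = np.asarray(y, dtype=float)
    scale = np.sqrt(2 * np.arange(degree + 1) + 1)
    return legvander(2 * y - 1, degree) * scale  # shape (len(y), degree + 1)


def fit_moment_models(X, y, degree=4):
    """One linear least-squares model per moment-like coefficient a_j(x)."""
    design = np.column_stack([np.ones(len(X)), X])  # intercept + features
    targets = legendre_basis(y, degree)             # column j holds f_j(y_i)
    beta, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return beta                                     # (n_features + 1, degree + 1)


def conditional_density(beta, x, n_grid=200, floor=1e-3):
    """Reconstruct rho(y | x) = sum_j a_j(x) f_j(y) on a midpoint grid.

    The polynomial can go negative, so it is clipped at `floor`
    (an assumed value) and renormalized to integrate to 1.
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid
    a = np.concatenate(([1.0], np.atleast_1d(x))) @ beta  # a_0..a_degree
    rho = legendre_basis(grid, beta.shape[1] - 1) @ a
    rho = np.clip(rho, floor, None)
    return grid, rho / rho.mean()  # mean over a uniform grid on [0, 1] = integral


# Synthetic bimodal data (assumption): y is near 0.25 with probability x,
# otherwise near 0.75, so a single predicted value would describe it poorly.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 1.0, n)
is_low = rng.random(n) < x
y = np.where(is_low, rng.normal(0.25, 0.05, n), rng.normal(0.75, 0.05, n))
y = np.clip(y, 0.0, 1.0)
X = x[:, None]

beta = fit_moment_models(X, y, degree=4)
grid, rho_hi = conditional_density(beta, 0.9)  # most mass expected near 0.25
_, rho_lo = conditional_density(beta, 0.1)     # most mass expected near 0.75

mass_below_half_hi = rho_hi[grid < 0.5].sum() / len(grid)
mass_below_half_lo = rho_lo[grid < 0.5].sum() / len(grid)
print(mass_below_half_hi, mass_below_half_lo)
```

Each column of `beta` is a separate linear model, so its entries can be read as the contribution of each feature to the corresponding moment-like coefficient, which is the interpretability property the abstract refers to.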