Data analysis in science, for example in high-energy particle physics, is often subject to an intractable likelihood if the observables and observations span a high-dimensional input space. Typically the problem is solved by reducing the dimensionality using feature engineering and histograms, whereby the latter technique allows the likelihood to be built using Poisson statistics. However, in the presence of systematic uncertainties represented by nuisance parameters in the likelihood, the optimal dimensionality reduction with a minimal loss of information about the parameters of interest is not known. This work presents a novel strategy to construct the dimensionality reduction with neural networks for feature engineering and a differentiable formulation of histograms, so that the full workflow can be optimized with the result of the statistical inference, e.g., the variance of a parameter of interest, as the objective. We discuss how this approach yields an estimate of the parameters of interest that is close to optimal, and we demonstrate the applicability of the technique with a simple example based on pseudo-experiments and a more complex example from high-energy particle physics.
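To make the idea concrete, the following is a minimal sketch (not the authors' code) of the workflow described above, written with JAX. The assumptions are a toy setup: a small neural network that maps high-dimensional features to a scalar, a "soft" histogram built from sigmoids so that the binning is differentiable, a binned Poisson likelihood, and the asymptotic variance of a hypothetical signal-strength parameter `mu`, obtained from the inverse Fisher information, used directly as the training objective. All names (`soft_histogram`, `variance_of_mu`, the toy signal and background samples) are illustrative, not taken from the paper.

```python
import jax
import jax.numpy as jnp

def soft_histogram(x, edges, width=0.1):
    """Differentiable histogram: each event contributes a smooth weight
    sigmoid((x - lo)/width) - sigmoid((x - hi)/width) to every bin."""
    lo, hi = edges[:-1], edges[1:]
    w = jax.nn.sigmoid((x[:, None] - lo) / width) - jax.nn.sigmoid((x[:, None] - hi) / width)
    return w.sum(axis=0)

def nn_output(params, features):
    """Tiny one-layer network mapping high-dimensional features to a scalar in (0, 1)."""
    h = jnp.tanh(features @ params["w1"] + params["b1"])
    return jax.nn.sigmoid(h @ params["w2"] + params["b2"]).squeeze(-1)

def expected_counts(params, mu, sig, bkg, edges):
    """Expected bin counts: mu * signal + background, histogrammed after the
    learned dimensionality reduction."""
    return mu * soft_histogram(nn_output(params, sig), edges) + \
           soft_histogram(nn_output(params, bkg), edges)

def neg_log_likelihood(mu, params, sig, bkg, obs, edges):
    """Binned Poisson negative log-likelihood (constant terms dropped)."""
    lam = expected_counts(params, mu, sig, bkg, edges)
    return jnp.sum(lam - obs * jnp.log(lam + 1e-9))

def variance_of_mu(params, sig, bkg, edges, mu0=1.0):
    """Asymptotic variance of mu from the inverse Fisher information,
    evaluated on the Asimov-like expectation (obs = expected counts)."""
    obs = expected_counts(params, mu0, sig, bkg, edges)
    fisher = jax.hessian(neg_log_likelihood)(mu0, params, sig, bkg, obs, edges)
    return 1.0 / fisher

# Toy samples and training loop: minimize the variance of mu directly.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
sig = jax.random.normal(k1, (500, 4)) + 1.0   # hypothetical signal sample
bkg = jax.random.normal(k2, (2000, 4)) - 1.0  # hypothetical background sample
edges = jnp.linspace(0.0, 1.0, 9)             # 8 bins on the network output
params = {
    "w1": 0.1 * jax.random.normal(k3, (4, 16)),
    "b1": jnp.zeros(16),
    "w2": 0.1 * jax.random.normal(k4, (16, 1)),
    "b2": jnp.zeros(1),
}

loss_and_grad = jax.jit(jax.value_and_grad(variance_of_mu))
lr = 1e-2
for step in range(200):
    loss, grads = loss_and_grad(params, sig, bkg, edges)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
print("approximate variance of mu after training:", float(loss))
```

Because both the histogram and the likelihood are differentiable, gradients of the inference result flow back through the binning into the network weights; this end-to-end optimization of an inference-level objective is the point of the approach, and the sketch above omits the nuisance parameters that the paper includes in the likelihood.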