RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, physics analyses become more computationally demanding, necessitating performance optimizations for RooFit. One way to speed up minimization and improve its stability is Automatic Differentiation (AD). With numerical differentiation, the cost of computing a gradient scales linearly with the number of parameters; with AD in reverse mode, the gradient costs a small constant multiple of a single function evaluation, which makes AD particularly appealing for statistical models with many parameters. In this paper, we report on one possible way to implement AD in RooFit. Our approach is to add a facility that automatically generates C++ code for a full RooFit model. Unlike the original RooFit model, the generated code is free of virtual function calls and other RooFit-specific overhead. This code is then used to produce the gradient automatically with Clad, a source-transformation AD tool implemented as a plugin to the clang compiler, which generates derivative code for input C++ functions. We show the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters and are thus an interesting first target for AD support in RooFit.
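To make the scaling argument concrete, the following is a minimal sketch, not the code RooFit or Clad actually produces. It assumes a hypothetical flat likelihood for a binned model in which bin `i` has expected yield `mu * sig[i] + gamma_i * bkg[i]`; the names `BinnedData`, `nll`, `numericGradient`, `analyticGradient`, and the counter `g_nllCalls` are all invented for illustration. The hand-written `analyticGradient` only stands in for the kind of derivative code an AD tool could generate from `nll`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical binned model: per-bin signal template, background template,
// and observed counts. Parameters are p = {mu, gamma_0, ..., gamma_{n-1}}.
struct BinnedData {
    std::vector<double> sig, bkg, obs;
};

// Counts how often the likelihood is evaluated (illustration only).
static long g_nllCalls = 0;

// Flat Poisson negative log-likelihood (constant terms dropped):
// a plain function with no virtual calls.
double nll(const std::vector<double>& p, const BinnedData& d) {
    assert(p.size() == d.obs.size() + 1);
    ++g_nllCalls;
    double sum = 0.0;
    for (std::size_t i = 0; i < d.obs.size(); ++i) {
        const double nu = p[0] * d.sig[i] + p[i + 1] * d.bkg[i];
        sum += nu - d.obs[i] * std::log(nu);
    }
    return sum;
}

// Central finite differences: two likelihood evaluations per parameter,
// so the cost grows linearly with the number of parameters.
std::vector<double> numericGradient(std::vector<double> p, const BinnedData& d,
                                    double h = 1e-6) {
    std::vector<double> g(p.size());
    for (std::size_t k = 0; k < p.size(); ++k) {
        const double saved = p[k];
        p[k] = saved + h;
        const double up = nll(p, d);
        p[k] = saved - h;
        const double down = nll(p, d);
        p[k] = saved;
        g[k] = (up - down) / (2.0 * h);
    }
    return g;
}

// Hand-written gradient, standing in for AD-generated derivative code:
// one pass over the bins yields all components at once.
std::vector<double> analyticGradient(const std::vector<double>& p,
                                     const BinnedData& d) {
    assert(p.size() == d.obs.size() + 1);
    std::vector<double> g(p.size(), 0.0);
    for (std::size_t i = 0; i < d.obs.size(); ++i) {
        const double nu = p[0] * d.sig[i] + p[i + 1] * d.bkg[i];
        const double w = 1.0 - d.obs[i] / nu;  // d(nll)/d(nu) for bin i
        g[0] += d.sig[i] * w;                  // derivative w.r.t. mu
        g[i + 1] = d.bkg[i] * w;               // derivative w.r.t. gamma_i
    }
    return g;
}
```

With `n` bins this toy model has `n + 1` parameters, so `numericGradient` calls `nll` `2 * (n + 1)` times, while `analyticGradient` does work comparable to a single likelihood evaluation regardless of `n`.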