Explainable machine learning aims to strike a balance between prediction accuracy and model transparency, particularly in settings where black-box predictive models, such as deep neural networks or kernel-based methods, achieve strong empirical performance but remain difficult to interpret. This work introduces a mixture of generalized additive models (GAMs) in which random Fourier feature (RFF) representations are leveraged to uncover locally adaptive structure in the data. In the proposed method, an RFF-based embedding is first learned and then compressed via principal component analysis. The resulting low-dimensional representations are used to perform soft clustering of the data through a Gaussian mixture model. These cluster assignments are then applied to construct a mixture-of-GAMs framework, where each local GAM captures nonlinear effects through interpretable univariate smooth functions. Numerical experiments on real-world regression benchmarks, including the California Housing, NASA Airfoil Self-Noise, and Bike Sharing datasets, demonstrate improved predictive performance relative to classical interpretable models. Overall, this construction provides a principled approach for integrating representation learning with transparent statistical modeling.
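The pipeline described above (RFF embedding, PCA compression, GMM soft clustering, responsibility-weighted local additive models) can be sketched end-to-end with numpy. Everything here is an illustrative assumption, not the paper's implementation: the data are synthetic, the hyperparameters (`D`, `k`, `K`, `degree`) are arbitrary, the GMM uses a few spherical-covariance EM steps, and the local "GAMs" are approximated by weighted additive polynomial regressions rather than penalized smoothers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with locally different nonlinear structure.
n, d = 400, 3
X = rng.uniform(-2, 2, size=(n, d))
y = np.where(X[:, 0] > 0, np.sin(X[:, 1]), X[:, 2] ** 2) + 0.1 * rng.standard_normal(n)

# 1) Random Fourier feature embedding approximating an RBF kernel.
D, sigma = 100, 1.0
W = rng.normal(0.0, 1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# 2) Compress the embedding with PCA (SVD of the centered feature matrix).
k = 5
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
P = Zc @ Vt[:k].T  # low-dimensional representation

# 3) Soft clustering with a spherical Gaussian mixture (a few EM steps).
K = 2
mu = P[rng.choice(n, K, replace=False)]
var = np.ones(K)
pi = np.full(K, 1.0 / K)
for _ in range(20):
    # E-step: responsibilities r[i, j] = p(cluster j | point i).
    d2 = ((P[:, None, :] - mu[None]) ** 2).sum(-1)
    logp = np.log(pi) - 0.5 * (d2 / var + k * np.log(2 * np.pi * var))
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update weights, means, and spherical variances.
    Nk = r.sum(axis=0) + 1e-12
    pi = Nk / n
    mu = (r.T @ P) / Nk[:, None]
    d2 = ((P[:, None, :] - mu[None]) ** 2).sum(-1)
    var = np.maximum((r * d2).sum(axis=0) / (k * Nk), 1e-6)

# 4) Mixture of local additive models: per cluster, a responsibility-weighted
#    least-squares fit on univariate polynomial bases (stand-ins for the
#    smooth univariate GAM terms).
def additive_basis(X, degree=3):
    cols = [np.ones((len(X), 1))]
    for j in range(X.shape[1]):
        for p in range(1, degree + 1):
            cols.append(X[:, j:j + 1] ** p)
    return np.hstack(cols)

B = additive_basis(X)
betas = []
for j in range(K):
    Wj = r[:, j]
    A = B * Wj[:, None]  # B^T diag(w) B beta = B^T diag(w) y, lightly ridged
    beta = np.linalg.solve(A.T @ B + 1e-6 * np.eye(B.shape[1]), A.T @ y)
    betas.append(beta)

# Predictions: responsibility-weighted combination of the local models.
y_hat = sum(r[:, j] * (B @ betas[j]) for j in range(K))
mse = np.mean((y - y_hat) ** 2)
baseline = np.var(y)  # MSE of predicting the global mean
print(f"mixture MSE {mse:.3f} vs mean-predictor baseline {baseline:.3f}")
```

Each local model stays interpretable in the GAM sense: its prediction decomposes into per-feature univariate functions, and the GMM responsibilities show where in the input space each local model is active.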