Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific applications: (1) feature sparsity (the fraction of features deemed relevant), and (2) signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show how to encode both types of domain knowledge into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters that encodes knowledge about feature sparsity, and a Stein gradient optimization to tune the hyperparameters in such a way that the distribution induced on the model's PVE matches the prior distribution. We show empirically that the new prior improves prediction accuracy, compared to existing neural network priors, on several publicly available datasets and in a genetics application where signals are weak and sparse, often outperforming even computationally intensive cross-validation for hyperparameter tuning.
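To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of a Gaussian scale mixture prior with ARD-style local scales, and a Monte Carlo estimate of the PVE distribution it induces on a linear model. All names and choices here (half-Cauchy local scales, the `global_scale` hyperparameter, a Beta target on PVE) are illustrative assumptions, not details from the paper.

```python
# Sketch: sample weights w_j ~ N(0, global_scale^2 * lambda_j^2) with
# half-Cauchy local scales lambda_j (a Gaussian scale mixture with ARD),
# then estimate the induced distribution of PVE = Var(Xw) / (Var(Xw) + sigma2).
# Hyperparameter names and prior choices are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50                # observations, features
X = rng.standard_normal((n, d))
sigma2 = 1.0                  # assumed noise variance

def sample_pve(global_scale=0.1, n_draws=2000):
    """Monte Carlo samples of the PVE induced by the weight prior."""
    pves = np.empty(n_draws)
    for i in range(n_draws):
        lam = np.abs(rng.standard_cauchy(d))        # local (per-feature) scales
        w = rng.normal(0.0, global_scale * lam)     # Gaussian scale mixture draw
        signal_var = np.var(X @ w)
        pves[i] = signal_var / (signal_var + sigma2)
    return pves

pves = sample_pve()
# In the paper's approach, hyperparameters would be tuned (via Stein
# gradients) so that this induced distribution matches a domain-informed
# prior on PVE, e.g. one concentrated on small values for weak signals.
print(pves.shape, float(pves.min()) >= 0.0, float(pves.max()) <= 1.0)
```

Sampling the induced PVE distribution like this makes the hyperparameters interpretable: one can check directly whether the prior's beliefs about signal strength match the domain knowledge before seeing any data.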