Traditional methods for spatial inference estimate smooth interpolating fields based on features measured at well-located points. When the spatial locations of some observations are missing, joint inference of the fields and locations is possible as the fields inform the locations and vice versa. If the number of missing locations is large, conventional Bayesian Inference fails if the generative model for the data is even slightly mis-specified, due to feedback between estimated fields and the imputed locations. Semi-Modular Inference (SMI) offers a solution by controlling the feedback between different modular components of the joint model using a hyper-parameter called the influence parameter. Our work is motivated by linguistic studies on a large corpus of late-medieval English textual dialects. We simultaneously learn dialect fields using dialect features observed in ``anchor texts'' with known location and estimate the location of origin for ``floating'' textual dialects of unknown origin. The optimal influence parameter minimises a loss measuring the accuracy of held-out anchor data. We compute a (flow-based) variational approximation to the SMI posterior for our model. This allows efficient computation of the optimal influence. MCMC-based approaches, feasible on small subsets of the data, are used to check the variational approximation.
翻译:暂无翻译