In many areas of the observational and experimental sciences, data is scarce. Observations in high-energy astrophysics are disrupted by celestial occlusions and limited telescope time, while data from laboratory experiments in synthetic chemistry and materials science is time- and cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism, such as the measurement error of a piece of laboratory apparatus, is often available in the sciences. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs) ideal candidates for fitting such datasets. GPs can make predictions that account for uncertainty, for example in the virtual screening of molecules and materials, and can also make inferences about incomplete data, such as the latent emission signature from a black hole accretion disc. Furthermore, GPs are currently the workhorse model for Bayesian optimisation, a methodology foreseen to guide laboratory experiments in scientific discovery campaigns. The first contribution of this thesis is to use GP modelling to reason about the latent emission signature from the Seyfert galaxy Markarian 335 and, by extension, about the applicability of various theoretical models of black hole accretion discs. The second contribution is to extend the GP framework to molecular and chemical reaction representations and to provide an open-source software library that enables scientists to use the framework. The third contribution is to leverage GPs to discover novel and performant photoswitch molecules. The fourth contribution is to introduce a Bayesian optimisation scheme capable of modelling aleatoric uncertainty, to facilitate the identification of material compositions that are intrinsically robust to large-scale fabrication processes.