Approximate Bayesian Computation (ABC) is widely used in systems biology for inferring parameters in stochastic gene regulatory network models. Its performance hinges critically on the ability to summarize high-dimensional system responses, such as time series, into a few informative, low-dimensional summary statistics. The quality of those statistics strongly affects the accuracy of the inference task. Existing methods for selecting the best subset from a pool of candidate statistics do not scale well to large pools of several tens to hundreds of candidates. Since high-quality statistics are imperative for good performance, this becomes a serious bottleneck when performing inference on complex, high-dimensional problems. This paper proposes a convolutional neural network architecture for automatically learning informative summary statistics of temporal responses. We show that the proposed network can effectively circumvent the statistics selection problem in the preprocessing step of ABC inference. The proposed approach is demonstrated on two benchmark problems and one challenging inference problem: learning the parameters of a high-dimensional stochastic genetic oscillator. We also study the impact of experimental design on network performance by comparing different levels of data richness and different data acquisition strategies.
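To make the role of summary statistics concrete, the following is a minimal sketch of plain ABC rejection sampling, not the method proposed in the paper. Everything in it is an illustrative assumption: the toy `simulate` function stands in for a stochastic gene regulatory network simulator, the uniform prior and acceptance fraction are arbitrary, and `summarize` uses two hand-picked statistics. That hand-picked function is the step a learned summary, such as the output of a trained convolutional network, would replace.

```python
import random
import statistics


def simulate(theta, n_steps=50, rng=random):
    """Toy stochastic simulator (hypothetical stand-in for a gene network
    model): a noisy time series fluctuating around the level theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n_steps)]


def summarize(series):
    """Hand-picked summary statistics: mean and standard deviation.
    A learned summary function would be plugged in here instead."""
    return (statistics.fmean(series), statistics.pstdev(series))


def distance(s1, s2):
    """Euclidean distance between two summary-statistic vectors."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5


def abc_rejection(observed, n_draws=2000, keep=0.05, prior=(0.0, 10.0), seed=0):
    """Draw parameters from a uniform prior, simulate, and keep the draws
    whose summaries lie closest to the observed summaries."""
    rng = random.Random(seed)
    s_obs = summarize(observed)
    trials = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)
        s_sim = summarize(simulate(theta, len(observed), rng))
        trials.append((distance(s_obs, s_sim), theta))
    trials.sort()  # smallest distance first
    n_keep = max(1, int(keep * n_draws))
    return [theta for _, theta in trials[:n_keep]]


# Assumed "true" parameter for the synthetic observation.
observed = simulate(4.0, rng=random.Random(42))
posterior = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior)
```

The accepted draws in `posterior` approximate the posterior over `theta`. How well they do so depends entirely on how informative `summarize` is, which is why statistic selection becomes a bottleneck as the candidate pool grows.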