Recent years have seen substantial advances in the development of biofunctional materials from synthetic polymers. Because sequence-function relationships remain elusive for most biomaterials, researchers have been driven to seek more effective tools and analysis methods. In this study, we use statistical models to study the sequence features of the recently reported random heteropolymers (RHPs), which, like natural proton channels, transport protons across lipid bilayers selectively and rapidly. Working within the probabilistic graphical model framework, we developed a generalized hidden semi-Markov model (GHSMM-RHP) to extract the function-determining sequence features, including the transmembrane segments within a chain and the sequence heterogeneity among different chains. We developed stochastic variational methods for efficient inference in parameter estimation and prediction, and empirically compared their computational performance under the Bayesian framework (stochastic variational Bayes) and the frequentist framework (stochastic variational expectation-maximization), which have previously been studied only separately. Results on real data agree well with laboratory experiments and suggest that GHSMM-RHP has potential for predicting protein-like behavior at the level of individual polymer chains.
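To make "semi-Markov" concrete: an ordinary hidden Markov model implicitly gives each state a geometric duration distribution, whereas a hidden semi-Markov model assigns each hidden state an explicit segment-length distribution, so a chain is described as a sequence of whole segments (for example, transmembrane versus non-transmembrane stretches). The sketch below is a generic explicit-duration HSMM with segmental Viterbi decoding. It is not GHSMM-RHP and not the stochastic variational inference described above. The function name `hsmm_viterbi`, the two-state labeling, the binary monomer alphabet, and all parameter values are hypothetical and chosen only for illustration. It assumes self-transitions are disallowed and that the last segment ends exactly at the end of the chain.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    # Safe log: probability 0 maps to -inf instead of raising.
    return math.log(p) if p > 0 else NEG_INF


def hsmm_viterbi(
    x: Sequence[int],
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    dur: Sequence[Sequence[float]],
) -> List[Tuple[int, int, int]]:
    """Most probable segmentation of x as (state, start, end) triples, end exclusive.

    Hypothetical generic explicit-duration HSMM (not GHSMM-RHP):
      pi[k]       initial state probability
      A[j][k]     segment-to-segment transition (diagonal assumed 0)
      B[k][v]     probability that state k emits monomer type v
      dur[k][d-1] probability that a state-k segment has length d, d = 1..D
    """
    T, K, D = len(x), len(pi), len(dur[0])
    # delta[t][k]: best log score of x[0:t] whose last segment has state k and ends at t.
    delta = [[NEG_INF] * K for _ in range(T + 1)]
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [[None] * K for _ in range(T + 1)]
    for t in range(1, T + 1):
        for k in range(K):
            emit = 0.0
            for d in range(1, min(D, t) + 1):
                s = t - d  # candidate segment covers x[s:t]
                emit += _log(B[k][x[s]])  # grow the segment one position to the left
                if s == 0:
                    prev, j_best = _log(pi[k]), None
                else:
                    prev, j_best = max(
                        ((delta[s][j] + _log(A[j][k]), j) for j in range(K)),
                        key=lambda pair: pair[0],
                    )
                score = prev + _log(dur[k][d - 1]) + emit
                if score > delta[t][k]:
                    delta[t][k] = score
                    back[t][k] = (s, j_best)
    k = max(range(K), key=lambda j: delta[T][j])
    if delta[T][k] == NEG_INF:
        raise ValueError("sequence has zero probability under the model")
    # Trace segments back from the end of the chain.
    segments = []
    t = T
    while t > 0:
        s, j = back[t][k]
        segments.append((k, s, t))
        t, k = s, j
    return segments[::-1]


# Hypothetical toy example: state 0 mostly emits monomer 0, state 1 mostly emits monomer 1.
pi = [0.5, 0.5]
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[0.9, 0.1], [0.1, 0.9]]
dur = [[1 / 6] * 6, [1 / 6] * 6]  # uniform segment lengths 1..6
x = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(hsmm_viterbi(x, pi, A, B, dur))  # [(0, 0, 3), (1, 3, 7), (0, 7, 10)]
```

Because every segment is scored as a unit, the decoder returns whole segments with their boundaries directly, which is the kind of within-chain structure (such as a transmembrane stretch) that a segment-level model is meant to expose. The cost is an extra factor of the maximum duration D in the running time compared with ordinary HMM Viterbi.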