Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges in identifying suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates with unprecedented speed and accuracy. The pipeline comprises a polymer chemical fingerprinting capability called polyBERT, inspired by natural language processing concepts, and a multitask learning approach that maps polyBERT fingerprints to a host of properties. polyBERT acts as a chemical linguist: it treats the chemical structure of polymers as a chemical language. The approach is two orders of magnitude faster than the best currently available polymer property prediction methods based on handcrafted fingerprint schemes while preserving accuracy, making it a strong candidate for deployment in scalable architectures, including cloud infrastructures.
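The two-stage shape of the pipeline described above (polymer string to fingerprint, then one model mapping the fingerprint to several properties at once) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_fingerprint` is a hashed character-bigram vector, not polyBERT; `fit_multi_output_linear` fits independent linear heads over a shared fingerprint and is only the simplest stand-in for the multitask learner, whose architecture the abstract does not describe; the input strings use a SMILES-like notation with `[*]` marking repeat-unit endpoints, which the abstract does not specify; and the two targets are atom counts computed from the strings themselves, not real polymer properties.

```python
import zlib


def toy_fingerprint(polymer_string, dim=32):
    """Hashed character-bigram counts, L2-normalised.

    Hypothetical stand-in for a learned fingerprint; this is NOT polyBERT.
    """
    vec = [0.0] * dim
    padded = "^" + polymer_string + "$"  # mark the start and end of the string
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        # crc32 is deterministic across runs, unlike the built-in hash() on str
        vec[zlib.crc32(bigram.encode("utf-8")) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec]


def fit_multi_output_linear(X, Y, epochs=500, lr=0.1):
    """Fit one linear head per property by per-sample gradient descent
    on squared error. All heads read the same fingerprint vector."""
    n_feat, n_task = len(X[0]), len(Y[0])
    W = [[0.0] * n_feat for _ in range(n_task)]
    b = [0.0] * n_task
    for _ in range(epochs):
        for x, y in zip(X, Y):
            for t in range(n_task):
                pred = b[t] + sum(w * xi for w, xi in zip(W[t], x))
                err = pred - y[t]
                b[t] -= lr * err
                for j in range(n_feat):
                    W[t][j] -= lr * err * x[j]
    return W, b


def predict(W, b, x):
    """Return one predicted value per property for fingerprint x."""
    return [b[t] + sum(w * xi for w, xi in zip(W[t], x)) for t in range(len(b))]


# Illustrative repeat units (assumed notation, see lead-in).
polymers = ["[*]CC[*]", "[*]CC([*])C", "[*]CCO[*]", "[*]CC([*])c1ccccc1"]
X = [toy_fingerprint(p) for p in polymers]

# Placeholder targets derived from the strings: carbon and oxygen character
# counts. They only demonstrate the multi-property output shape.
Y = [[float(p.upper().count("C")), float(p.count("O"))] for p in polymers]

W, b = fit_multi_output_linear(X, Y)
for p, x in zip(polymers, X):
    print(p, [round(v, 2) for v in predict(W, b, x)])
```

The point of the sketch is the interface rather than the model: once a fixed-length fingerprint is available, every property prediction reduces to cheap vector arithmetic, so the cost of the fingerprinting step largely determines how fast a candidate space can be screened.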