The ability to identify the designer of engineered biological sequences, termed genetic engineering attribution (GEA), would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
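To make the headline metrics concrete, the following minimal sketch shows how top-k accuracy can be computed from per-lab scores, and how a simple ensemble can be formed. It is an illustration under stated assumptions, not the competition's scoring code: the abstract does not say how the ensemble was built, so unweighted averaging of predicted probabilities is assumed here, and the function names (`top_k_accuracy`, `average_ensemble`) and the toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row per sequence, one column per candidate lab


def top_k_accuracy(probs: Matrix, labels: Sequence[int], k: int) -> float:
    """Fraction of sequences whose true lab is among the k highest-scored labs."""
    hits = 0
    for row, true_lab in zip(probs, labels):
        # Lab indices ordered from highest to lowest predicted probability.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if true_lab in ranked[:k]:
            hits += 1
    return hits / len(labels)


def average_ensemble(model_probs: Sequence[Matrix]) -> Matrix:
    """Unweighted element-wise mean of several models' probability matrices.

    Assumption: the source only says "a simple ensemble"; averaging is one
    common choice and may differ from what was actually used.
    """
    n_models = len(model_probs)
    n_rows = len(model_probs[0])
    n_labs = len(model_probs[0][0])
    return [
        [sum(m[i][j] for m in model_probs) / n_models for j in range(n_labs)]
        for i in range(n_rows)
    ]


# Toy example (invented numbers): 3 sequences, 4 candidate labs, 2 models.
labels = [0, 2, 2]
model_a = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.5, 0.3, 0.1], [0.4, 0.3, 0.2, 0.1]]
model_b = [[0.3, 0.4, 0.2, 0.1], [0.2, 0.2, 0.5, 0.1], [0.1, 0.2, 0.6, 0.1]]
ensemble = average_ensemble([model_a, model_b])

top1_a = top_k_accuracy(model_a, labels, k=1)
top1_b = top_k_accuracy(model_b, labels, k=1)
top1_ens = top_k_accuracy(ensemble, labels, k=1)
```

In this toy case the two models make different mistakes, so the averaged scores rank the true lab first more often than either model alone. This is the kind of effect a simple ensemble can exploit, although the size of the gain on real data depends on how correlated the models' errors are.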