As genetic engineering continues to advance, identifying the lab-of-origin of genetically engineered DNA sequences has become a common concern. To address it, AltLabs hosted the Genetic Engineering Attribution Challenge, which brought together many teams to propose new tools for this problem. Here we present our method, which ranks the most likely labs-of-origin and generates embeddings for both DNA sequences and labs. These embeddings also support other tasks, such as clustering DNA sequences and labs, and can serve as features for machine learning models applied to other problems. We show that our method outperforms the classic training method on this task while also producing this additional useful information.
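To make the ranking idea concrete, the following is a minimal sketch of how labs could be ranked for one sequence once sequence and lab embeddings live in a shared vector space. The abstract does not say how the embeddings are scored against each other, so the use of cosine similarity here is an assumption, and the function name `rank_labs`, the lab identifiers, and the toy vectors are all hypothetical.

```python
import numpy as np


def rank_labs(seq_emb, lab_embs, lab_ids, top_k=10):
    """Rank labs by cosine similarity to a single sequence embedding.

    Assumption: sequence and lab embeddings share one vector space and
    cosine similarity is a sensible score. The source does not confirm this.
    """
    # Normalise to unit length so the dot product equals cosine similarity.
    seq = seq_emb / np.linalg.norm(seq_emb)
    labs = lab_embs / np.linalg.norm(lab_embs, axis=1, keepdims=True)
    scores = labs @ seq
    # Sort by descending score and keep the top_k labs.
    order = np.argsort(-scores)[:top_k]
    return [(lab_ids[i], float(scores[i])) for i in order]


# Toy data, purely illustrative (not taken from the challenge dataset).
lab_ids = ["lab_a", "lab_b", "lab_c"]
lab_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
seq_emb = np.array([0.9, 0.1])

ranking = rank_labs(seq_emb, lab_embs, lab_ids, top_k=2)
for lab, score in ranking:
    print(f"{lab}: {score:.3f}")
```

The same lab embedding matrix could be passed to a clustering routine or used as model features, which is the kind of reuse the abstract mentions.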