Quantifying how information is encoded in biomolecular sequences is difficult because such an analysis usually requires sampling an exponentially large genetic space. Here we show how information theory reveals both robust and compressed encodings in the largest complete genotype-phenotype map (over 5 trillion sequences) obtained to date.