The information content of symbolic sequences (such as nucleic acid or amino acid sequences, but also neuronal firings or strings of letters) can be calculated from an ensemble of such sequences. However, because information cannot be assigned to a single sequence, it cannot be correlated with other observables attached to that sequence. Here we show that an information score obtained from multivariate correlations within the sequences of a "training" ensemble can be used to predict observables of out-of-sample sequences, with an accuracy that scales with the complexity of the correlations. This shows that functional information emerges from a hierarchy of multivariate correlations.
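To make the ensemble-versus-single-sequence distinction concrete, the following is a minimal sketch and not the scoring method of this work. It assumes aligned, equal-length sequences and uses only single-site statistics, the lowest level of the correlation hierarchy. The function names (`site_entropies`, `ensemble_information`, `first_order_score`), the pseudocount, and the log-odds form of the per-sequence score are illustrative assumptions.

```python
import math
from collections import Counter


def site_entropies(ensemble):
    """Shannon entropy (in bits) of each aligned position across the ensemble."""
    length = len(ensemble[0])
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in ensemble)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def ensemble_information(ensemble, alphabet):
    """First-order information estimate: sum over sites of (H_max - H_site).

    This is a property of the whole ensemble; it ignores all correlations
    between sites and cannot be evaluated for one sequence alone.
    """
    h_max = math.log2(len(alphabet))
    return sum(h_max - h for h in site_entropies(ensemble))


def first_order_score(sequence, ensemble, alphabet, pseudocount=1.0):
    """Hypothetical per-sequence score (an assumption for illustration).

    Sums log2(p_i(x_i) * K) over sites, where p_i is the smoothed frequency
    of symbol x_i at site i in the training ensemble and K is the alphabet
    size. Higher-order versions would add pair, triplet, ... terms.
    """
    k = len(alphabet)
    n = len(ensemble)
    score = 0.0
    for i, symbol in enumerate(sequence):
        count = sum(1 for seq in ensemble if seq[i] == symbol)
        p = (count + pseudocount) / (n + pseudocount * k)
        score += math.log2(p * k)
    return score


# Toy training ensemble (made-up sequences, for illustration only).
training = ["ACGT", "ACGA", "ACCT", "ACGT"]
alphabet = "ACGT"

print(ensemble_information(training, alphabet))
print(first_order_score("ACGT", training, alphabet))  # resembles the ensemble
print(first_order_score("TGAC", training, alphabet))  # does not
```

In this sketch the ensemble quantity has no per-sequence analogue, whereas the score can be evaluated on any out-of-sample sequence and compared with an observable measured for it.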