High-throughput and quantitative experimental technologies are advancing rapidly in the biological sciences. One important recent technique is multiplexed fluorescence in situ hybridization (mFISH), which enables the identification and localization of large numbers of individual RNA molecules within single cells. At the core of that technology is a coding problem: if each RNA sequence of interest is assigned a codeword, how should the codebook of probes be designed, and how should the resulting noisy measurements be decoded? Published work has relied on the assumptions of uniformly distributed codewords and a binary symmetric channel for decoding and, to a lesser degree, for code construction. Here we establish that both of these assumptions are inappropriate in the context of mFISH experiments and that substantial gains in decoding performance can be obtained by using more appropriate, less classical, assumptions. We propose an asymmetric channel model that can be readily parameterized from data and use it to develop maximum a posteriori (MAP) decoders. We show that the false discovery rate for rare RNAs, which is the key experimental metric, is vastly improved with MAP decoders, even when they are employed with the existing sub-optimal codebook. Using an evolutionary optimization methodology, we further show that permuting the codebook to better align with the prior, which is an experimentally straightforward procedure, yields significant further improvements.
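To make the decoding idea concrete, the following is a minimal sketch of a MAP decoder over a binary asymmetric channel with a non-uniform codeword prior. It is an illustration under stated assumptions, not the authors' implementation: the function name `map_decode`, the toy codebook, the prior, and the flip probabilities `p01` (a 0 read as 1) and `p10` (a 1 read as 0) are all hypothetical, and bit errors are assumed independent across positions.

```python
import numpy as np

def map_decode(y, codebook, prior, p01, p10):
    """Return the index of the MAP codeword for an observed binary word y.

    Assumes independent bit errors with P(read 1 | sent 0) = p01 and
    P(read 0 | sent 1) = p10 (an illustrative asymmetric channel).
    """
    y = np.asarray(y)
    codebook = np.asarray(codebook)
    prior = np.asarray(prior, dtype=float)
    # lik[c, r] = P(read r | sent c) for a single bit.
    lik = np.array([[1.0 - p01, p01],
                    [p10, 1.0 - p10]])
    # Broadcasting y against each codeword gives a (K, n) table of
    # per-bit likelihoods; summing logs gives each word's log-likelihood.
    log_lik = np.log(lik[codebook, y]).sum(axis=1)
    # MAP rule: maximize log prior + log likelihood.
    return int(np.argmax(np.log(prior) + log_lik))

# Toy values, invented for illustration only (not from the paper).
toy_codebook = [[1, 1, 0, 0],
                [0, 0, 1, 1],
                [1, 0, 1, 0]]
toy_prior = [0.90, 0.09, 0.01]   # codeword 2 plays the role of a rare RNA

# Ambiguous read: equally close to codewords 0 and 2 in Hamming distance.
ambiguous = map_decode([1, 0, 0, 0], toy_codebook, toy_prior, p01=0.02, p10=0.10)
# Exact read of the rare codeword.
exact = map_decode([1, 0, 1, 0], toy_codebook, toy_prior, p01=0.02, p10=0.10)
```

In this toy setting the ambiguous read is assigned to the abundant codeword rather than the rare one, while an exact read of the rare codeword is still decoded as rare. This is the kind of behavior that would be expected to reduce false discoveries of rare RNAs compared with a decoder that assumes a uniform prior and a symmetric channel.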