An Empirical Bayes Approach for Constructing the Confidence Intervals of Clonality and Entropy

This paper is motivated by the need to quantify human immune responses to environmental challenges. Specifically, the genome of a selected cell population from a blood sample is amplified by the well-known PCR process of successive heating and cooling, producing a large number of reads, typically about 30,000 to 300,000. Each read corresponds to a particular rearrangement of the so-called V(D)J sequences. The observation therefore consists of the numbers of reads corresponding to the different V(D)J sequences. The underlying relative frequencies of the distinct V(D)J sequences can be summarized by a probability vector whose cardinality is the number of distinct V(D)J rearrangements present in the blood. The statistical question is to make inference on a summary parameter of this probability vector based on a single, high-dimensional multinomial-type observation. Popular summaries of the diversity of a cell population include clonality and entropy or, more generally, suitable functions of the probability vector. A point estimator of clonality based on multiple replicates from the same blood sample has been proposed previously. Once a point estimator of a particular function is available, the remaining challenge is to construct a confidence interval for the parameter that appropriately reflects its uncertainty. In this paper, we propose coupling the empirical Bayes method with a resampling-based calibration procedure to construct robust confidence intervals for different population diversity parameters. The method is illustrated via an extensive numerical study and real data examples.
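As a point of reference only, the sketch below shows what "a function of the probability vector" looks like in practice: it computes naive plug-in estimates of Shannon entropy, H = -Σ p_i log p_i with p_i = n_i / N, and of clonality from observed read counts, plus a percentile interval obtained by multinomial resampling of the reads. It is not the empirical Bayes and calibration procedure proposed in the paper. The clonality definition used here, 1 - H / log K with K the number of observed clones, is an assumed convention, and the function names `plug_in_entropy`, `plug_in_clonality`, and `bootstrap_interval` are hypothetical. This baseline ignores V(D)J rearrangements that are present in the blood but unobserved among the reads, and plug-in entropy is biased downward in small samples.

```python
import math
import random
from collections import Counter


def plug_in_entropy(counts):
    """Shannon entropy H = -sum p_i log p_i (natural log), with p_i = n_i / N."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def plug_in_clonality(counts):
    """ASSUMED definition: clonality = 1 - H / log(K), K = number of observed clones.

    This is one common convention (one minus Pielou's evenness); it is not
    taken from the paper.
    """
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 1.0  # a single observed clone is treated as maximally clonal
    return 1.0 - plug_in_entropy(counts) / math.log(k)


def bootstrap_interval(counts, stat, n_boot=500, level=0.95, seed=0):
    """Percentile interval for stat(counts) from multinomial resampling of the reads.

    Each replicate redraws N = sum(counts) reads with probabilities n_i / N,
    i.e. a multinomial draw from the empirical frequencies. This is a naive
    baseline, NOT the paper's empirical Bayes + calibration procedure.
    """
    rng = random.Random(seed)
    total = sum(counts)
    clone_ids = list(range(len(counts)))
    replicates = []
    for _ in range(n_boot):
        # Tally how many reads each clone receives in this resample.
        draw = Counter(rng.choices(clone_ids, weights=counts, k=total))
        replicates.append(stat(list(draw.values())))
    replicates.sort()
    # Order statistics bounding the central `level` fraction of replicates.
    lo_idx = math.floor((1 - level) / 2 * n_boot)
    hi_idx = math.ceil((1 + level) / 2 * n_boot) - 1
    return replicates[lo_idx], replicates[hi_idx]


if __name__ == "__main__":
    # Hypothetical toy read counts for six V(D)J sequences (not real data).
    counts = [120, 60, 30, 15, 10, 5]
    print("entropy:", plug_in_entropy(counts))
    print("clonality:", plug_in_clonality(counts))
    print("95% entropy interval:", bootstrap_interval(counts, plug_in_entropy))
    print("95% clonality interval:", bootstrap_interval(counts, plug_in_clonality))
```

The resampling uses only the standard library, which keeps the sketch self-contained; with read counts in the hundreds of thousands, a vectorized multinomial sampler would be considerably faster.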
