Estimating gene expression from pathology images has the potential to reduce the cost of RNA sequencing. Point-wise loss functions have been widely used to minimize the discrepancy between predicted and observed absolute gene expression values. However, owing to the complexity of sequencing techniques and the intrinsic variability across cells, observed gene expression contains stochastic noise and batch effects, so accurately estimating absolute expression values remains a significant challenge. To mitigate this, we propose a novel objective: learning relative expression patterns rather than absolute levels. We assume that the relative expression levels of genes exhibit consistent patterns across independent experiments, even when absolute expression values are affected by batch effects and stochastic noise in tissue samples. Based on this assumption, we model this relation and propose a novel loss function, STRank, that is robust to noise and batch effects. Experiments on synthetic and real datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/naivete5656/STRank.
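To illustrate why a relative objective can be less sensitive to batch effects than a point-wise one, the following is a minimal sketch and not the STRank loss itself, whose formulation is not given here. It assumes a generic pairwise logistic ranking loss over gene pairs within a single spot, and it models a batch effect as a hypothetical positive scale and offset applied to the observed values. The function names and the toy numbers are illustrative assumptions.

```python
import numpy as np


def mse_loss(pred, target):
    """Point-wise loss: mean squared error on absolute expression values."""
    return float(np.mean((pred - target) ** 2))


def pairwise_rank_loss(pred, target):
    """Generic pairwise logistic ranking loss over gene pairs (illustrative only)."""
    # Differences between every ordered pair of genes (i, j).
    pred_diff = pred[:, None] - pred[None, :]
    # Only the ordering of the observed values is used, not their magnitude.
    target_sign = np.sign(target[:, None] - target[None, :])
    mask = target_sign != 0  # drop the diagonal and any tied pairs
    # log(1 + exp(-s * d)), evaluated stably via logaddexp.
    losses = np.logaddexp(0.0, -target_sign[mask] * pred_diff[mask])
    return float(np.mean(losses))


# Toy expression values for six genes in one spot (made-up numbers).
true_expr = np.array([0.5, 2.0, 1.2, 4.0, 0.1, 3.3])
pred = np.array([0.6, 1.8, 1.3, 3.7, 0.3, 3.4])

# Hypothetical batch effect: positive scaling plus an offset.
shifted = 3.0 * true_expr + 5.0

mse_clean = mse_loss(pred, true_expr)
mse_shifted = mse_loss(pred, shifted)
rank_clean = pairwise_rank_loss(pred, true_expr)
rank_shifted = pairwise_rank_loss(pred, shifted)

print(f"MSE:  clean={mse_clean:.4f}  shifted={mse_shifted:.4f}")
print(f"Rank: clean={rank_clean:.4f}  shifted={rank_shifted:.4f}")
```

In this sketch the ranking loss is unchanged by the simulated batch effect because it depends only on the ordering of the observed values, whereas the mean squared error grows with the shift. Real batch effects need not be monotone transformations, so this shows the intuition only.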