High-throughput genomics is ushering in a new era in personalized health care and in targeted drug design and delivery. Mining these large datasets and obtaining calibrated predictions are of immediate relevance and utility. In this work, we develop methods for gene expression inference based on deep neural networks. Unlike typical deep learning methods, our inferential technique achieves state-of-the-art accuracy while also providing explanations and reporting uncertainty estimates. We adopt the quantile regression framework to predict full conditional quantiles for a given set of housekeeping gene expressions. Conditional quantiles, in addition to providing rich interpretations of the predictions, are also robust to measurement noise. However, the check loss, which drives the estimation process in quantile regression, is not differentiable at the origin. We propose log-cosh as a smooth alternative to the check loss. We apply our methods to the GEO microarray dataset and extend them to the binary classification setting. Furthermore, we investigate other consequences of the smoothness of the loss, such as faster convergence.
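To make the contrast between the check loss and a log-cosh smoothing concrete, here is a minimal sketch. The check loss for quantile level `tau` and residual `u` is `max(tau*u, (tau-1)*u)`, which can be written as `(tau - 1/2)*u + |u|/2`. The surrogate below replaces `|u|` with `log(cosh(kappa*u))/kappa`. This particular form, the sharpness parameter `kappa`, and all function names are assumptions made for illustration; the abstract does not state the exact formula used in the work.

```python
import math


def check_loss(u, tau):
    """Standard check (pinball) loss for residual u; it has a kink at u = 0."""
    return max(tau * u, (tau - 1.0) * u)


def log_cosh(x):
    """Overflow-safe log(cosh(x)), using |x| + log1p(exp(-2|x|)) - log(2)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def smooth_check_loss(u, tau, kappa=10.0):
    """Assumed smooth surrogate: (tau - 1/2) * u + log(cosh(kappa * u)) / (2 * kappa).

    kappa is a hypothetical sharpness parameter; larger values track the
    check loss more closely.
    """
    return (tau - 0.5) * u + log_cosh(kappa * u) / (2.0 * kappa)


def smooth_check_grad(u, tau, kappa=10.0):
    """Derivative of the surrogate in u; continuous everywhere, including u = 0."""
    return (tau - 0.5) + 0.5 * math.tanh(kappa * u)


if __name__ == "__main__":
    tau = 0.9
    for u in (-1.0, -0.1, 0.0, 0.1, 1.0):
        print(u, check_loss(u, tau), smooth_check_loss(u, tau), smooth_check_grad(u, tau))
```

Under this assumed form, the surrogate never differs from the check loss by more than `log(2) / (2 * kappa)`, and its gradient moves smoothly from `tau - 1` to `tau` instead of jumping at zero, which is the property a gradient-based optimizer benefits from.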