Mining large datasets and obtaining calibrated predictions from them is of immediate relevance and utility in reliable deep learning. In our work, we develop methods for deep neural network based inference on such datasets, for example gene expression data. Unlike typical deep learning methods, our inferential technique achieves state-of-the-art accuracy while also providing explanations and reporting uncertainty estimates. We adopt the quantile regression framework to predict full conditional quantiles given a set of housekeeping gene expressions. Conditional quantiles, in addition to providing rich interpretations of the predictions, are robust to measurement noise. Our technique is particularly consequential in high-throughput genomics, an area that is ushering in a new era of personalized health care and targeted drug design and delivery. However, the check loss, which drives the estimation process in quantile regression, is not differentiable. We propose log-cosh as a smooth alternative to the check loss. We apply our methods to the GEO microarray dataset and extend them to the binary classification setting. Furthermore, we investigate other consequences of the smoothness of the loss, such as faster convergence. We further apply the classification framework to other healthcare inference tasks, such as heart disease, breast cancer, and diabetes. As a test of the generalization ability of our framework, we also evaluate it on non-healthcare datasets for regression and classification tasks.
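To make the idea of a smooth alternative to the check loss concrete, the following is a minimal sketch of one way such a loss could be built. It is an illustration under stated assumptions, not necessarily the exact loss used in this work. It relies on the identity that the check loss equals (tau - 0.5) * u + 0.5 * |u| and replaces |u| with log(cosh(kappa * u)) / kappa. The function names and the sharpness parameter `kappa` are hypothetical and do not come from the abstract.

```python
import math


def check_loss(u, tau):
    """Standard (non-smooth) check loss for residual u = y - prediction."""
    return max(tau * u, (tau - 1.0) * u)


def log_cosh(z):
    """Numerically stable log(cosh(z)); avoids overflow for large |z|."""
    a = abs(z)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def smooth_check_loss(u, tau, kappa=10.0):
    """Assumed smooth surrogate: |u| is replaced by log(cosh(kappa*u))/kappa.

    kappa is a hypothetical sharpness parameter; larger values track the
    check loss more closely at the cost of a sharper bend near u = 0.
    """
    return (tau - 0.5) * u + 0.5 * log_cosh(kappa * u) / kappa


def smooth_check_grad(u, tau, kappa=10.0):
    """Derivative of the surrogate with respect to u; defined everywhere."""
    return (tau - 0.5) + 0.5 * math.tanh(kappa * u)
```

Under this construction the surrogate never exceeds the check loss and stays within log(2) / (2 * kappa) of it, while its gradient moves smoothly from tau - 1 to tau instead of jumping at zero. Because it is differentiable everywhere, it can be handed to any gradient-based optimizer as the training objective for a given quantile level tau.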