The success of deep learning (DL) fostered the creation of unifying frameworks such as TensorFlow or PyTorch as much as it was driven by their creation in return. Having common building blocks facilitates the exchange of, e.g., models or concepts and makes developments more easily replicable. Nonetheless, robust and reliable evaluation and assessment of DL models have often proven challenging. This is at odds with their increasing safety relevance, which recently culminated in the field of "trustworthy ML". We believe that, among other measures, further unification of evaluation and safeguarding methodologies in terms of toolkits, i.e., small and specialized framework derivatives, could positively impact problems of trustworthiness as well as reproducibility. To this end, we present the first survey on toolkits for uncertainty estimation (UE) in DL, as UE forms a cornerstone in assessing model reliability. We investigate 11 toolkits with respect to their modeling and evaluation capabilities, providing an in-depth comparison of the three most promising ones, namely Pyro, TensorFlow Probability, and Uncertainty Quantification 360. While the first two provide a large degree of flexibility and seamless integration into their respective framework, the last one offers the broader methodological scope.