There is increasing interest in ensuring that machine learning (ML) frameworks behave in a socially responsible manner and are deemed trustworthy. Although considerable progress has been made in the field of Trustworthy ML (TwML) in recent years, much of the current characterization of this progress is qualitative. Consequently, decisions about how to address issues of trustworthiness, and which research goals to pursue next, are often left to individual researchers. In this paper, we present the first quantitative approach to characterizing the landscape of TwML research. We build a word co-occurrence network from a web-scraped corpus of more than 7,000 recent peer-reviewed ML papers, including papers both related and unrelated to TwML. We apply community detection to this network to obtain semantic clusters of words, from which the relative positions of TwML topics can be inferred. We propose a novel fingerprinting algorithm that assigns probabilistic similarity scores to individual words and then combines them into a paper-level relevance score. Our analysis yields several insights for advancing TwML research.
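The abstract only outlines the pipeline, so the following is a minimal sketch of its three stages on a toy corpus, under loudly stated assumptions. Words are linked when they appear in the same document (edge weight = number of shared documents). Community detection is done with asynchronous label propagation, a stand-in chosen here because the abstract does not name the authors' algorithm. The scoring step is a hypothetical substitute for the authors' fingerprinting algorithm, whose details are not given here: a word's score is the fraction of its community that belongs to a hand-picked set of TwML seed words, and a paper's score is the mean over its known words. All function names and the seed set are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs: list[str]) -> dict[str, dict[str, int]]:
    """Weighted word graph: edge weight = number of documents containing both words."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        # Naive whitespace tokenization (assumption); a real pipeline would
        # also drop stop words, lemmatize, filter rare terms, etc.
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph: dict[str, dict[str, int]], max_iter: int = 20) -> dict[str, str]:
    """Asynchronous label propagation (stand-in community detection).

    Each word repeatedly adopts the label carrying the largest total edge
    weight among its neighbours; ties go to the smallest label so that
    runs are deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            weight_per_label: dict[str, float] = defaultdict(float)
            for nbr, w in graph[node].items():
                weight_per_label[labels[nbr]] += w
            best = min(weight_per_label, key=lambda lab: (-weight_per_label[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def communities(labels: dict[str, str]) -> dict[str, set[str]]:
    """Group words by their final label."""
    groups: dict[str, set[str]] = defaultdict(set)
    for word, lab in labels.items():
        groups[lab].add(word)
    return groups


def word_scores(labels: dict[str, str], seed_words: set[str]) -> dict[str, float]:
    """HYPOTHETICAL stand-in for the fingerprinting step: share of a word's
    community that lies in the (assumed) TwML seed set."""
    groups = communities(labels)
    return {w: len(groups[lab] & seed_words) / len(groups[lab]) for w, lab in labels.items()}


def paper_score(text: str, scores: dict[str, float]) -> float:
    """HYPOTHETICAL paper-level relevance: mean score of the paper's known words."""
    known = [w for w in set(text.lower().split()) if w in scores]
    return sum(scores[w] for w in known) / len(known) if known else 0.0


if __name__ == "__main__":
    corpus = [
        "fairness bias privacy", "fairness privacy robustness", "bias robustness fairness",
        "transformer attention training", "attention training benchmark",
        "transformer benchmark attention",
    ]
    labels = label_propagation(build_cooccurrence(corpus))
    print(communities(labels))
    scores = word_scores(labels, {"fairness", "privacy"})
    print(paper_score("robustness bias attention", scores))
```

On this toy corpus the two disconnected topic groups form two communities; with the seed set {fairness, privacy}, every word in the first community scores 0.5 and every word in the second scores 0.0, so a paper mentioning "robustness bias attention" scores about 0.33. The actual method in the paper may weight words and combine scores quite differently.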