Title: Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions

Saikat Mondal,Mohammad Masudur Rahman,Chanchal K. Roy

From arXiv. Accepted at the International Conference on Mining Software Repositories (MSR 2023).

In Stack Overflow (SO), users subjectively evaluate the quality of posts (i.e., questions and answers) through a voting mechanism. The net votes (upvotes - downvotes) a post receives are often treated as an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e., verified solutions) do not receive the most votes. These counter-intuitive findings cast doubt on the reliability of the evaluation mechanism employed at SO. Moreover, many users raise concerns about the evaluation, especially about downvotes on their posts. Therefore, rigorous verification of the subjective evaluation is warranted to ensure an unbiased and reliable quality assessment mechanism. In this paper, we compare the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. According to our investigation, four objective metrics agree with the subjective evaluation, two disagree, one may either agree or disagree, and the remaining three neither agree nor disagree. We then develop machine learning models to classify promoted and discouraged questions. Our models outperform the state-of-the-art models, with a maximum accuracy of about 76% - 87%.
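
The net-vote score described in the abstract can be illustrated with a minimal sketch. The `Question` type, the function names, and the thresholds for "promoted" and "discouraged" (score above or below zero) are assumptions made for illustration only; the abstract does not state how the authors define the two classes.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical container for a question's vote counts."""
    upvotes: int
    downvotes: int


def net_score(q: Question) -> int:
    # Net votes as defined in the abstract: upvotes - downvotes.
    return q.upvotes - q.downvotes


def label(q: Question) -> str:
    # Assumed thresholds (not taken from the abstract):
    # positive score -> "promoted", negative -> "discouraged".
    score = net_score(q)
    if score > 0:
        return "promoted"
    if score < 0:
        return "discouraged"
    return "neutral"
```

Under these assumptions, a question with 5 upvotes and 2 downvotes has a net score of 3 and would be labeled as promoted.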
