Tagging facilitates information retrieval in social media and other online communities by allowing users to organize and describe online content. Prior research found that the efficiency of tagging systems steadily decreases over time because tags become less precise at identifying specific documents, i.e., they lose their descriptiveness. However, previous work has not answered how, or even whether, community managers can improve the efficiency of tags. In this work, we use information-theoretic measures to track the descriptive and retrieval efficiency of tags on Stack Overflow, a question-answering system that strictly limits the number of tags users can specify per question (at most five). We observe that tagging efficiency stabilizes over time, while tag content and descriptiveness both increase. To explain this observation, we hypothesize that limiting the number of tags fosters novelty and diversity in tag usage, two properties that both benefit tagging efficiency. To provide qualitative evidence supporting our hypothesis, we present a statistical model of tagging that demonstrates how novelty and diversity lead to greater tag efficiency in the long run. Our work offers insights into policies that improve information organization and retrieval in online communities.
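The abstract does not spell out which information-theoretic measures it uses. As a hedged illustration only, the sketch below assumes a conditional-entropy framing often used to study tagging efficiency. Each (question, tag) assignment is treated as a joint event, so H(D|T) counts how many bits are still needed to identify a document once a tag is known (lower means more descriptive tags), and I(D;T) counts how much a tag reveals about the document. The function names `entropy` and `tag_measures` and the toy data are hypothetical, not the paper's code or exact definitions.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def tag_measures(assignments):
    """Entropy-based tag measures from (document_id, tag) pairs.

    Assumption: probabilities are relative frequencies of tag assignments.
    """
    joint = Counter(assignments)                # counts of (d, t) pairs
    docs = Counter(d for d, _ in assignments)   # marginal counts of documents
    tags = Counter(t for _, t in assignments)   # marginal counts of tags
    h_d = entropy(docs.values())
    h_t = entropy(tags.values())
    # Chain rule: H(D|T) = H(D, T) - H(T)
    h_d_given_t = entropy(joint.values()) - h_t
    return {
        "H(T)": h_t,                    # diversity of tag usage
        "H(D|T)": h_d_given_t,          # residual uncertainty about the document
        "I(D;T)": h_d - h_d_given_t,    # information a tag carries about the document
    }


if __name__ == "__main__":
    pairs = [("q1", "python"), ("q1", "pandas"),
             ("q2", "python"), ("q2", "numpy"),
             ("q3", "java")]
    # H(D|T) is about 0.4 bits: only "python" is shared (by q1 and q2).
    print(tag_measures(pairs))
```

In this framing, a tag attached to a single question leaves no residual uncertainty, while a tag shared by many questions raises H(D|T); tracking these values over time is one way to observe whether tags lose or keep their descriptiveness.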