Content ratings enable audiences to determine the suitability of various media products. With the recent advent of fan fiction, the critical issue of fan fiction content ratings has emerged. Whether such ratings are assigned voluntarily or required by regulation, there is a need to automate content rating classification. The problem is to take the text of a fan fiction work and determine its appropriate content rating. Methods have been attempted for other domains, such as online books, though none have been applied to fan fiction. We propose natural language processing techniques, including traditional and deep learning methods, to determine the content rating automatically. We show that these methods yield poor accuracy for multi-class classification. We then demonstrate that treating the problem as binary classification produces better accuracy. Finally, we believe, and provide some evidence, that the current practice of self-annotation has led to incorrect labels, which limits classification results.