We present a set of novel supervised and unsupervised neural approaches for determining the readability of documents. In the unsupervised setting, we leverage neural language models, whereas in the supervised setting, we test three different neural classification architectures. We show that the proposed neural unsupervised approach is robust, transferable across languages, and adaptable to a specific readability task and dataset. Through a systematic comparison of several neural architectures on a number of benchmark and new labelled readability datasets in two languages, this study also offers a comprehensive analysis of different neural approaches to readability classification. We expose their strengths and weaknesses, compare their performance to current state-of-the-art classification approaches to readability, which in most cases still rely on extensive feature engineering, and propose possible improvements.