Native language identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is useful for a variety of purposes, including marketing, security, and educational applications. NLI is usually framed as a multi-class classification task, in which numerous hand-engineered features are combined to achieve state-of-the-art results. Recently, a deep generative approach based on transformer decoders (GPT-2) outperformed its counterparts and achieved the best results on the NLI benchmark datasets. We investigate this approach to determine its practical implications compared to traditional state-of-the-art NLI systems. We introduce transformer adapters to address memory limitations and to improve training and inference speed, so that NLI applications can scale to production.
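To illustrate why adapters can ease memory limits, here is a minimal sketch assuming a Houlsby-style bottleneck adapter (down-projection, GELU, up-projection, residual connection). The abstract does not specify the adapter configuration, so the class `BottleneckAdapter`, the helper `adapter_params_per_model`, the bottleneck size of 64, and the GPT-2 small dimensions (hidden size 768, 12 layers) are all illustrative assumptions. The point is the arithmetic: only the small adapter weights are trained and stored per task, while the base transformer is shared.

```python
import math
import random


def gelu(x: float) -> float:
    # tanh approximation of GELU, the activation used in GPT-2
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: h + W_up(gelu(W_down h + b_down)) + b_up."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        # Down-projection d_model -> bottleneck, small random initialisation
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection bottleneck -> d_model, zero-initialised so the adapter
        # starts out as the identity and does not disturb the frozen base model
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, h: list[float]) -> list[float]:
        z = [gelu(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        delta = [sum(w * x for w, x in zip(row, z)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        return [x + d for x, d in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        d_model, bottleneck = len(self.w_up), len(self.w_down)
        return 2 * d_model * bottleneck + d_model + bottleneck


def adapter_params_per_model(d_model: int, bottleneck: int, n_layers: int,
                             adapters_per_layer: int = 2) -> int:
    # Trainable parameters if every layer gets `adapters_per_layer` adapters
    per_adapter = 2 * d_model * bottleneck + d_model + bottleneck
    return n_layers * adapters_per_layer * per_adapter


if __name__ == "__main__":
    adapter = BottleneckAdapter(d_model=8, bottleneck=2)
    h = [0.1 * i for i in range(8)]
    print(adapter(h) == h)  # identity at initialisation
    print(adapter_params_per_model(d_model=768, bottleneck=64, n_layers=12))
```

Under these assumptions, each adapter set adds about 2.4M trainable parameters, roughly 2% of GPT-2 small's ~124M. If, hypothetically, one fine-tuned model were kept per candidate L1, storing a shared base model plus one adapter set per L1 would be far cheaper than storing a full model copy for each L1.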