The great majority of languages in the world are considered under-resourced for the successful application of deep learning methods. In this work, we propose a meta-learning approach to document classification in a limited-resource setting and demonstrate its effectiveness in two different settings: few-shot, cross-lingual adaptation to previously unseen languages; and multilingual joint training when limited target-language data is available during training. We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability, and show that meta-learning thrives in settings with a heterogeneous task distribution. We propose a simple yet effective adjustment to existing meta-learning methods which allows for better and more stable learning, and set a new state of the art on several languages while performing on par on others, using only a small amount of labeled data.
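As a concrete illustration of the meta-learning family compared in this work, the sketch below shows one meta-update in the style of Reptile, a first-order meta-learning method, applied to a toy document classifier. This is only one representative of the family, not the paper's specific method or its proposed adjustment; the model, vocabulary size, and hyperparameters (BagOfEmbeddings, VOCAB, inner_lr, etc.) are illustrative assumptions.

```python
# Minimal sketch of one Reptile-style meta-update for few-shot document
# classification. All names and sizes are hypothetical placeholders.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, N_CLASSES = 1000, 32, 3  # assumed toy sizes

class BagOfEmbeddings(nn.Module):
    """Average token embeddings, then classify: a stand-in for any encoder."""
    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(VOCAB, EMB)  # default mode is 'mean'
        self.out = nn.Linear(EMB, N_CLASSES)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) of token indices
        return self.out(self.emb(token_ids))

def reptile_step(model, task_batches, inner_lr=0.05, outer_lr=0.1, inner_steps=3):
    """One meta-update: adapt a clone of the model on a single task
    (e.g. one language's few labeled documents), then move the
    meta-parameters toward the adapted parameters."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        for x, y in task_batches:
            opt.zero_grad()
            F.cross_entropy(adapted(x), y).backward()
            opt.step()
    # Outer update: theta <- theta + outer_lr * (theta_adapted - theta)
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.add_(outer_lr * (p_adapted - p))

# Toy usage: one "task" of 8 documents, each padded to length 12.
model = BagOfEmbeddings()
x = torch.randint(0, VOCAB, (8, 12))
y = torch.randint(0, N_CLASSES, (8,))
reptile_step(model, [(x, y)])
```

In an actual multilingual setup, each call to reptile_step would draw batches from a different task (e.g. a different language), so the meta-parameters converge to an initialization that adapts quickly from only a few labeled examples.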