Linguistic information is encoded at varying timescales (subwords, phrases, etc.) and communicative levels, such as syntax and semantics. Contextualized embeddings have analogously been found to capture these phenomena at distinct layers and frequencies. Leveraging these findings, we develop a fully learnable frequency filter to identify spectral profiles for any given task. It enables far more granular analyses than prior handcrafted filters and is more efficient. After demonstrating the informativeness of spectral probing over manual filters in a monolingual setting, we investigate its multilingual characteristics across seven diverse NLP tasks in six languages. Our analyses identify distinctive spectral profiles which quantify cross-task similarity in a linguistically intuitive manner while remaining consistent across languages, highlighting their potential as robust, lightweight task descriptors.
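To make the idea of a learnable frequency filter concrete, the following is a minimal sketch of one possible forward pass, not the implementation used in this work. It assumes the filter operates on a discrete cosine transform (DCT) taken along the token dimension of a sentence's embeddings, and that each frequency has one gain in [0, 1] obtained from an unconstrained parameter through a sigmoid. The names `dct_matrix`, `spectral_filter`, and `filter_logits` are hypothetical, and the training loop that would update `filter_logits` on a downstream task is omitted.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size (n, n).

    Row k holds the k-th cosine basis function, so row 0 is the
    constant (lowest-frequency) component.
    """
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # token position
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)     # rescale row 0 so the matrix is orthonormal
    return mat


def spectral_filter(embeddings, filter_logits):
    """Apply per-frequency gains to embeddings of shape (seq_len, dim).

    filter_logits has shape (seq_len,). In a probing setup these would be
    trainable parameters (an assumption of this sketch); here they are
    plain inputs.
    """
    seq_len = embeddings.shape[0]
    dct = dct_matrix(seq_len)
    gains = 1.0 / (1.0 + np.exp(-filter_logits))  # sigmoid, values in (0, 1)
    spectrum = dct @ embeddings                   # to the frequency domain
    filtered = gains[:, None] * spectrum          # scale each frequency
    return dct.T @ filtered                       # back to the token domain


# Toy usage: 8 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))

# Keep only the lowest frequency: every token collapses to the sentence mean.
low_pass_logits = np.full(8, -50.0)
low_pass_logits[0] = 50.0
sentence_level = spectral_filter(tokens, low_pass_logits)
```

Under these assumptions, the learned gains themselves would serve as the spectral profile of a task: a profile concentrated on low frequencies suggests the task relies on slowly varying, sentence-level information, whereas weight on high frequencies suggests token-level information.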