In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden-state dimensions of each layer in modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study of the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders), as well as the dimensionality of embedding vectors and sample sizes, as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance on human-level tasks, with PCA giving a benefit over other reduction methods by better handling users who write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just $\frac{1}{12}$ of the embedding dimensions.
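To make the pipeline described above concrete, the following is a minimal illustrative sketch (not the authors' code) of the general approach: extract contextual embeddings from a pre-trained transformer, reduce them with PCA to roughly 1/12 of the 768 hidden dimensions, and fit a simple predictive model on the reduced features. The choice of roberta-base, mean pooling over the last hidden layer, ridge regression, and the synthetic texts and outcome scores are all assumptions for demonstration only.

```python
# Illustrative sketch: PCA-reduced RoBERTa embeddings for a human-level
# prediction task. Assumes synthetic data; real studies would use
# user-level texts and outcomes (e.g., well-being or personality scores).
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

def embed(texts):
    """Mean-pool the last hidden layer to get one 768-dim vector per text."""
    vecs = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt",
                               truncation=True, max_length=512)
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
            vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vecs)

# Placeholder data: in practice these would be per-user documents and labels.
rng = np.random.default_rng(0)
texts = [f"sample user text number {i} about daily life" for i in range(100)]
y = rng.normal(size=100)  # placeholder outcome scores

X = embed(texts)  # (100, 768) embedding matrix

# Reduce to 64 components (768 / 12); the reducer could also be fit on a
# larger unlabeled corpus and then applied to the small labeled sample.
pca = PCA(n_components=64)
X_reduced = pca.fit_transform(X)

# A simple regularized linear model on the reduced features.
clf = Ridge(alpha=1.0).fit(X_reduced, y)
```

In this sketch the dimension reducer is fit on the same embeddings it transforms; the "pre-trained dimension reduction regime" mentioned in the abstract instead fits the reducer ahead of time on a larger sample before applying it to the small labeled set.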