We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive models (e.g., direct head tuning): channel prompt tuning is preferred when the number of training examples is small, labels in the training data are imbalanced, or generalization to unseen labels is required.
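To make the direct-versus-channel contrast concrete, below is a minimal sketch, not the paper's released implementation, of zero-update scoring with an off-the-shelf causal language model via Hugging Face transformers. The model choice ("gpt2") and the label verbalizers are illustrative assumptions; the direct score conditions on the input and evaluates the label words, while the channel score conditions on the label words and evaluates every token of the input.

```python
# Minimal sketch (assumed setup, not the paper's code) contrasting direct
# P(label | input) and channel P(input | label) scoring with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def conditional_logprob(prefix: str, continuation: str) -> float:
    """Log-probability of `continuation` given `prefix` under the LM."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(" " + continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits[:, i] predicts the token at position i + 1, so score only the
    # continuation tokens.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    cont_positions = range(prefix_ids.size(1) - 1, input_ids.size(1) - 1)
    return sum(
        log_probs[0, pos, tok].item()
        for pos, tok in zip(cont_positions, cont_ids[0])
    )

# Hypothetical verbalizers mapping each label to a natural-language string.
verbalizers = {
    "positive": "This review is positive.",
    "negative": "This review is negative.",
}

def classify(x: str, use_channel: bool = True) -> str:
    if use_channel:
        # Channel: condition on the label words, explain every word of the input.
        scores = {y: conditional_logprob(v, x) for y, v in verbalizers.items()}
    else:
        # Direct: condition on the input, score only the label words.
        scores = {y: conditional_logprob(x, v) for y, v in verbalizers.items()}
    return max(scores, key=scores.get)

print(classify("A compelling, heartfelt film with terrific performances."))
```

In the few-shot in-context variant described in the abstract, the prefix would additionally contain concatenated demonstrations; the scoring logic itself is unchanged.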