For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs of sizes above a certain parameter threshold, constitutes one significant example, because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes at which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and reuse the positional embeddings across the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
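To make the described mechanism concrete, the sketch below builds a block-diagonal attention mask and restarted position ids for several parallel windows followed by the task tokens. It is a minimal illustration in plain PyTorch, not the paper's reference implementation; the helper name `build_pcw_inputs` and the toy window sizes are assumptions introduced here for exposition.

```python
# Minimal sketch of the Parallel Context Windows idea (illustrative only).
import torch

def build_pcw_inputs(window_lengths, task_length):
    """Build an attention mask and position ids for parallel context windows.

    Context tokens attend causally only within their own window
    (block-diagonal mask), while the task tokens at the end attend to all
    windows and to earlier task tokens. Position ids restart in each window,
    so the model's existing positional embeddings are reused, not extended.
    """
    total_ctx = sum(window_lengths)
    total = total_ctx + task_length
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Block-diagonal causal attention inside each context window.
    start = 0
    position_ids = []
    for w in window_lengths:
        mask[start:start + w, start:start + w] = torch.tril(torch.ones(w, w)).bool()
        position_ids.extend(range(w))       # positions reused across windows
        start += w

    # Task tokens see every context window and earlier task tokens (causal).
    max_ctx = max(window_lengths)
    for i in range(task_length):
        mask[total_ctx + i, :total_ctx + i + 1] = True
        position_ids.append(max_ctx + i)    # task positions continue after the longest window

    return mask, torch.tensor(position_ids)

# Example: three 4-token windows followed by 3 task tokens.
mask, position_ids = build_pcw_inputs([4, 4, 4], 3)
print(mask.int())
print(position_ids)  # tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6])
```

In this sketch the effective context grows with the number of windows while no position id ever exceeds the range the model was trained on, which is the property the method relies on.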