Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios. In this work, we experiment with zero-shot models in the legal case entailment task of the COLIEE 2022 competition. Our experiments show that scaling the number of parameters in a language model improves the F1 score of our previous zero-shot result by more than 6 points, suggesting that stronger zero-shot capability may be a characteristic of larger models, at least for this task. Our 3B-parameter zero-shot model outperforms all models, including ensembles, on the COLIEE 2021 test set and also achieves the best single-model performance in the COLIEE 2022 competition, second only to the ensemble composed of the 3B model itself and a smaller version of the same model. Despite the challenges posed by large language models, mainly due to latency constraints in real-time applications, we provide a demonstration of our zero-shot monoT5-3b model being used in production as a search engine, including for legal documents. The code for our submission and the demo of our system are available at https://github.com/neuralmind-ai/coliee and https://neuralsearchx.neuralmind.ai, respectively.
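As background for readers unfamiliar with monoT5-style reranking, the sketch below illustrates the core scoring idea: each query-document pair is formatted into a prompt, and the relevance score is the probability the model assigns to the token "true" versus "false" at the output. This is a minimal, self-contained illustration of the scoring math only; the checkpoint name is an assumption about the public release, and the logit values are dummy numbers standing in for actual model outputs.

```python
import math

# Assumed public checkpoint name; in the real system a 3B monoT5 model
# produces the 'true'/'false' logits that are hard-coded below.
MODEL_NAME = "castorini/monot5-3b-msmarco-10k"

def monot5_input(query: str, document: str) -> str:
    """Format a query-document pair in the monoT5 prompt template."""
    return f"Query: {query} Document: {document} Relevant:"

def relevance_score(logit_true: float, logit_false: float) -> float:
    """Softmax over the 'true'/'false' token logits -> P('true')."""
    m = max(logit_true, logit_false)  # subtract max for numerical stability
    e_t = math.exp(logit_true - m)
    e_f = math.exp(logit_false - m)
    return e_t / (e_t + e_f)

# Dummy logits for two candidate cases; the pair whose 'true' logit
# dominates ranks first after sorting by relevance score.
candidates = [("case A", 2.0, -1.0), ("case B", -0.5, 1.5)]
ranked = sorted(candidates,
                key=lambda c: relevance_score(c[1], c[2]),
                reverse=True)
```

Because the score is a plain two-way softmax, ranking by it is equivalent to ranking by the logit difference; the probability form is simply convenient for thresholding and for combining models in an ensemble.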