Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt iteratively alternates between generating explanations with an LLM and reranking them based on their performance when used as a prompt. Experiments on a wide range of datasets, from synthetic mathematics to natural-language understanding, show that iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions. Moreover, the prompts produced by iPrompt are simultaneously human-interpretable and highly effective for generalization: on real-world sentiment classification datasets, iPrompt produces prompts that match or even improve upon human-written prompts for GPT-3. Finally, experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery. All code for using the methods and data here is made available on GitHub.
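The alternating generate-and-rerank loop described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: `generate_candidates` and `score_prompt` are hypothetical stand-ins for the two LLM calls (proposing candidate explanations, and measuring each candidate's performance when used as a prompt on the data examples).

```python
import random

def generate_candidates(examples, seed_prompts, n=4):
    # Hypothetical stand-in for LLM-based explanation generation:
    # here we simply mutate existing candidates with toy suffixes.
    suffixes = [" step by step", " as a rule", " for each input", " in words"]
    return [p + random.choice(suffixes) for p in seed_prompts for _ in range(n)]

def score_prompt(prompt, examples):
    # Hypothetical stand-in for zero-shot performance when `prompt`
    # is prepended to each example; here, a toy length heuristic.
    return -len(prompt)

def iprompt_sketch(examples, init_prompts, iterations=3, keep=2):
    """Alternate between generating candidate explanations and
    reranking them by their performance as prompts."""
    pool = list(init_prompts)
    for _ in range(iterations):
        pool += generate_candidates(examples, pool)
        pool.sort(key=lambda p: score_prompt(p, examples), reverse=True)
        pool = pool[:keep]  # retain only the top-scoring candidates
    return pool[0]
```

Under the toy scoring rule the loop deterministically keeps the shortest seed prompt; with real LLM-based scoring, the reranking step instead selects the explanation that best predicts the data.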