AI large language models have (co-)produced remarkable written works, from newspaper articles to novels and poetry. These works meet the standard definition of creativity: they are original and useful, and sometimes exhibit the additional element of surprise. But can a large language model designed to predict the next text fragment provide creative, out-of-the-box responses that still solve the problem at hand? We put OpenAI's generative natural language model, GPT-3, to the test: can it provide creative solutions to one of the most commonly used tests in creativity research? We assessed GPT-3's creativity on Guilford's Alternative Uses Test (AUT) and compared its performance to previously collected human responses, using expert ratings of the originality, usefulness, and surprise of responses, the flexibility of each set of ideas, and an automated measure of creativity based on the semantic distance between a response and the AUT object in question. Our results show that, on the whole, humans currently outperform GPT-3 in creative output, but we believe it is only a matter of time before GPT-3 catches up on this particular task. We discuss what this work reveals about human and AI creativity, creativity testing, and our definition of creativity.
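The automated measure mentioned above scores a response by its semantic distance from the AUT object: uses whose embeddings lie far from the object's embedding are treated as more remote, and hence potentially more original. A minimal sketch of that idea, using hand-crafted toy vectors in place of a real word- or sentence-embedding model (the vectors and example uses below are hypothetical illustrations, not data from the study):

```python
import numpy as np

def semantic_distance(response_vec: np.ndarray, object_vec: np.ndarray) -> float:
    """Cosine distance between a response embedding and the AUT object embedding.

    A larger distance indicates a semantically more remote (potentially more
    original) use of the object.
    """
    cos_sim = np.dot(response_vec, object_vec) / (
        np.linalg.norm(response_vec) * np.linalg.norm(object_vec)
    )
    return 1.0 - cos_sim

# Toy embeddings (hypothetical; real studies derive these from an embedding model).
brick = np.array([1.0, 0.2, 0.1])          # the AUT object
build_a_wall = np.array([0.9, 0.3, 0.1])   # a common use: semantically close
abstract_sculpture = np.array([0.1, 0.9, 0.8])  # a remote use: semantically distant

common_score = semantic_distance(build_a_wall, brick)
remote_score = semantic_distance(abstract_sculpture, brick)
print(remote_score > common_score)  # prints True: the remote use scores higher
```

In actual implementations the vectors come from a trained embedding model, so the distance reflects learned semantic relatedness rather than hand-tuned coordinates; the scoring logic, however, is exactly this cosine-distance comparison.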