Large language models (LLMs), such as ChatGPT and GPT-4, are gaining widespread real-world use. Yet these LLMs are closed-source, and little is known about their performance in real-world use cases. In academia, LLM performance is often measured on benchmarks that may have leaked into the LLMs' training data. We apply and evaluate ChatGPT and GPT-4 on the real-world task of cost-efficiently extracting insights from a text corpus published after the LLMs were trained. We extract 4,392 research challenges across over 90 topics from the 2023 CHI conference proceedings and visualize the research challenges for interactive exploration. We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 is an excellent, cost-efficient means of analyzing a corpus at scale. Cost-efficiency is key for prototyping research ideas and for analyzing text corpora from different perspectives, with implications for applying LLMs in both academia and practice.