Making sense of large text corpora is difficult when they reach thousands or millions of documents. With the advent of LLMs, the potential for large-scale sense-making is being realized. However, this creates a need for rigour in the data curation stage of thematic analysis: selecting the right documents to achieve appropriate information power (saturation) requires an auditable trace of researchers' thought processes. In this paper, we present methodological and design findings from a three-year design process in which we worked with qualitative researchers to create Teleoscope, an open-source platform designed to rigorously curate documents at scale. By implementing the qualitative research values common to thematic analysis during the curation stage (which we call thematic curation), we found that researchers could come to a shared understanding of a large corpus and feel confident in their curation decisions (which we call schema crystallization).