Whether it takes the form of transcribed conversations, blog posts, or tweets, qualitative data gives a reader rich insight into both the overarching trends and the diversity of human ideas expressed through text. Handling and analyzing large amounts of qualitative data, however, is difficult, often requiring multiple time-intensive readings to identify patterns. This difficulty is multiplied with each additional question or time point present in a data set. A primary challenge, then, is creating visualizations that support the interpretation of qualitative data by making it easier to identify and explore trends of interest. By combining the affordances of text and visualization, WordStream has previously eased information retrieval and processing for time-series text data, but the data wrangling necessary to produce a WordStream remains a significant barrier for non-technical users. In response, this paper presents WordStream Maker: an end-to-end platform with a pipeline that uses natural language processing (NLP) to help non-technical users process raw text data and generate a customizable visualization without programming. Lessons learned from integrating NLP into visualization and scaling to large data sets are discussed, along with use cases that demonstrate the usefulness of the platform.