In this paper, we explore issues encountered in developing a pipeline that combines natural language processing with data analysis and visualization techniques. The characteristics of the corpus, which consists of the diaries of a single person spanning several decades, present both conceptual challenges of representation and affordances as a source for historical research. We consider these issues in a team context, with a particular focus on the generation and interpretation of visualizations.