Social network analysis allows researchers to draw insights from the connections between people. While building a social network is relatively straightforward for contemporary social media, deriving connections from historical archives remains challenging, with every data collection presenting its own difficulties. Our contribution focuses on building and analysing a social network from the correspondence archive of Sybren Valkema (1916-1996), a Dutch glass artist and educator. The archive contains both typewritten and handwritten documents in multiple languages, and includes letters from glass artists, art students, art collectors and other agents. We develop an automatic pipeline that separates handwritten from typed documents, performs text recognition specific to the document modality, extracts names of people from the text using named entity recognition, de-duplicates the resulting names to create actor nodes, classifies the actors using entity linking, and, finally, connects the actors and analyses the resulting network. Every part of the pipeline is evaluated against a manual analysis performed by an art historian on a subset of the data collection, in order to identify which pitfalls of the automatic approach need to be resolved in future work and, conversely, whether the automatic approach reveals any additional insights. The results show strong performance in discovering sender-receiver connections as well as additional meaningful connections in the text, with text recognition on scanned pages being the main challenge.
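
The last two pipeline stages (de-duplicating names into actor nodes and connecting them) can be illustrated with a minimal Python sketch. It is not the method used in this work: the normalisation rule, the record layout (`sender`, `receiver`, `mentions`), the edge labels, and the placeholder names "Artist A" and "Collector B" are all assumptions made for illustration, and the upstream text recognition and named entity recognition are taken as already done.

```python
from collections import Counter
import unicodedata


def normalize_name(name: str) -> str:
    """Crude de-duplication key: drop accents, punctuation, case and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def build_network(letters):
    """Return (nodes, edges); edges counts (kind, source, target) triples."""
    nodes = {}         # normalized key -> first surface form seen
    edges = Counter()  # (kind, source key, target key) -> number of letters
    for letter in letters:
        sender = normalize_name(letter["sender"])
        receiver = normalize_name(letter["receiver"])
        nodes.setdefault(sender, letter["sender"])
        nodes.setdefault(receiver, letter["receiver"])
        edges[("sent_to", sender, receiver)] += 1
        for mention in letter["mentions"]:
            key = normalize_name(mention)
            nodes.setdefault(key, mention)
            # Skip mentions of the correspondents themselves.
            if key not in (sender, receiver):
                edges[("mentions", sender, key)] += 1
    return nodes, edges


# Hypothetical records standing in for the output of the recognition stages.
letters = [
    {"sender": "Sybren Valkema", "receiver": "Artist A",
     "mentions": ["Collector B"]},
    {"sender": "artist a", "receiver": "SYBREN VALKEMA",
     "mentions": ["Collector B.", "Sybren Valkema"]},
]
nodes, edges = build_network(letters)
```

Here the spelling variants collapse into three actor nodes, and the sender-receiver edges are kept separate from the in-text mention edges so the two kinds of connection can be evaluated independently. A real archive would likely need fuzzier matching than exact key equality, since recognition errors on scanned pages produce variants that this normalisation cannot merge.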