Growing computational power and advances in deep learning have made computer vision technologies pervasive in urban environments, where they are increasingly applied to policing, traffic management, and the documentation of public spaces. Apart from the often-discussed biases in the algorithms' training and their unequally distributed benefits, almost all of these applications reduce urban experience to simplistic, mechanistic measures. These practices lack the context, depth, and specificity needed to support semantic knowledge or analysis in urban settings, especially with respect to how urban space is used and occupied. This paper critiques existing uses of artificial intelligence and computer vision in urban practice and proposes a new framework for understanding people, action, and public space. It revisits Geertz's use of thick description in generating interpretive theories of culture and activity, and uses this lens to establish a framework for evaluating how different uses of computer vision technologies weigh meaning. We discuss how the framework's positioning may differ (and conflict) between different users of the technology. The paper also discusses how deep learning algorithms are currently trained and used and how this process limits semantic learning, and it proposes three potential methodologies for obtaining a more contextually specific, urban-semantic description of urban space relevant to urbanists. It contributes to critical conversations about the proliferation of artificial intelligence by challenging current applications of these technologies in the urban environment and highlighting their failures in this context, while also proposing an evolution of these algorithms that may ultimately make them sensitive and useful within this spatial and cultural milieu.