The subtleties of human perception, as measured by vision scientists through the use of psychophysics, are important clues to the internal workings of visual recognition. For instance, measured reaction time can indicate whether a visual stimulus is easy or hard for a subject to recognize. In this paper, we consider how to incorporate psychophysical measurements of visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can enforce consistency with human behavior. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we describe a general enhancement strategy, underpinned by the new loss formulation, which can be applied to the training regime of any deep learning-based document transcription system. Through experimentation, reliable performance improvement is demonstrated for the standard IAM and RIMES datasets for three different network architectures. Further, we go on to show the feasibility of our approach on a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall in the 9th century.