Current visual text analysis approaches rely on sophisticated processing pipelines. Each step of such a pipeline can amplify uncertainties from the previous step. To ensure the comprehensibility and interoperability of the results, it is essential to clearly communicate the uncertainty not only of the output but also within the pipeline. In this paper, we characterize the sources of uncertainty along the visual text analysis pipeline. Within its three phases of labeling, modeling, and analysis, we identify six sources and discuss the type of uncertainty each creates and how it propagates.