Text is a compelling example of unstructured data that can be used to motivate and explore classification problems. Two challenges arise: how to represent features of text, and how students connect the representation of text as character strings with the identification of features that capture connections to underlying phenomena. To observe how students reason with text data in scenarios designed to elicit particular aspects of the domain, we conducted task-based interviews, following a structured protocol, with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data through a motivating task: classifying headlines as "clickbait" or "news". Three types of features (function, content, and form) surfaced, the majority of them in the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged participants in thinking at both the human-perception level and the computer-extraction level, and in conceptualizing connections between the two.