We live in a digital world that, in 2010, crossed the mark of one zettabyte of data. Processing this huge amount of data extremely fast on computers, with optimized techniques, makes it possible to find insights in new and emerging types of data and content and to answer questions that were previously considered beyond reach. This is the idea of Big Data. Google now offers Google Correlate, a public analysis tool that takes a search term or a series of temporal or regional data and returns a list of Google queries whose frequencies follow the patterns that best correlate with the data, according to the Pearson coefficient of determination R^2. Of course, correlation does not imply causation. We believe, however, that these Big Data tools have the potential to find unexpected correlations that may serve as clues to interesting phenomena, from a pedagogical and even a scientific point of view. As far as we know, this is the first proposal for the use of Big Data in Science Teaching. It is constructionist in character and takes as mediators the computer and free public tools such as Google Correlate. It also has an epistemological orientation: rather than being merely training in computational infrastructure or predictive analytics, it aims to give students a better understanding of physical concepts such as phenomena, observation, measurement, physical laws, theory, and causality. With this understanding, students would be able to become good Big Data specialists, the much-needed 'data scientists' who can solve the challenges of Big Data.
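The ranking idea described above, ordering candidate query series by how well they correlate with a target series, can be illustrated with a minimal sketch. This is not Google Correlate's implementation; the function names `pearson_r` and `rank_by_r_squared` are hypothetical, and the series in the usage example are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_by_r_squared(target, candidates):
    """Return (name, R^2) pairs sorted from best to worst match.

    `candidates` maps a query name to its frequency series.
    Note that R^2 = r**2 discards the sign of the correlation.
    """
    scored = [(name, pearson_r(target, series) ** 2)
              for name, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented data: a target time series and two hypothetical query series.
    target = [1, 2, 3, 4, 5]
    candidates = {
        "query_a": [2, 4, 6, 8, 10],
        "query_b": [5, 3, 4, 1, 2],
    }
    for name, r2 in rank_by_r_squared(target, candidates):
        print(f"{name}: R^2 = {r2:.2f}")
```

A high R^2 here only says that two series move together; as the text notes, it is a clue to investigate, not evidence of causation.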