High-throughput technologies for collecting field data have made observation at scale possible in several branches of the life sciences. The data collected range from the molecular level (genotypes) to physiological (phenotypic traits) and environmental observations (e.g., weather, soil conditions). These vast swathes of data, collectively referred to as phenomics data, represent a treasure trove of scientific knowledge about the dynamics of the underlying biological system. However, extracting information and insights from these datasets remains a significant challenge, owing to their multidimensionality and the lack of prior knowledge about their complex structure. In this paper, we present Pheno-Mapper, an interactive toolbox for the exploratory analysis and visualization of large-scale phenomics data. Our approach uses the mapper framework to perform a topological analysis of the data and then renders visual representations with built-in data analysis and machine learning capabilities. We demonstrate the utility of this new tool on real-world plant (e.g., maize) phenomics datasets. Compared with existing approaches, the main advantage of Pheno-Mapper is that it provides rich, interactive capabilities for the exploratory analysis of phenomics data and integrates visual analytics with data analysis and machine learning in an easily extensible way. In particular, Pheno-Mapper allows the interactive selection of subpopulations guided by a topological summary of the data, and applies data mining and machine learning to the selected subpopulations for in-depth exploration.
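As a rough illustration of the mapper framework mentioned above, the following is a minimal sketch of the generic mapper construction, not Pheno-Mapper's own implementation. It assumes a scalar filter function, a cover of the filter range by equal-length overlapping intervals, and single-linkage clustering with a distance threshold inside each interval's preimage. The function names (`cover_intervals`, `single_linkage`, `mapper_graph`) and the parameters (`n_intervals`, `overlap`, `eps`) are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import dist


def cover_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals equal-length intervals; neighbours overlap by the given fraction."""
    # Interval length chosen so that the overlapping intervals exactly span [lo, hi].
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Split indices into connected components of the graph joining points at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are clusters of point indices, edges join clusters sharing a point."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9  # guards against floating-point rounding at interval endpoints
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        # Preimage of the interval under the filter, then cluster it.
        members = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(single_linkage(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges
```

In this sketch, each node of the resulting graph is a subpopulation (a set of sample indices), which is the kind of unit that interactive selection and downstream analysis would operate on. Applied to points sampled from a circle with the x-coordinate as filter, the graph forms a single loop, reflecting the shape of the data.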