From Base Data To Knowledge Discovery -- A Life Cycle Approach -- Using Multilayer Networks

Any large, complex data analysis that aims to infer or discover meaningful information or knowledge involves the following steps, in addition to data collection, cleaning, and preparation (such as attribute elimination): i) modeling the data, that is, choosing a modeling approach and deriving a data representation for analysis with it; ii) translating the analysis objectives into computations on the resulting model, which can be as simple as a single computation (e.g., community detection) or may involve a sequence of operations (e.g., pair-wise community detection over multiple networks) expressed using the model; iii) computing the generated expressions, where efficiency and scalability come into the picture; and iv) drilling down into the results to interpret and understand them clearly. Beyond this, visualizing results makes them easier to understand; the Covid-19 visualization dashboard presented in this paper is an example. This paper covers all of the above steps of the data analysis life cycle using a data representation that is gaining importance for multi-entity, multi-feature data sets: Multilayer Networks (MLNs). We use several data sets to establish the effectiveness of modeling with MLNs and analyze them using the proposed decoupling approach. For coverage, we use different types of MLNs for modeling, and community and centrality computations for analysis. The data sets used are US commercial airlines, IMDb, DBLP, and Covid-19. Our experimental analyses using the identified steps validate the modeling, the breadth of objectives that can be computed, and the overall versatility of the life cycle approach. Where possible, correctness of results is verified against independently available ground truth. We also demonstrate the drill-down afforded by this approach (owing to its preservation of structure and semantics) for a better understanding and visualization of results.
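To make steps i) to iv) concrete, here is a minimal sketch. It is not the paper's algorithm, and several assumptions are ours. The data is taken to be a homogeneous MLN: every layer shares one node set, such as airports, with one layer per airline. Connected components stand in for a real community-detection algorithm. "Decoupling" is read as analyzing each layer independently and then composing the per-layer results pairwise, using a hypothetical intersection rule. The function names (`build_layer`, `layer_communities`, `compose_and`, `decoupled_analysis`) and the toy airport data are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_layer(edges):
    """Step i (modeling): one MLN layer as a set of undirected edges.
    Edges are frozensets so (a, b) and (b, a) coincide; self-loops are dropped."""
    return {frozenset(e) for e in edges if len(set(e)) == 2}


def layer_communities(nodes, layer):
    """Per-layer analysis. Connected components (union-find) are a stand-in
    for a real community-detection algorithm such as Louvain or Infomap."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for edge in layer:
        a, b = tuple(edge)
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return [frozenset(g) for g in groups.values()]


def compose_and(comms_a, comms_b, min_size=2):
    """Hypothetical composition rule: keep node groups that share a community
    in BOTH layers (pairwise intersection), discarding groups below min_size."""
    return {ca & cb for ca in comms_a for cb in comms_b if len(ca & cb) >= min_size}


def decoupled_analysis(nodes, layers, min_size=2):
    """Steps ii-iii: analyze each layer once, then compose every pair of layers.
    Result keys name the layer pair that produced each group, and node IDs are
    unchanged, so every group can be traced back to its sources (step iv)."""
    per_layer = {name: layer_communities(nodes, edges) for name, edges in layers.items()}
    return {
        (a, b): compose_and(per_layer[a], per_layer[b], min_size)
        for a, b in combinations(sorted(layers), 2)
    }


if __name__ == "__main__":
    nodes = {"DFW", "IAH", "AUS", "ORD", "MDW", "SEA"}
    layers = {
        "airline_A": build_layer([("DFW", "IAH"), ("IAH", "AUS"), ("ORD", "MDW")]),
        "airline_B": build_layer([("DFW", "IAH"), ("ORD", "MDW"), ("MDW", "SEA")]),
    }
    for pair, groups in decoupled_analysis(nodes, layers).items():
        for g in sorted(groups, key=sorted):
            print(pair, sorted(g))
    # ('airline_A', 'airline_B') ['DFW', 'IAH']
    # ('airline_A', 'airline_B') ['MDW', 'ORD']
```

The appeal of this structure is that each of the k layers is analyzed only once. The k(k-1)/2 pairwise compositions are cheap set operations over results that are already computed. Because node identities are preserved, the drill-down step reduces to a lookup of which layers and which communities produced a given group.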


Related content

MoDELS


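The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling. It will also present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/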