Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding

From arXiv. Paper accepted at Computing Conference 2021 (the research conference formerly called the Science and Information (SAI) Conference). This version is a replacement that updates the conference status to "Accepted".

ML data curation typically draws on heterogeneous, federated source systems with varied schema structures, so the curation process must standardize metadata from different schemas into an interoperable schema. This manual process of metadata harmonization and cataloging slows the ML-Ops lifecycle. We demonstrate automation of this step with the help of entity resolution methods and the Cognitive Database's Db2Vec embedding approach, which captures hidden inter-column and intra-column relationships to detect similarity of metadata and then predicts the mapping of metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
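To make the column-matching step concrete, here is a minimal sketch of mapping source columns to a standardized schema. It is not the paper's method: it swaps in plain string similarity from Python's `difflib` where the abstract describes Db2Vec embeddings, and the schemas, the `match_columns` helper, and the `0.6` threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and unify common separators."""
    return name.lower().replace("-", "_").replace(" ", "_")


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_columns(source_cols, target_cols, threshold=0.6):
    """Map each source column to its most similar target column.

    A source column maps to None when even its best candidate
    scores below the (assumed) threshold.
    """
    mapping = {}
    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        mapping[src] = best if similarity(src, best) >= threshold else None
    return mapping


# Hypothetical schemas, for illustration only.
source = ["Cust_Name", "cust-email", "zip"]
target = ["customer_name", "customer_email", "postal_code"]
mapping = match_columns(source, target)
```

In this sketch `Cust_Name` and `cust-email` resolve to `customer_name` and `customer_email`, while `zip` stays unmatched because it shares almost no characters with `postal_code`. Name-only matching misses pairs like this, which is plausibly where similarity learned from column contents, as the abstract describes, would help.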

Related content

Entity Resolution

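Different data providers may describe the same thing, that is, the same entity, in different ways (the description here covers data format, representation, and so on). Each description of an entity is called a reference to that entity. Entity resolution is the process of resolving a set of references and mapping them to real-world entities. Entity resolution is also known as record linkage, object identification, individual identification, and duplicate detection.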
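The definition above can be illustrated with a minimal sketch that groups references sharing a normalized key, so that each group stands for one real-world entity. This is a simple rule-based approach chosen for illustration, not a method taken from the source; the record fields, the `match_key` and `resolve` helpers, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict


def match_key(reference: dict) -> tuple:
    """Build a normalized key from a reference.

    The name is lowercased, stripped of punctuation, and has its
    whitespace collapsed; the phone number is reduced to its digits.
    """
    name = re.sub(r"[^a-z0-9 ]", "", reference["name"].lower())
    name = " ".join(name.split())
    phone = re.sub(r"\D", "", reference["phone"])
    return (name, phone)


def resolve(references):
    """Group reference ids whose normalized keys are identical."""
    groups = defaultdict(list)
    for ref in references:
        groups[match_key(ref)].append(ref["id"])
    return list(groups.values())


# Hypothetical references from two data providers.
references = [
    {"id": "a1", "name": "Acme Corp.", "phone": "(555) 010-2000"},
    {"id": "b7", "name": "ACME  corp", "phone": "555-010-2000"},
    {"id": "b9", "name": "Globex Inc", "phone": "555-010-3000"},
]
entities = resolve(references)
```

Here `a1` and `b7` differ in case, punctuation, and phone formatting but normalize to the same key, so they land in one group, while `b9` forms its own. Exact-key grouping only catches formatting differences; references with misspellings or missing fields would need fuzzy or probabilistic matching.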
