For humans, it is often possible to predict data correlations from column names. We conduct experiments to find out whether deep neural networks can learn to do the same. If so, this would, for instance, open up the possibility of tuning tools that use NLP analysis of schema elements to prioritize their efforts for correlation detection. We analyze correlations for around 120,000 column pairs, taken from around 4,000 data sets. We try to predict correlations based on column names alone. For predictions, we exploit pre-trained language models based on the recently proposed Transformer architecture. We consider different types of correlations, multiple prediction methods, and various prediction scenarios. We study the impact of factors such as column name length and the amount of training data on prediction accuracy. Altogether, we find that deep neural networks can predict correlations with relatively high accuracy in many scenarios (e.g., with an accuracy of 95% for long column names).
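To make the approach concrete, the following is a minimal sketch of how a pre-trained Transformer could be applied to a pair of column names, framed as sequence-pair classification. The base model (bert-base-uncased), the example column names, and the two-label setup are illustrative assumptions, not details taken from the paper; in practice the classification head would first be fine-tuned on labeled column-name pairs.

```python
# Hypothetical sketch: scoring whether two column names are likely correlated,
# using a pre-trained Transformer with a sequence-pair classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed base model; the paper does not prescribe this particular checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # label 1 = "correlated" (assumption)
)
# NOTE: the classification head is randomly initialized here; it would be
# fine-tuned on column-name pairs labeled as correlated / not correlated.

# Encode the two column names as a sentence pair (illustrative names).
inputs = tokenizer("customer_age", "years_since_birth", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

prob_correlated = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(correlated) = {prob_correlated:.2f}")
```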