TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures - 专知 (Zhuanzhi) Papers

June 21, 2022

TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures

Dimitris Bertsimas, Kimberly Villalobos Carballo, Yu Ma, Liangyuan Na, Léonard Boussioux, Cynthia Zeng, Luis R. Soenksen, Ignacio Fuentes

Processing and analyzing tabular data in a productive and efficient way is essential for building successful applications of machine learning in fields such as healthcare. However, the lack of a unified framework for representing and standardizing tabular information poses a significant challenge to researchers and professionals alike. In this work, we present TabText, a methodology that leverages the unstructured data format of language to encode tabular data from different table structures and time periods efficiently and accurately. We show using two healthcare datasets and four prediction tasks that features extracted via TabText outperform those extracted with traditional processing methods by 2-5%. Furthermore, we analyze the sensitivity of our framework against different choices for sentence representations of missing values, meta information and language descriptiveness, and provide insights into winning strategies that improve performance.
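To make the idea of encoding a table row as language concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, sentence templates, option names (`missing`, `descriptive`, `table_name`), and the example columns are all hypothetical, chosen only to illustrate the three design choices the abstract mentions (how missing values are worded, whether meta information is included, and how descriptive the language is).

```python
import math
from typing import Any, Mapping, Optional


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_to_text(
    row: Mapping[str, Any],
    table_name: Optional[str] = None,
    missing: str = "skip",
    descriptive: bool = True,
) -> str:
    """Serialize one table row into a string of short sentences.

    All options are illustrative assumptions, not the paper's settings:
      missing     -- "skip" drops missing cells; "state" writes "<column> is missing".
      descriptive -- True writes "<column> is <value>"; False writes "<column>: <value>".
      table_name  -- optional meta information placed before the cell sentences.
    """
    if missing not in ("skip", "state"):
        raise ValueError("missing must be 'skip' or 'state'")

    parts = []
    if table_name is not None:
        parts.append(f"The following is from the {table_name} table")

    for column, value in row.items():
        if is_missing(value):
            if missing == "skip":
                continue
            value = "missing"
        parts.append(f"{column} is {value}" if descriptive else f"{column}: {value}")

    # Join the fragments into sentences; an empty row yields an empty string.
    return ". ".join(parts) + "." if parts else ""


if __name__ == "__main__":
    # Hypothetical row from a hypothetical "vitals" table.
    row = {"age": 64, "heart rate": float("nan"), "sex": "female"}
    print(row_to_text(row, table_name="vitals", missing="state"))
    print(row_to_text(row, descriptive=False))
```

Because every row becomes plain text, rows from tables with different columns can be handled by the same downstream text encoder. That encoding step is omitted here, since the abstract does not specify it.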
