To Know by the Company Words Keep and What Else Lies in the Vicinity - 专知 Papers

Statistics · MoDELS · Word vector representation · Language modeling · Knowledge

April 30, 2022

To Know by the Company Words Keep and What Else Lies in the Vicinity

Jake Ryland Williams, Hunter Scott Heidenreich

The development of state-of-the-art (SOTA) Natural Language Processing (NLP) systems has steadily been establishing new techniques to absorb the statistics of linguistic data. These techniques often trace well-known constructs from traditional theories, and we study these connections to close gaps around key NLP methods as a means to orient future work. For this, we introduce an analytic model of the statistics learned by seminal algorithms (including GloVe and Word2Vec), and derive insights for systems that use these algorithms and the statistics of co-occurrence, in general. In this work, we derive -- to the best of our knowledge -- the first known solution to Word2Vec's softmax-optimized, skip-gram algorithm. This result presents exciting potential for future development as a direct solution to a deep learning (DL) language model's (LM's) matrix factorization. However, we use the solution to demonstrate a seemingly-universal existence of a property that word vectors exhibit and which allows for the prophylactic discernment of biases in data -- prior to their absorption by DL models. To qualify our work, we conduct an analysis of independence, i.e., on the density of statistical dependencies in co-occurrence models, which in turn renders insights on the distributional hypothesis' partial fulfillment by co-occurrence statistics.
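To make "the statistics of co-occurrence" concrete, the following is a minimal sketch, not the authors' analytic model or their solution. It counts target/context pairs in a symmetric window and row-normalizes them into empirical conditional probabilities. The function names, the toy sentence, and the window size are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_counts(tokens, window=2):
    """Count (target, context) pairs within a symmetric window (illustrative helper)."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                counts[target][tokens[j]] += 1
    return counts


def conditional_probabilities(counts):
    """Row-normalize counts into empirical p(context | target)."""
    probs = {}
    for target, row in counts.items():
        total = sum(row.values())
        probs[target] = {context: n / total for context, n in row.items()}
    return probs


def softmax_row(scores):
    """Softmax over a {context: score} dict, the normalization used by softmax skip-gram."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {context: math.exp(s - m) for context, s in scores.items()}
    z = sum(exps.values())
    return {context: e / z for context, e in exps.items()}


# Toy corpus (an assumption for illustration only).
tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
probs = conditional_probabilities(counts)

# Generic softmax property: scores equal to log p(context | target)
# reproduce those probabilities over the observed contexts.
recovered = softmax_row({c: math.log(p) for c, p in probs["the"].items()})
```

The last step shows only a general property of the softmax, restricted to contexts that were actually observed. It should not be read as the closed-form result the abstract refers to.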
