研究报告：Python下的一项探索性研究：临时解析器 (An Exploratory Study of Ad Hoc Parsers in Python) - 专知论文

会员服务 ·

0

Python · 语义特征 · 程序切片 · 语义度量 · 分析 ·

2023 年 4 月 19 日

An Exploratory Study of Ad Hoc Parsers in Python

翻译：研究报告：Python下的一项探索性研究：临时解析器

Michael Schröder,Marc Goritschnig,Jürgen Cito

from arxiv, 5 pages, accepted as a registered report for MSR 2023 with Continuity Acceptance (CA)

Background: Ad hoc parsers are pieces of code that use common string functions like split, trim, or slice to effectively perform parsing. Whether it is handling command-line arguments, reading configuration files, parsing custom file formats, or any number of other minor string processing tasks, ad hoc parsing is ubiquitous -- yet poorly understood. Objective: This study aims to reveal the common syntactic and semantic characteristics of ad hoc parsing code in real world Python projects. Our goal is to understand the nature of ad hoc parsers in order to inform future program analysis efforts in this area. Method: We plan to conduct an exploratory study based on large-scale mining of open-source Python repositories from GitHub. We will use program slicing to identify program fragments related to ad hoc parsing and analyze these parsers and their surrounding contexts across 9 research questions using 25 initial syntactic and semantic metrics. Beyond descriptive statistics, we will attempt to identify common parsing patterns by cluster analysis.

翻译：背景：临时解析器是使用常见的字符串函数（例如split，trim或slice）有效执行解析的代码。无论是处理命令行参数，读取配置文件，解析自定义文件格式还是处理其他大量的字符串处理任务，临时解析是普遍存在但却很难理解的。目的：本研究旨在揭示现实世界Python项目中临时解析代码的常见语法和语义特征。我们的目标是理解临时解析器的性质，以便为今后在该领域的程序分析工作提供参考。方法：我们计划进行一项探索性研究，基于大规模挖掘GitHub上的开源Python仓库。我们将使用程序切片来识别与临时解析相关的程序片段，并使用25个初始的语法和语义度量标准，通过9个研究问题分析这些解析器及其周围的上下文。除了描述统计信息外，我们还将尝试通过聚类分析来确定常见的解析模式。

0

相关内容

Python

Python是一种面向对象的解释型计算机程序设计语言，在设计中注重代码的可读性，同时也是一种功能强大的通用型语言。

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

129+阅读 · 2023年1月29日

【2022新书】Python数据分析第三版，579页pdf

【2022新书】Python数据分析第三版，579页pdf

专知会员服务

251+阅读 · 2022年8月31日

【Manning2022新书】Python与PySpark的数据分析，458页pdf，Data Analysis with Python and PySpark

【Manning2022新书】Python与PySpark的数据分析，458页pdf，Data Analysis with Python and PySpark

专知会员服务

120+阅读 · 2022年3月20日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【2022新书】Python数据分析第三版，579页pdf

【2022新书】Python数据分析第三版，579页pdf

专知

19+阅读 · 2022年8月31日

【Manning2022新书】Python与PySpark的数据分析，458页pdf

【Manning2022新书】Python与PySpark的数据分析，458页pdf

专知

17+阅读 · 2022年3月21日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

干货 | Python 爬虫的工具列表大全

干货 | Python 爬虫的工具列表大全

机器学习算法与Python学习

10+阅读 · 2018年4月13日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

大豆miRNA重复基因与靶基因的共进化分析

国家自然科学基金

0+阅读 · 2013年12月31日

禾谷镰孢菌Fusarium graminearum CYP51与DMIs类杀菌剂结合的分子机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于证据推理算法的建筑用能行为理论模型研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

异构信息空间中时间感知的个性化语义实体搜索关键技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Wiki资源的中英文跨语言本体知识库构建

国家自然科学基金

1+阅读 · 2012年12月31日

定性地理信息检索的模型与方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

函数空间与度量测度空间上的分析

国家自然科学基金

0+阅读 · 2012年12月31日

"An Adapt-or-Die Type of Situation": Perception, Adoption, and Use of Text-To-Image-Generation AI by Game Industry Professionals

Arxiv

0+阅读 · 2023年6月5日

Fast and high-order approximation of parabolic equations using hierarchical direct solvers and implicit Runge-Kutta methods

Arxiv

0+阅读 · 2023年6月5日

Evaluating Regular Path Queries in GQL and SQL/PGQ: How Far Can The Classical Algorithms Take Us?

Arxiv

0+阅读 · 2023年6月3日

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration

Arxiv

0+阅读 · 2023年6月2日

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

Arxiv

0+阅读 · 2023年6月1日

Forecasting: theory and practice

Arxiv

57+阅读 · 2022年1月5日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

Approaches for Enriching and Improving Textual Knowledge Bases

Arxiv

15+阅读 · 2018年4月20日

VIP会员

文章信息

相关主题

相关VIP内容

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

129+阅读 · 2023年1月29日

【2022新书】Python数据分析第三版，579页pdf

【2022新书】Python数据分析第三版，579页pdf

专知会员服务

251+阅读 · 2022年8月31日

【Manning2022新书】Python与PySpark的数据分析，458页pdf，Data Analysis with Python and PySpark

【Manning2022新书】Python与PySpark的数据分析，458页pdf，Data Analysis with Python and PySpark

专知会员服务

120+阅读 · 2022年3月20日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《生成式人工智能与大/小语言模型在供应链管理决策优化与可持续性提升中的作用评估》最新51页

白宫发布《赢得AI竞赛：美国人工智能行动计划》最新28页

地下战：地下空间的战略博弈

《美地下作战条令手册》228页

相关资讯

【2022新书】Python数据分析第三版，579页pdf

【2022新书】Python数据分析第三版，579页pdf

专知

19+阅读 · 2022年8月31日

【Manning2022新书】Python与PySpark的数据分析，458页pdf

【Manning2022新书】Python与PySpark的数据分析，458页pdf

专知

17+阅读 · 2022年3月21日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

干货 | Python 爬虫的工具列表大全

干货 | Python 爬虫的工具列表大全

机器学习算法与Python学习

10+阅读 · 2018年4月13日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

"An Adapt-or-Die Type of Situation": Perception, Adoption, and Use of Text-To-Image-Generation AI by Game Industry Professionals

Arxiv

0+阅读 · 2023年6月5日

Fast and high-order approximation of parabolic equations using hierarchical direct solvers and implicit Runge-Kutta methods

Arxiv

0+阅读 · 2023年6月5日

Evaluating Regular Path Queries in GQL and SQL/PGQ: How Far Can The Classical Algorithms Take Us?

Arxiv

0+阅读 · 2023年6月3日

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration

Arxiv

0+阅读 · 2023年6月2日

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

Arxiv

0+阅读 · 2023年6月1日

Forecasting: theory and practice

Arxiv

57+阅读 · 2022年1月5日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

Approaches for Enriching and Improving Textual Knowledge Bases

Arxiv

15+阅读 · 2018年4月20日

相关基金

大豆miRNA重复基因与靶基因的共进化分析

国家自然科学基金

0+阅读 · 2013年12月31日

禾谷镰孢菌Fusarium graminearum CYP51与DMIs类杀菌剂结合的分子机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于证据推理算法的建筑用能行为理论模型研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

异构信息空间中时间感知的个性化语义实体搜索关键技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Wiki资源的中英文跨语言本体知识库构建

国家自然科学基金

1+阅读 · 2012年12月31日

定性地理信息检索的模型与方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

函数空间与度量测度空间上的分析

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员