PySpark Papers - 专知

PySpark

Enhancing Real-Time Master Data Management with Complex Match and Merge Algorithms

arXiv

0+ reads · October 8, 2024

NLP-Guided Synthesis: Transitioning from Sequential Programs to Distributed Programs

arXiv

0+ reads · October 10, 2024

GraphWeaver: Billion-Scale Cybersecurity Incident Correlation

arXiv

0+ reads · June 3, 2024

Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis

arXiv

0+ reads · September 28, 2023

Scalable Econometrics on Big Data -- The Logistic Regression on Spark

arXiv

0+ reads · June 18, 2021

Distributed Tera-Scale Similarity Search with MPI: Provably Efficient Similarity Search over billions without a Single Distance Computation

arXiv

0+ reads · August 17, 2020

Tera-SLASH: A Distributed Energy-Efficient MPI based LSH System for Tera-Scale Similarity Search

arXiv

0+ reads · August 5, 2020

Potential customer mining application of smart home products based on LightGBM PU learning and Spark ML algorithm practice

arXiv

0+ reads · June 22, 2020

Rumble: Data Independence for Large Messy Data Sets

arXiv

0+ reads · May 6, 2020

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times

arXiv

0+ reads · November 28, 2019

Rumble: data independence when data is in a mess

arXiv

0+ reads · October 25, 2019

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times

arXiv

0+ reads · October 3, 2019

One DSL to Rule Them All: IDE-Assisted Code Generation for Agile Data Analysis

arXiv

0+ reads · April 18, 2019

Serverless Data Analytics with Flint

arXiv

0+ reads · October 10, 2018

Exploiting Apache Spark platform for CMS computing analytics

arXiv

0+ reads · November 1, 2017
