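Important things are worth saying three times: "Alibaba Cloud RSS is now open source!" × 3
Git repository: https://github.com/alibaba/RemoteShuffleService
The open-source code includes the core functionality and fault tolerance, and meets production requirements.
Important features on the roadmap: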
AE
Support for multiple Spark versions
Better flow control
Better monitoring
Better HA
Multi-engine support
Developers from all backgrounds are welcome to build it with us!
6 References
[1] Min Shen, Ye Zhou, Chandni Singh. Magnet: Push-based Shuffle Service for Large-scale Data Processing. VLDB 2020.
[2] Haoyu Zhang, Brian Cho, Ergin Seyfe, Avery Ching, Michael J. Freedman. Riffle: Optimized Shuffle Service for Large-Scale Data Analytics. EuroSys 2018.
[3] Sriram Rao, Raghu Ramakrishnan, Adam Silberstein. Sailfish: A Framework For Large Scale Data Processing. SoCC 2012.
[4] KFS. http://code.google.com/p/kosmosfs/
[5] Google Dataflow Shuffle. https://cloud.google.com/blog/products/data-analytics/how-distributed-shuffle-improves-scalability-and-performance-cloud-dataflow-pipelines
[6] Cosco: An Efficient Facebook-Scale Shuffle Service. https://databricks.com/session/cosco-an-efficient-facebook-scale-shuffle-service
[7] Flash for Apache Spark Shuffle with Cosco. https://databricks.com/session_na20/flash-for-apache-spark-shuffle-with-cosco
[8] Uber Zeus. https://databricks.com/session_na20/zeus-ubers-highly-scalable-and-distributed-shuffle-as-a-service
[9] Uber Zeus. https://github.com/uber/RemoteShuffleService
[10] Intel RPMP. https://databricks.com/session_na20/accelerating-apache-spark-shuffle-for-data-analytics-on-the-cloud-with-remote-persistent-memory-pools
[11] Tencent FireStorm. https://github.com/Tencent/Firestorm
[12] Aliyun RSS in practice at Qutoutiao. https://developer.aliyun.com/article/779686
[13] Aliyun RSS architecture. https://developer.aliyun.com/article/772329