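Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. It is a general-purpose parallel framework in the style of Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark retains the advantages of Hadoop MapReduce, but unlike MapReduce, a job's intermediate output can be kept in memory rather than written to and read back from HDFS between steps. This makes Spark better suited to iterative algorithms, such as those used in data mining and machine learning.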
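The benefit of keeping intermediate data in memory can be illustrated with a minimal sketch in plain Python. This is not Spark code: `CountingSource`, `fit_slope`, and `run_demo` are hypothetical names made up for this illustration, and a temporary local file stands in for a dataset on HDFS. The sketch runs the same iterative algorithm (gradient descent for a one-parameter fit) twice, once re-reading the data from storage on every iteration and once reusing a copy loaded a single time.

```python
import os
import tempfile


class CountingSource:
    """Hypothetical stand-in for a dataset on distributed storage; counts full reads."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        # Each call simulates one full pass over the stored dataset.
        self.reads += 1
        with open(self.path) as f:
            return [tuple(map(float, line.split(","))) for line in f]


def fit_slope(get_points, iterations=20, lr=0.04):
    """Gradient descent for y ~ w * x; asks for the data once per iteration."""
    w = 0.0
    for _ in range(iterations):
        points = get_points()
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


def run_demo():
    # Write a tiny dataset with y = 3x to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        for x in range(1, 6):
            f.write(f"{x},{3 * x}\n")
        path = f.name
    try:
        # MapReduce-style: every iteration goes back to storage.
        reread = CountingSource(path)
        w_reread = fit_slope(reread.load)

        # In-memory style: load once, keep the parsed data, reuse it.
        cached = CountingSource(path)
        in_memory = cached.load()
        w_cached = fit_slope(lambda: in_memory)
    finally:
        os.remove(path)
    return reread.reads, cached.reads, w_reread, w_cached
```

Both runs compute the same slope, but the first reads the dataset once per iteration while the second reads it only once. In Spark itself, this kind of reuse is typically requested by calling `cache()` or `persist()` on an RDD or DataFrame before the iterative loop.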

VIP content

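This book gives you access to real-world documentation and examples for the Spark platform so that you can build large-scale, enterprise-grade machine learning applications.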

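Over the past decade, machine learning has made a series of remarkable advances. These breakthroughs are affecting our daily lives and having an impact on every industry. Next-Generation Machine Learning with Spark introduces Spark and Spark MLlib, then moves beyond the standard Spark MLlib library to more powerful third-party machine learning algorithms and libraries. By the end of the book, you will be able to apply what you have learned to real-world use cases, guided by many practical examples and insightful explanations.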

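  • Get an introduction to machine learning, Spark, and Spark MLlib 2.4.x
  • Use the XGBoost4J-Spark and LightGBM libraries for lightning-fast gradient boosting on Spark
  • Detect anomalies with the Isolation Forest algorithm for Spark
  • Use the Spark NLP and Stanford CoreNLP libraries, which support multiple languages
  • Optimize ML workloads with the Alluxio in-memory data accelerator for Spark
  • Perform graph analysis with GraphX and GraphFrames
  • Perform image recognition with convolutional neural networks
  • Use the Keras framework and distributed deep learning libraries with Spark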

Who this book is for

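This book is for data scientists and machine learning engineers who want to take their knowledge to the next level by using Spark with more powerful, next-generation algorithms and libraries beyond those in the standard Spark MLlib library. It also serves as a primer for aspiring data scientists and engineers who need an introduction to machine learning, Spark, and Spark MLlib.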


Latest papers

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. In this paper, we focus on the less-studied setting of multi-query video retrieval, where multiple queries are provided to the model for searching over the video archive. We first show that the multi-query retrieval task is more pragmatic and representative of real-world use cases and better evaluates retrieval capabilities of current models, thereby deserving of further investigation alongside the more prevalent single-query retrieval setup. We then propose several new methods for leveraging multiple queries at training time to improve over simply combining similarity outputs of multiple queries from regular single-query trained models. Our models consistently outperform several competitive baselines over three different datasets. For instance, Recall@1 can be improved by 4.7 points on MSR-VTT, 4.1 points on MSVD and 11.7 points on VATEX over a strong baseline built on the state-of-the-art CLIP4Clip model. We believe further modeling efforts will bring new insights to this direction and spark new systems that perform better in real-world video retrieval applications. Code is available at https://github.com/princetonvisualai/MQVR.
