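Data analysis is the application of appropriate statistical methods to large volumes of collected primary and secondary data, with the aim of extracting as much value from the data as possible and putting it to use.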

VIP content

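This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling begins and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.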

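This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies. The book is based on the industry-leading Johns Hopkins Data Science Specialization, the most widely subscribed data science training program.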


Latest papers

One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models that tend to hallucinate erroneous facts. In this paper, we mitigate this problem with the help of multiple supplementary resource documents assisting the task. We present a new dataset, MiRANews, and benchmark existing summarization models. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it is not only the models that are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in the assisting documents than in the main source articles. An error analysis of generated summaries from pretrained models fine-tuned on MiRANews reveals that this has an even bigger effect on models: assisted summarization reduces 55% of hallucinations when compared to single-document summarization models trained on the main article only. Our code and data are available at https://github.com/XinnuoXu/MiRANews.
