Analyses of large observational datasets tend to be complicated and prone to error, depending on the variable selection, data cleaning, and analytic methods employed. Here, we discuss analyses of 2016 US environmental epidemiology data and outline some potential implications of our Non-parametric and Unsupervised Learning perspective. Readers are invited to download the CSV files we have contributed to Dryad and apply the analytic approaches they think appropriate. We hope to encourage development of a broad-based "consensus view" of the potential effects of Secondary Organic Aerosols (Volatile Organic Compounds that have predominantly Biogenic or Anthropogenic origin) within PM2.5 particulate matter on Circulatory and/or Respiratory mortality. The analyses described here ultimately focus on the question: "Is life in a region with relatively high airborne Biogenic particulate matter also relatively dangerous in terms of Circulatory and/or Respiratory mortality?"