For thousands of years, civilizations have tried to make drinking water safe. Methods for detecting water contaminants have evolved as the contaminants themselves, such as pesticides and heavy metals, have grown more complex. The routine procedure for assessing water safety is targeted analysis, which searches for specific substances drawn from a known list; however, we do not know with certainty which substances belong on that list. Before determining experimentally which substances are contaminants, we must first address the sampling problem: how do we identify all the substances present in the water? Here, we present an approach that builds on the work of Jaanus Liigand et al., who used non-targeted analysis, which conducts a broader search of the sample, to develop a random-forest regression model that predicts the names of all the substances in a sample as well as their respective concentrations [1]. This work applies techniques from dimensionality reduction and linear decomposition to build a more accurate model, using data from the European Massbank Metabolome Library, with the aim of producing a global list of chemicals that researchers can then identify and test for when purifying water.