When linear regression is used to model the relationship between a scalar response (the dependent variable) and one or more independent variables, datasets with visibly different graphical trends can yield nearly identical fitted relationships because they share the same summary statistics. Advanced statistical approaches, such as neural networks and other machine learning methods, are needed to process, characterize, and analyze such degenerate datasets. Conversely, the accurate construction of purposely degenerate datasets is essential for testing new models in applied statistics research and education. To this end, the present study characterizes the well-known Anscombe datasets and provides a general algorithm for creating multiple paired datasets with identical statistical properties.
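To make the notion of degenerate datasets concrete, the following minimal sketch computes the usual summary statistics and the least-squares line for the four Anscombe datasets. It is an illustration only and does not reproduce the general algorithm proposed in the study; the names `QUARTET` and `summarize` are hypothetical and chosen here for the example.

```python
from math import sqrt

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson r, and the OLS line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, all four datasets give a mean of x of 9, a mean of y of 7.50, a correlation of about 0.82, and a fitted line of roughly y = 3.00 + 0.50x, even though their scatter plots look entirely different.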