Predictive and explanatory models might miss informative features in educational data


October 28, 2021


Nicholas T. Young, Marcos D. Caballero

From arXiv; 46 pages, 15 figures, 7 tables

We encounter variables with little variation often in educational data mining (EDM) due to the demographics of higher education and the questions we ask. Yet, little work has examined how to analyze such data. Therefore, we conducted a simulation study using logistic regression, penalized regression, and random forest. We systematically varied the fraction of positive outcomes, feature imbalances, and odds ratios. We find the algorithms treat features with the same odds ratios differently based on the features' imbalance and the outcome imbalance. While none of the algorithms fully solved how to handle imbalanced data, penalized approaches such as Firth and Log-F reduced the difference between the built-in odds ratio and value determined by the algorithm. Our results suggest that EDM studies might contain false negatives when determining which variables are related to an outcome. We then apply our findings to a graduate admissions data set. We end by proposing recommendations that researchers should consider penalized regression for data sets on the order of hundreds of cases and should include more context about their data in publications such as the outcome and feature imbalances.
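The kind of simulation the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it draws a single binary feature with a chosen imbalance, assigns outcomes using a built-in odds ratio, and counts how often a Wald test on the estimated log odds ratio detects the feature. All function names, the sample size of 300, the 5% feature fraction, and the Wald criterion are assumptions made for this example. The 0.5 added to each cell (the Haldane-Anscombe correction) is only a simple stand-in for the penalized estimators named in the abstract, not an implementation of Firth or Log-F regression.

```python
import math
import random


def simulate_table(n, feature_frac, outcome_base, odds_ratio, rng):
    """Return 2x2 counts [[a, b], [c, d]]: rows are x=1 / x=0, columns y=1 / y=0."""
    # Outcome odds for x=0, scaled by the built-in odds ratio for x=1.
    base_odds = outcome_base / (1.0 - outcome_base)
    p1 = base_odds * odds_ratio / (1.0 + base_odds * odds_ratio)
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        x = 1 if rng.random() < feature_frac else 0
        p = p1 if x == 1 else outcome_base
        y = 1 if rng.random() < p else 0
        table[1 - x][1 - y] += 1
    return table


def log_or_mle(table):
    """Unpenalized log odds ratio; None when a zero cell makes it undefined."""
    (a, b), (c, d) = table
    if 0 in (a, b, c, d):
        return None
    return math.log(a * d / (b * c))


def log_or_corrected(table):
    """Log odds ratio after adding 0.5 to every cell (illustrative stand-in)."""
    (a, b), (c, d) = table
    return math.log((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))


def detection_rate(n, feature_frac, outcome_base, odds_ratio, reps=500, seed=0):
    """Fraction of simulated data sets in which a Wald test flags the feature."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        (a, b), (c, d) = simulate_table(n, feature_frac, outcome_base, odds_ratio, rng)
        # Corrected counts keep the standard error finite when a cell is empty.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        if abs(log_or / se) > 1.96:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Same built-in odds ratio (2.0), different feature imbalance.
    print("balanced feature  :", detection_rate(300, 0.50, 0.5, 2.0))
    print("imbalanced feature:", detection_rate(300, 0.05, 0.5, 2.0))
```

Under these assumptions, the feature that appears in only 5% of cases is detected far less often than the balanced one, even though both were generated with the same odds ratio. Each missed detection is a false negative of the kind the abstract warns about.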

