Building predictive models of healthcare costs with open healthcare data

Decision trees · Generative models · Sparse regression · Transparency · Statistics

April 5, 2023

A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

from arXiv, 2020 IEEE International Conference on Healthcare Informatics (ICHI)

Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State SPARCS (statewide planning and research cooperative system), consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes consisting of sparse regression and decision trees. We obtained the best performance by using a decision tree with depth 10. We obtained an R-square value of 0.76 which is better than the values reported in the literature for similar problems.
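The abstract reports model quality as an R-square value and names a depth-limited decision tree as the best model. The sketch below is only an illustration of those two ideas, not the authors' implementation: it uses a small pure-Python regression tree, and the data are synthetic stand-ins (the feature names `age_group` and `diagnosis`, the cost formula, and the noise level are all assumptions, not SPARCS fields or values).

```python
import random

def mean(values):
    return sum(values) / len(values)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def sse(values):
    """Sum of squared errors of the values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, max_depth):
    """Greedy regression tree as nested dicts; each leaf stores a mean target."""
    if max_depth == 0 or len(set(y)) == 1:
        return {"value": mean(y)}
    best = None
    for j in range(len(X[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for threshold in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:  # every feature is constant here, so no split exists
        return {"value": mean(y)}
    _, j, threshold, left, right = best
    return {
        "feature": j,
        "threshold": threshold,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's mean."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

# Synthetic data (assumption): cost depends on a diagnosis code and an age group.
rng = random.Random(0)
base_cost = [rng.uniform(2_000, 40_000) for _ in range(8)]
X, y = [], []
for _ in range(600):
    age_group, diagnosis = rng.randrange(5), rng.randrange(8)
    X.append([age_group, diagnosis])
    y.append(base_cost[diagnosis] * (1 + 0.15 * age_group) + rng.gauss(0, 2_000))

X_train, y_train, X_test, y_test = X[:450], y[:450], X[450:], y[450:]
for depth in (1, 3, 10):
    tree = fit_tree(X_train, y_train, depth)
    preds = [predict(tree, row) for row in X_test]
    print(f"depth={depth:2d}  held-out R^2 = {r_squared(y_test, preds):.3f}")
```

An R-square of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean cost. The depth limit caps how many attribute tests a single prediction may pass through.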

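Related content

Decision tree

A decision tree is a decision-analysis method that, given the known probabilities of various outcomes, builds a tree to compute the probability that the expected net present value is greater than or equal to zero, in order to assess project risk and judge feasibility. It is a graphical method that applies probability analysis intuitively. The name comes from the way the decision branches, when drawn, resemble the limbs of a tree. In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values. Entropy measures the disorder of a system; the tree-building algorithms ID3, C4.5, and C5.0 use entropy, a measure based on the concept of entropy in information theory. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class. Classification trees (decision trees) are a very commonly used classification method. They are a form of supervised learning: given a set of samples, each with a set of attributes and a class label determined in advance, learning produces a classifier that is intended to assign the correct class to previously unseen objects.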
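The role of entropy in ID3-style tree building can be made concrete with a small sketch. The code below computes Shannon entropy and the information gain of splitting on one attribute. The toy samples and the attribute names `outlook` and `windy` are hypothetical and chosen only for illustration; full ID3, C4.5, and C5.0 add further steps (recursion, gain ratio, pruning) that are not shown.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    probabilities = (n / total for n in Counter(labels).values())
    return sum(-p * log2(p) for p in probabilities)

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the samples on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Attribute with the highest information gain (the ID3 selection criterion)."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy samples: each has a set of attributes and a known class.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

for name in rows[0]:
    print(name, information_gain(rows, labels, name))
print("split on:", best_attribute(rows, labels))
```

In this toy set `outlook` separates the two classes perfectly, so splitting on it removes all of the label entropy, while `windy` leaves each partition as mixed as before and gains nothing.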
