Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes. The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show that the sample size must be based on the anticipated Cox-Snell R² of distinct one-to-one logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression. We tested the performance of the proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes. We illustrate how to implement the sample size criteria through a worked example: the development of a multinomial risk prediction model for tumour type in patients presenting with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R package and Stata modules.
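To make the three criteria concrete, the Python below is a minimal sketch under stated assumptions. It is not the pmsampsize implementation, and all function names and example inputs are hypothetical. Criterion (i) applies the binary-outcome shrinkage formula of Riley and colleagues, n = p / ((S - 1) ln(1 - R²_CS / S)) with target shrinkage S = 0.9, to each one-to-one sub-model. It then rescales the result to the full cohort by dividing by the combined anticipated proportion of the two categories involved; this rescaling is our assumption. Which category pairs to include is left to the caller. Criterion (ii) assumes the binary-outcome form, with the maximum Cox-Snell R² computed from all K outcome proportions. Criterion (iii) assumes each category's overall risk should be estimated within ±0.05 with 95% confidence.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Binary-outcome shrinkage formula: n = p / ((S - 1) * ln(1 - R2_CS / S)).

    p: number of predictor parameters; r2_cs: anticipated Cox-Snell R^2;
    shrinkage: target expected shrinkage factor S (0.9 targets about 10% overfitting).
    """
    if not 0 < r2_cs < shrinkage < 1:
        raise ValueError("need 0 < r2_cs < shrinkage < 1")
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def criterion_i(submodels, phi, shrinkage=0.9):
    """Criterion (i) sketch over one-to-one sub-models.

    submodels: {(k, j): (p_kj, r2_cs_kj)}, where r2_cs_kj is the anticipated Cox-Snell R^2
    of a binary logistic model fitted only to individuals in categories k and j.
    phi: {category: anticipated outcome proportion}.
    """
    n_required = 0.0
    for (k, j), (p_kj, r2_kj) in submodels.items():
        # Individuals needed within categories k and j combined...
        m_kj = n_for_shrinkage(p_kj, r2_kj, shrinkage)
        # ...rescaled to the full cohort (assumption: divide by phi_k + phi_j).
        n_required = max(n_required, m_kj / (phi[k] + phi[j]))
    return math.ceil(n_required)


def criterion_ii(p_total, r2_cs_overall, phi, delta=0.05):
    """Criterion (ii) sketch: observed and adjusted Nagelkerke R^2 differ by at most delta."""
    # Maximum attainable Cox-Snell R^2 given the outcome proportions (null model).
    max_r2_cs = 1 - math.exp(2 * sum(f * math.log(f) for f in phi.values()))
    s = r2_cs_overall / (r2_cs_overall + delta * max_r2_cs)
    return math.ceil(n_for_shrinkage(p_total, r2_cs_overall, s))


def criterion_iii(phi, margin=0.05, z=1.96):
    """Criterion (iii) sketch: each category's overall risk within +/- margin (95% CI)."""
    return math.ceil(max((z / margin) ** 2 * f * (1 - f) for f in phi.values()))


if __name__ == "__main__":
    # Hypothetical three-category example; the inputs are illustrative, not from the paper.
    phi = {"A": 0.5, "B": 0.3, "C": 0.2}
    submodels = {("B", "A"): (10, 0.10), ("C", "A"): (10, 0.12), ("C", "B"): (10, 0.08)}
    n_i = criterion_i(submodels, phi)
    n_ii = criterion_ii(p_total=20, r2_cs_overall=0.15, phi=phi)
    n_iii = criterion_iii(phi)
    print(n_i, n_ii, n_iii, "-> minimum n:", max(n_i, n_ii, n_iii))
```

Under these assumptions, the recommended minimum n is the largest of the three values, mirroring how the binary-outcome criteria are combined. Note that criterion (i) uses each sub-model's own anticipated Cox-Snell R², not the overall multinomial R².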