Monte Carlo simulations are the primary tool for evaluating Item Response Theory (IRT) methods, yet marginal reliability, the fundamental measure of how informative the data are, is rarely treated as an explicit design factor. In multilevel modeling the intraclass correlation (ICC) is routinely manipulated, but IRT studies typically treat reliability as an incidental outcome. This "reliability omission" obscures the signal-to-noise ratio of the generated data. To address this gap, we introduce a principled framework for reliability-targeted simulation that turns reliability from an implicit by-product into a precisely controlled input parameter. We formalize the inverse design problem as solving for the global discrimination scaling factor that uniquely achieves a pre-specified target reliability. We propose two complementary algorithms: Empirical Quadrature Calibration (EQC), which is fast, deterministic, and precise, and Stochastic Approximation Calibration (SAC), which provides rigorous stochastic estimation. A validation study spanning 960 conditions shows that EQC achieves essentially exact calibration and that SAC remains unbiased under non-normal latent distributions and with empirical item pools. We also clarify the theoretical distinction between average-information and error-variance-based reliability metrics and show that, because of Jensen's inequality, they require different calibration scales. An accompanying open-source R package, IRTsimrel, lets researchers standardize reliability as a controlled experimental input.