In the era of big data, the growing availability of diverse data sources has driven interest in analytical approaches that integrate information across sources to improve statistical accuracy, efficiency, and scientific insight. Many existing methods assume exchangeability among data sources and often implicitly require that the sources measure identical covariates or outcomes, or that the error distribution is correctly specified; these assumptions may not hold in complex real-world settings. This paper studies the integration of data from sources with distinct outcome scales, focusing on how external data can be leveraged to improve statistical efficiency. Specifically, we consider a setting in which the primary dataset contains a continuous outcome and the external data provide only a dichotomized version of the same outcome. We propose two novel estimators. The first remains consistent even when the error distribution is misspecified. The second is guaranteed to gain efficiency over the weighted least squares estimator that uses the primary study data alone. We rigorously derive the theoretical properties of both estimators and conduct extensive simulation studies that demonstrate their robustness and efficiency gains across a range of scenarios. Finally, an application to the National Health and Nutrition Examination Survey (NHANES) data demonstrates the practical utility of the proposed methods.