Computational social science lacks a scalable and reliable mechanism to assure quality for AI-assisted qualitative coding when tasks demand domain expertise and long-text reasoning, and traditional double-coding is prohibitively costly at scale. We develop and validate a dual-signal quality assessment framework that combines model confidence with inter-model consensus (external entropy) and evaluate it across legal reasoning (390 Supreme Court cases), political analysis (645 hyperpartisan articles), and medical classification (1,000 clinical transcripts). External entropy is consistently negatively associated with accuracy (r = -0.179 to -0.273, p < 0.001), while confidence is positively associated in two domains (r = 0.104 to 0.429). Weight optimization improves over single-signal baselines by 6.6-113.7% and transfers across domains (100% success), and an intelligent triage protocol reduces manual verification effort by 44.6% while maintaining quality. The framework offers a principled, domain-agnostic quality assurance mechanism that scales qualitative coding without extensive double-coding, provides actionable guidance for sampling and verification, and enables larger and more diverse corpora to be analyzed with maintained rigor.
翻译:暂无翻译