Developers spend nearly 58% of their time understanding codebases, yet maintaining comprehensive documentation remains challenging due to complexity and manual effort. While recent Large Language Models (LLMs) show promise for function-level documentation, they fail at the repository level, where capturing architectural patterns and cross-module interactions is essential. We introduce CodeWiki, the first open-source framework for holistic repository-level documentation across seven programming languages. CodeWiki employs three innovations: (i) hierarchical decomposition that preserves architectural context, (ii) recursive agentic processing with dynamic delegation, and (iii) synthesis of textual and visual artifacts including architecture diagrams and data flows. We also present CodeWikiBench, the first repository-level documentation benchmark with multi-level rubrics and agentic assessment. CodeWiki achieves 68.79% quality score with proprietary models and 64.80% with open-source alternatives, outperforming existing closed-source systems and demonstrating scalable, accurate documentation for real-world repositories.
翻译:暂无翻译