DocuMCP Architecture Overview
This document explains the architectural design of DocuMCP, providing insight into how the system works and why key design decisions were made.
High-Level Architecture
DocuMCP follows a modular, stateless architecture built on the Model Context Protocol (MCP) standard, designed to provide intelligent documentation deployment capabilities through AI assistant integration.
┌─────────────────────────────────────────────────────────────┐
│                     AI Assistant Layer                      │
│                 (Claude, GPT, Gemini, etc.)                 │
└─────────────────────┬───────────────────────────────────────┘
                      │ MCP Protocol
┌─────────────────────▼───────────────────────────────────────┐
│                     DocuMCP MCP Server                      │
│ ┌─────────────┐  ┌────────────────────┐  ┌────────────────┐ │
│ │    Tools    │  │      Prompts       │  │   Resources    │ │
│ │    Layer    │  │       Layer        │  │     Layer      │ │
│ └─────────────┘  └────────────────────┘  └────────────────┘ │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                      Core Engine Layer                      │
│ ┌─────────────┐  ┌────────────────────┐  ┌────────────────┐ │
│ │ Repository  │  │ SSG Recommendation │  │ Memory System  │ │
│ │  Analysis   │  │       Engine       │  │                │ │
│ └─────────────┘  └────────────────────┘  └────────────────┘ │
│ ┌─────────────┐  ┌────────────────────┐  ┌────────────────┐ │
│ │   Content   │  │     Deployment     │  │   Validation   │ │
│ │ Generation  │  │     Automation     │  │     Engine     │ │
│ └─────────────┘  └────────────────────┘  └────────────────┘ │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                 External Integrations Layer                 │
│ ┌─────────────┐  ┌────────────────────┐  ┌────────────────┐ │
│ │   GitHub    │  │    Static Site     │  │  File System   │ │
│ │     API     │  │     Generators     │  │   Operations   │ │
│ └─────────────┘  └────────────────────┘  └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Core Design Principles
1. Stateless Operation
DocuMCP operates as a stateless service where each tool invocation is independent:
- No in-process server state carried between requests (learned data is persisted separately in project-local storage)
- Self-contained analysis for each repository
- Reproducible results given the same inputs
- Horizontal scalability without coordination
Benefits:
- Reliability and consistency
- Easy debugging and testing
- No complex state management
- Simple deployment model
2. Modular Architecture
Each component has a single, well-defined responsibility:
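import type { ZodSchema } from "zod";
// ToolArgs, ToolResult, AnalysisArgs, AnalysisResult and performAnalysis
// are project-specific definitions not shown in this excerpt.
// Tool interface definition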
interface MCPTool {
name: string;
description: string;
inputSchema: ZodSchema;
handler: (args: ToolArgs) => Promise<ToolResult>;
}
// Example tool implementation
export async function analyzeRepository(
args: AnalysisArgs,
): Promise<AnalysisResult> {
// Isolated business logic
return performAnalysis(args);
}
Benefits:
- Easy to test and maintain
- Clear separation of concerns
- Extensible without breaking changes
- Independent component evolution
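To make the single-responsibility idea concrete, here is a small dependency-free registry in which each tool carries its own validation and handler. It is a sketch, not DocuMCP's code: ToolDefinition, invokeTool, and the count_words tool are hypothetical, and a hand-written validate function stands in for the Zod schema used in the interface above.

```typescript
// Hypothetical, dependency-free sketch of a modular tool registry.
// A hand-written validate() function stands in for a Zod schema.
type ValidationResult<T> = { ok: true; value: T } | { ok: false; error: string };

interface ToolDefinition<Args, Result> {
  name: string;
  description: string;
  validate: (raw: unknown) => ValidationResult<Args>;
  handler: (args: Args) => Promise<Result>;
}

// Each tool owns its validation and its business logic, and keeps no state.
const countWords: ToolDefinition<{ text: string }, { words: number }> = {
  name: "count_words",
  description: "Counts whitespace-separated words (example tool)",
  validate: (raw) => {
    const text = (raw as { text?: unknown } | null | undefined)?.text;
    return typeof text === "string"
      ? { ok: true, value: { text } }
      : { ok: false, error: "expected { text: string }" };
  },
  handler: async ({ text }) => ({
    words: text.trim() === "" ? 0 : text.trim().split(/\s+/).length,
  }),
};

// The registry only depends on the shared interface, so tools can be
// added or changed independently of one another.
const registry = new Map<string, ToolDefinition<any, unknown>>([
  [countWords.name, countWords],
]);

async function invokeTool(name: string, rawArgs: unknown): Promise<unknown> {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  const parsed = tool.validate(rawArgs);
  if (!parsed.ok) throw new Error(`Invalid arguments for ${name}: ${parsed.error}`);
  return tool.handler(parsed.value);
}
```

Because invokeTool depends only on the shared interface, adding a tool means adding one registry entry; for example, invokeTool("count_words", { text: "hello MCP world" }) resolves to { words: 3 }.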
3. Progressive Complexity
Users can start simple and add sophistication as needed:
- Basic: Simple repository analysis
- Intermediate: SSG recommendations and configuration
- Advanced: Full deployment automation with optimization
- Expert: Memory-enhanced workflows with pattern learning
4. Security-First Design
All operations follow security best practices:
- Minimal permissions in generated workflows
- OIDC authentication for GitHub Actions
- Input validation using Zod schemas
- No secret exposure in logs or outputs
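One way to honor the no-secret-exposure rule is to scrub text before it reaches a log line or tool output. The redactSecrets helper below is hypothetical: it masks GitHub-style token prefixes (ghp_, gho_, ghu_, ghs_, ghr_, github_pat_) plus any literal secret values the caller passes in, and it is not an exhaustive secret detector.

```typescript
// Hypothetical helper: mask secrets before text is logged or returned to a client.
// Covers GitHub-style token prefixes and any literal secret values supplied by the caller.
const GITHUB_TOKEN_PATTERN = /\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})\b/g;

// Escape regex metacharacters so a literal secret is matched verbatim.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function redactSecrets(text: string, knownSecrets: string[] = []): string {
  let result = text.replace(GITHUB_TOKEN_PATTERN, "***");
  for (const secret of knownSecrets) {
    // Skip empty strings: an empty pattern would match between every character.
    if (secret.length === 0) continue;
    result = result.replace(new RegExp(escapeRegExp(secret), "g"), "***");
  }
  return result;
}
```

For example, redactSecrets("pass is hunter2!", ["hunter2"]) returns "pass is ***!".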
Component Architecture
MCP Server Core
The main server (src/index.ts) implements the MCP protocol specification using the low-level Server class from @modelcontextprotocol/sdk/server/index.js:
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ListToolsRequestSchema,
// ... other schemas
} from "@modelcontextprotocol/sdk/types.js";
const server = new Server(
{
name: "documcp",
version: packageJson.version,
},
{
capabilities: {
tools: {}, // 30+ documentation tools
prompts: {
listChanged: true, // Guided workflow prompts
},
resources: {
subscribe: true, // Generated content resources
listChanged: true,
},
roots: {
listChanged: true, // Path permission management
},
},
},
);
// Tool registration using request handlers
server.setRequestHandler(ListToolsRequestSchema, async () => {
return { tools: TOOLS.map(/* transform to MCP format */) };
});
server.setRequestHandler(CallToolRequestSchema, async (request) => {
// Route to appropriate tool handler based on request.params.name
});
// Connect via stdio transport for process-based communication
const transport = new StdioServerTransport();
await server.connect(transport);
Key Features:
- Low-Level Server Implementation: Uses the foundational Server class for maximum control and flexibility
- Stdio Transport: Communicates via standard input/output streams for process-based integration with MCP clients
- Manual Tool Registration: Tools are registered with setRequestHandler using CallToolRequestSchema and ListToolsRequestSchema
- Schema Validation: Zod-based input/output validation, with zodToJsonSchema conversion for MCP compatibility
- Resource Management: Automatic resource creation and storage with URI-based access patterns
- Error Handling: Comprehensive error management with structured MCP error responses
- Path Security: Root-based permission checking via --root arguments to restrict file system access
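The root-based check can be sketched with Node's path module, assuming roots arrive as directory paths (for example from --root arguments). isPathAllowed is a hypothetical name. Comparing path.relative output, rather than testing a string prefix, keeps /work/project-other from passing as a child of /work/project.

```typescript
import * as path from "node:path";

// Hypothetical helper: true only if `target` resolves inside one of the allowed roots.
function isPathAllowed(target: string, allowedRoots: string[]): boolean {
  const resolvedTarget = path.resolve(target);
  return allowedRoots.some((root) => {
    const rel = path.relative(path.resolve(root), resolvedTarget);
    const firstSegment = rel.split(path.sep)[0];
    // rel === "" means the target is the root itself; a leading ".." segment
    // or an absolute result (another drive on Windows) means it is outside.
    return rel === "" || (firstSegment !== ".." && !path.isAbsolute(rel));
  });
}
```

This check is purely lexical and does not follow symbolic links; a stricter version could resolve both paths with fs.realpathSync before comparing.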
Repository Analysis Engine
The analysis engine examines projects from multiple perspectives:
interface RepositoryAnalysis {
structure: ProjectStructure; // Files, languages, organization
dependencies: DependencyAnalysis; // Package ecosystems, frameworks
documentation: DocuAnalysis; // Existing docs, quality assessment
recommendations: ProjectProfile; // Type, complexity, team size
}
Analysis Layers:
- File System Analysis: Language detection, structure mapping
- Dependency Analysis: Package manager integration, framework detection
- Documentation Assessment: README quality, existing docs evaluation
- Complexity Scoring: Project size, team collaboration patterns
Performance Characteristics:
- Sub-second analysis for typical repositories
- Memory efficient with streaming file processing
- Extensible language and framework detection
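Language detection, the first analysis layer, can be approximated by counting file extensions. This is a minimal sketch: the extension table and the detectLanguages function are illustrative assumptions, and manifest files (package.json, requirements.txt, and similar) are ignored here.

```typescript
// Hypothetical sketch: estimate a repository's language mix from file extensions.
// The extension table is illustrative, not DocuMCP's actual mapping.
const EXTENSION_LANGUAGES: Record<string, string> = {
  ".ts": "TypeScript",
  ".tsx": "TypeScript",
  ".js": "JavaScript",
  ".py": "Python",
  ".go": "Go",
  ".rb": "Ruby",
  ".rs": "Rust",
  ".java": "Java",
};

interface LanguageShare {
  language: string;
  files: number;
  share: number; // fraction of recognized source files
}

function detectLanguages(filePaths: string[]): LanguageShare[] {
  const counts = new Map<string, number>();
  let recognized = 0;
  for (const filePath of filePaths) {
    const dot = filePath.lastIndexOf(".");
    if (dot === -1) continue; // no extension, e.g. "Makefile"
    const language = EXTENSION_LANGUAGES[filePath.slice(dot).toLowerCase()];
    if (!language) continue; // unknown or non-source extension
    counts.set(language, (counts.get(language) ?? 0) + 1);
    recognized++;
  }
  return Array.from(counts.entries())
    .map(([language, files]) => ({ language, files, share: files / recognized }))
    .sort((a, b) => b.files - a.files);
}
```

Because it only needs file paths, this step can run over a streamed directory listing without reading file contents.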
SSG Recommendation Engine
A data-driven system for selecting optimal static site generators:
interface SSGRecommendation {
recommended: SSGType;
confidence: number; // 0-1 confidence score
reasoning: string[]; // Human-readable justifications
alternatives: Alternative[]; // Other viable options
scoring: ScoringBreakdown; // Detailed scoring matrix
}
Scoring Factors:
- Ecosystem Alignment: Language/framework compatibility
- Feature Requirements: Search, theming, plugins
- Complexity Match: Project size and team capacity
- Performance Needs: Build speed, site performance
- Maintenance Overhead: Learning curve, ongoing effort
Supported SSGs:
- Jekyll: Ruby-based, GitHub Pages native
- Hugo: Go-based, fast builds, extensive themes
- Docusaurus: React-based, modern features
- MkDocs: Python-based, simple and effective
- Eleventy: JavaScript-based, flexible and fast
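The scoring factors can be combined with a weighted sum to fill fields shaped like SSGRecommendation. Everything numeric here is invented for illustration: the factor weights, the per-SSG scores a caller supplies, and the choice to report the winner's weighted score as the confidence value are assumptions, not DocuMCP's documented formula.

```typescript
// Hypothetical weighted-sum scoring; the weights and factor scores are illustrative only.
type SSGType = "jekyll" | "hugo" | "docusaurus" | "mkdocs" | "eleventy";
type Factor = "ecosystem" | "features" | "complexity" | "performance" | "maintenance";
type FactorScores = Record<Factor, number>; // each factor scored from 0 to 1

// Weights sum to 1, so every weighted total also stays within 0-1.
const WEIGHTS: FactorScores = {
  ecosystem: 0.3,
  features: 0.2,
  complexity: 0.2,
  performance: 0.15,
  maintenance: 0.15,
};

interface Candidate {
  ssg: SSGType;
  scores: FactorScores;
}

interface Recommendation {
  recommended: SSGType;
  confidence: number; // weighted score of the top candidate
  alternatives: Array<{ ssg: SSGType; score: number }>;
}

function weightedScore(scores: FactorScores): number {
  return (Object.keys(WEIGHTS) as Factor[]).reduce(
    (sum, factor) => sum + WEIGHTS[factor] * scores[factor],
    0,
  );
}

function recommend(candidates: Candidate[]): Recommendation {
  const ranked = candidates
    .map(({ ssg, scores }) => ({ ssg, score: weightedScore(scores) }))
    .sort((a, b) => b.score - a.score);
  const best = ranked[0];
  if (!best) throw new Error("no candidate SSGs supplied");
  return { recommended: best.ssg, confidence: best.score, alternatives: ranked.slice(1) };
}
```

A weighted sum keeps each factor's contribution visible, which is the kind of detail a ScoringBreakdown would need to expose.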
Memory System Architecture
An intelligent learning system that improves recommendations over time:
interface MemorySystem {
storage: ProjectLocalStorage; // .documcp/memory/
patterns: PatternRecognition; // Success pattern learning
similarity: ProjectSimilarity; // Project comparison engine
insights: HistoricalInsights; // Usage patterns and outcomes
}
Storage Architecture:
.documcp/memory/
├── analysis/              # Repository analysis results
│   ├── analysis_*.jsonl
│   └── metadata.json
├── recommendations/       # SSG recommendations
│   ├── recommendations_*.jsonl
│   └── patterns.json
├── deployments/           # Deployment outcomes
│   ├── deployments_*.jsonl
│   └── success_rates.json
└── system/                # System metadata
    ├── config.json
    └── statistics.json
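The .jsonl files in this layout follow the JSON Lines convention of one JSON object per line. The sketch below shows serializing and reading such records; the MemoryRecord shape is hypothetical, and file I/O is left out so the example stays self-contained.

```typescript
// Hypothetical record shape for an entry in one of the *.jsonl files.
interface MemoryRecord {
  id: string;
  timestamp: string; // ISO 8601
  type: "analysis" | "recommendation" | "deployment";
  data: Record<string, unknown>;
}

// JSON Lines: one JSON document per line, so a new record can be appended
// without rewriting the rest of the file.
function toJsonlLine(record: MemoryRecord): string {
  return JSON.stringify(record) + "\n";
}

// Parse a JSONL body, skipping blank lines and reporting malformed ones
// (for example a line truncated by an interrupted write) instead of failing.
function parseJsonl(text: string): { records: MemoryRecord[]; badLines: number[] } {
  const records: MemoryRecord[] = [];
  const badLines: number[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line) as MemoryRecord);
    } catch {
      badLines.push(index + 1); // 1-based line number
    }
  });
  return { records, badLines };
}
```

Appending a record could then be a single fs.appendFileSync(file, toJsonlLine(record)) call, which is what makes the format a good fit for an append-only history.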
Learning Mechanisms:
- Pattern Recognition: Successful project-SSG combinations
- Similarity Matching: Find projects with similar characteristics
- Outcome Tracking: Monitor deployment success rates
- Feedback Integration: Learn from user choices and outcomes
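Similarity matching can be illustrated with Jaccard similarity over sets of project traits. The trait strings, the thresholds, and the choice of Jaccard itself are assumptions; the text above does not specify which measure DocuMCP uses.

```typescript
// Hypothetical sketch: rank past projects by Jaccard similarity of their trait sets.
interface ProjectProfile {
  id: string;
  traits: string[]; // e.g. "lang:typescript", "pm:npm", "size:small" (illustrative)
  ssg?: string; // SSG that was eventually deployed, if known
}

// |A ∩ B| / |A ∪ B|; two empty trait sets carry no evidence, so they score 0.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

function findSimilarProjects(
  target: ProjectProfile,
  history: ProjectProfile[],
  topK = 3,
  minSimilarity = 0.3,
): Array<{ project: ProjectProfile; similarity: number }> {
  const targetTraits = new Set(target.traits);
  return history
    .filter((p) => p.id !== target.id)
    .map((p) => ({ project: p, similarity: jaccard(targetTraits, new Set(p.traits)) }))
    .filter((m) => m.similarity >= minSimilarity)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```

The matched projects' ssg values could then feed pattern recognition, for example by favoring SSGs that similar projects deployed successfully.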
Content Generation System
Automated content creation following the Diataxis framework:
Diataxis Framework Implementation:
┌─────────────────┐  ┌─────────────────┐
│    Tutorials    │  │  How-to Guides  │
│   (Learning)    │  │(Problem-solving)│
└─────────────────┘  └─────────────────┘
┌─────────────────┐  ┌─────────────────┐
│    Reference    │  │   Explanation   │
│  (Information)  │  │ (Understanding) │
└─────────────────┘  └─────────────────┘
Content Types Generated (Diataxis Framework):
- Tutorials: Learning-oriented guides for skill acquisition (study context)
- How-to Guides: Problem-solving guides for specific tasks (work context)
- Reference: Information-oriented content for lookup and verification (work context)
- Explanation: Understanding-oriented content for context and background (study context)
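A generator targeting the four Diataxis content types could map each type to its own directory. The directory names, the index.md convention, and scaffoldPlan are hypothetical; they illustrate the separation rather than DocuMCP's actual output paths.

```typescript
// Hypothetical sketch: one directory per Diataxis content type under a docs root.
type DiataxisType = "tutorial" | "how-to" | "reference" | "explanation";

const DIATAXIS_LAYOUT: Record<DiataxisType, { dir: string; orientation: string }> = {
  tutorial: { dir: "tutorials", orientation: "learning-oriented" },
  "how-to": { dir: "how-to", orientation: "problem-solving" },
  reference: { dir: "reference", orientation: "information-oriented" },
  explanation: { dir: "explanation", orientation: "understanding-oriented" },
};

// List the index pages a generator could create, one per quadrant.
function scaffoldPlan(docsRoot: string): Array<{ path: string; orientation: string }> {
  return (Object.keys(DIATAXIS_LAYOUT) as DiataxisType[]).map((type) => ({
    path: `${docsRoot}/${DIATAXIS_LAYOUT[type].dir}/index.md`,
    orientation: DIATAXIS_LAYOUT[type].orientation,
  }));
}
```

Keeping the quadrants in separate directories makes it harder for a single page to drift into mixing tutorial steps with reference material.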
Deployment Automation
Automated GitHub Pages deployment with SSG-specific optimizations:
# Generated workflow characteristics
Security:
- OIDC authentication (no long-lived tokens)
- Minimal permissions (contents: read, pages: write, id-token: write)
- Secret masking and secure handling
Performance:
- Dependency caching (npm, gems, pip)
- Parallel builds where possible
- Optimized Docker images
Reliability:
- Build verification before deployment
- Rollback capabilities
- Health monitoring
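The security and reliability properties above map onto a short workflow template. This generatePagesWorkflow function is a hypothetical sketch: buildCommand and outputDir are placeholders for SSG-specific values, and caching steps are omitted. The actions it references (actions/checkout, actions/upload-pages-artifact, actions/deploy-pages) are GitHub's standard Pages actions; pin whichever major versions are current.

```typescript
// Hypothetical sketch of a GitHub Pages workflow generator. The YAML follows the
// standard upload-pages-artifact / deploy-pages pattern; build commands are placeholders.
interface WorkflowOptions {
  buildCommand: string; // SSG-specific, e.g. "npm ci && npm run build"
  outputDir: string; // directory containing the built site
  branch?: string;
}

function generatePagesWorkflow({ buildCommand, outputDir, branch = "main" }: WorkflowOptions): string {
  return [
    "name: Deploy documentation",
    "on:",
    "  push:",
    `    branches: [${branch}]`,
    "permissions:",
    "  contents: read # check out the repository",
    "  pages: write # publish to GitHub Pages",
    "  id-token: write # request the OIDC token used by deploy-pages",
    "concurrency:",
    "  group: pages",
    "  cancel-in-progress: false",
    "jobs:",
    "  build:",
    "    runs-on: ubuntu-latest",
    "    steps:",
    "      - uses: actions/checkout@v4",
    `      - run: ${buildCommand}`,
    "      - uses: actions/upload-pages-artifact@v3",
    "        with:",
    `          path: ${outputDir}`,
    "  deploy:",
    "    needs: build # deploy only runs after a successful build",
    "    runs-on: ubuntu-latest",
    "    environment:",
    "      name: github-pages",
    "      url: ${{ steps.deployment.outputs.page_url }}",
    "    steps:",
    "      - id: deployment",
    "        uses: actions/deploy-pages@v4",
  ].join("\n");
}
```

Making deploy depend on build is what provides build verification before deployment: a failed build never reaches the Pages environment. The id-token: write permission is what lets deploy-pages authenticate through OIDC instead of a stored token.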
Validation Engine
Multi-layered validation for content quality assurance:
interface ValidationEngine {
linkChecker: LinkValidation; // Internal/external link verification
contentValidator: ContentValidation; // Diataxis compliance, accuracy
codeValidator: CodeValidation; // Syntax checking, example testing
seoValidator: SEOValidation; // Meta tags, performance, accessibility
}
Validation Levels:
- Syntax: Markdown, code block syntax
- Structure: Diataxis compliance, navigation
- Content: Accuracy, completeness, consistency
- Performance: Loading speed, mobile optimization
- SEO: Meta tags, structured data, accessibility
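The internal half of link checking can be sketched as extracting inline Markdown links, skipping external URLs and same-page anchors, and resolving the rest against the set of known pages. findBrokenInternalLinks and its regex are simplifying assumptions: reference-style links and HTML anchors are not handled, and #fragment targets are not verified against headings.

```typescript
// Hypothetical sketch of internal link validation across a set of Markdown pages.
// Handles inline links only: [text](target) or [text](target "title").
interface BrokenLink {
  page: string;
  target: string;
}

// Resolve a link target against the linking page's directory, dropping any #fragment.
function resolveTarget(fromPage: string, target: string): string {
  const pathPart = target.split("#")[0] ?? "";
  const parts: string[] = pathPart.startsWith("/") ? [] : fromPage.split("/").slice(0, -1);
  for (const segment of pathPart.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// `pages` maps a page path (e.g. "docs/index.md") to its Markdown source.
function findBrokenInternalLinks(pages: Map<string, string>): BrokenLink[] {
  const broken: BrokenLink[] = [];
  pages.forEach((markdown, page) => {
    const linkPattern = /\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(markdown)) !== null) {
      const target = match[1] ?? "";
      // Skip external URLs (http:, https:, mailto:, ...) and same-page anchors.
      if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith("#")) continue;
      if (!pages.has(resolveTarget(page, target))) broken.push({ page, target });
    }
  });
  return broken;
}
```

External links need network requests and retries, so keeping them out of this pass lets the internal check stay fast and deterministic.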
Data Flow Architecture
Request Processing Flow
1. MCP Client Request
   ↓
2. Schema Validation (Zod)
   ↓
3. Tool Handler Routing
   ↓
4. Business Logic Execution
   ↓
5. Result Processing
   ↓
6. Resource Storage
   ↓
7. Response Formatting
   ↓
8. MCP Protocol Response
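Steps 2 through 7 can be read as one function that returns either a formatted result or a structured error; steps 1 and 8 belong to the transport. In this hypothetical sketch the handler table, the in-memory resource store, and the documcp:// URI scheme are invented, while the returned shape (a content array of text items with an optional isError flag) follows the MCP tools/call result format.

```typescript
// Hypothetical end-to-end sketch of steps 2-7; names and the URI scheme are illustrative.
type Handler = (args: Record<string, unknown>) => Promise<unknown>;

interface ToolCallRequest {
  name: string;
  arguments?: unknown;
}

// Shape of an MCP tools/call result: text content items plus an optional error flag.
interface CallToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

const handlers = new Map<string, Handler>([
  ["echo", async (args) => ({ echoed: args["message"] })],
]);

const resourceStore = new Map<string, string>(); // results kept for later resource reads
let resourceCounter = 0;

async function handleToolCall(request: ToolCallRequest): Promise<CallToolResult> {
  try {
    // 2. Validation: a plain object check stands in for a Zod schema.
    const args: unknown = request.arguments ?? {};
    if (typeof args !== "object" || args === null || Array.isArray(args)) {
      throw new Error("arguments must be an object");
    }
    // 3. Routing by tool name.
    const handler = handlers.get(request.name);
    if (!handler) throw new Error(`Unknown tool: ${request.name}`);
    // 4. Business logic.
    const result = await handler(args as Record<string, unknown>);
    // 5-6. Result processing and resource storage.
    const text = JSON.stringify(result, null, 2);
    const uri = `documcp://results/${request.name}/${++resourceCounter}`;
    resourceStore.set(uri, text);
    // 7. Response formatting.
    return {
      content: [
        { type: "text", text },
        { type: "text", text: `Stored as ${uri}` },
      ],
    };
  } catch (error) {
    // Failures become a structured tool result instead of crashing the server.
    const message = error instanceof Error ? error.message : String(error);
    return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
  }
}
```

Returning errors with isError: true, rather than throwing, lets the calling AI assistant see the failure message and decide how to recover.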
Memory System Flow
1. Tool Execution
   ↓
2. Result Analysis
   ↓
3. Pattern Extraction
   ↓
4. Similarity Matching
   ↓
5. Storage Update
   ↓
6. Learning Integration
   ↓
7. Future Recommendations Enhancement
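Outcome tracking and learning integration can be sketched as per-SSG success rates that nudge base recommendation scores. The DeploymentOutcome shape, the add-one (Laplace) smoothing, and the 0.3 blending weight are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: turn stored deployment outcomes into per-SSG success rates
// and use them to adjust base recommendation scores.
interface DeploymentOutcome {
  ssg: string;
  success: boolean;
}

// Add-one (Laplace) smoothing keeps SSGs with few deployments near 0.5
// instead of jumping to 0 or 1 after a single result.
function successRates(outcomes: DeploymentOutcome[]): Map<string, number> {
  const tallies = new Map<string, { successes: number; total: number }>();
  for (const { ssg, success } of outcomes) {
    const tally = tallies.get(ssg) ?? { successes: 0, total: 0 };
    tally.total++;
    if (success) tally.successes++;
    tallies.set(ssg, tally);
  }
  const rates = new Map<string, number>();
  tallies.forEach(({ successes, total }, ssg) => {
    rates.set(ssg, (successes + 1) / (total + 2));
  });
  return rates;
}

// Blend a base score (0-1) with historical success; unseen SSGs get a neutral 0.5.
function adjustScore(baseScore: number, ssg: string, rates: Map<string, number>, weight = 0.3): number {
  const rate = rates.get(ssg) ?? 0.5;
  return (1 - weight) * baseScore + weight * rate;
}
```

Smoothing matters here because a single failed deployment would otherwise drive an SSG's rate to zero and suppress it in every future recommendation.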