Multi-Agent Design System Extraction: When Two LLMs Are Better Than One

Jesus Moreno
#AI Workflows · #Design Systems · #Multi-Agent · #System Prompts · #Process

I just watched two LLMs work together to extract a production-quality design system from my three React apps in a single session. The results were so clean I had to understand what made this workflow so effective.

Here’s the breakdown of a multi-agent approach that might change how you think about complex technical projects.

The Challenge

My Lab tools—Finance Tracker, Thai Flashcards, and MindWave visualizer—had evolved organically. Each solved real problems I had, but they shared zero UI patterns. Buttons, forms, cards, loading states—all custom implementations scattered across components.

I needed a unified design system, but I didn’t want to spend weeks analyzing code patterns and planning extraction strategies.

The Multi-Agent Solution

Instead of one LLM trying to do everything, we split the work:

- Agent 1 (Planning): deep codebase analysis with a surgical system prompt
- Human (Orchestration): two-minute validation cycle
- Agent 2 (Execution): systematic implementation with progress tracking

The System Prompt That Made It Work

The planning agent wasn’t just told “create a design system plan.” It operated under strict constraints:

- ZERO theoretical changes - only document required modifications
- ZERO assumptions about implementation - actively verify all dependencies
- ALWAYS search files actively - never defer change verification
- CONCERN-BASED DECOMPOSITION REQUIRED: For large functions/classes,
  identify distinct responsibilities and specify extraction strategy

This surgical precision forced the planning agent to examine actual code patterns rather than make assumptions.

What the Planning Agent Delivered

The resulting 414-line analysis document was remarkably concrete:

Exact extraction targets:

- Primary Action Buttons: Main CTA buttons (submit, save, etc.)
  - Code ranges: PaymentForm.tsx:198-204, StudySession.tsx:254-259
  - Extract to: variant="default"
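The extraction target above maps each legacy button to a named variant. A minimal sketch of what that variant layer might look like, assuming a Tailwind-style class map in the spirit of shadcn/ui (the variant names beyond "default" and all class strings are illustrative, not from the actual codebase):

```typescript
// Illustrative variant resolver. "default" mirrors the plan's target;
// the other variants and every class string are assumptions for the sketch.
type ButtonVariant = "default" | "outline" | "ghost";

const variantClasses: Record<ButtonVariant, string> = {
  default: "bg-primary text-primary-foreground hover:bg-primary/90",
  outline: "border border-input bg-transparent hover:bg-accent",
  ghost: "hover:bg-accent hover:text-accent-foreground",
};

// Combines shared base classes, the chosen variant, and call-site overrides.
function buttonClasses(variant: ButtonVariant = "default", extra = ""): string {
  const base = "inline-flex items-center justify-center rounded-md text-sm";
  return [base, variantClasses[variant], extra].filter(Boolean).join(" ");
}
```

With something like this in place, the ad-hoc class strings at PaymentForm.tsx:198-204 and StudySession.tsx:254-259 collapse into a single `buttonClasses("default")` call.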

Dependency ordering:

Prerequisites (Must be implemented first):
1. Core Utilities Setup
2. Base Components (Can be implemented in parallel)
3. Enhanced Components (Depend on base components)

Variable dependency mapping:

- onClick handlers from original implementations
- disabled states from form validation contexts
- className overrides for specific styling needs
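Those three dependencies translate directly into the extracted component's props contract. A hedged sketch of that contract (the event type is a simplified placeholder; a real React component would use React's handler types):

```typescript
// The three things the plan says must survive extraction.
interface ButtonProps {
  onClick?: (event: unknown) => void; // handlers from the original implementations
  disabled?: boolean;                 // driven by form validation contexts
  className?: string;                 // per-call-site styling overrides
}

// Merging call-site overrides onto component defaults keeps existing styling intact.
function mergeClassName(defaults: string, override?: string): string {
  return override ? `${defaults} ${override}` : defaults;
}
```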

No hand-waving. No “we’ll figure it out during implementation.” Every component had specific code ranges to extract and clear interfaces to implement.

Cognitive Division of Labor

This approach solved the classic LLM problem: trying to analyze and implement simultaneously often leads to superficial execution.

Planning Agent’s cognitive load:

- Pattern recognition across three codebases
- Verifying dependencies and ordering the work
- Producing concrete, file-level specifications

Execution Agent’s cognitive load:

- Implementing components against a fixed specification
- Tracking progress through the task list
- Migrating call sites without changing behavior

The human acted as quality gate, validating that the plan was comprehensive before expensive implementation work began.

Why This Worked So Well

1. Constraint-driven planning prevented theoretical solutions
The system prompt forced verification of every assumption. No “we should extract buttons” without identifying exactly which buttons and where they lived.

2. Concrete specifications eliminated ambiguity
Line numbers, file names, exact extraction targets. The execution agent never had to guess what “improve button consistency” meant.

3. Systematic progress tracking maintained momentum
The TodoWrite tool tracked 12 complex tasks from core utilities through full migration. Each completed task built confidence for the next.
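The progress-tracking pattern can be sketched as a dependency-aware status machine (an illustration of the idea, not TodoWrite's actual interface):

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Task {
  id: number;
  description: string;
  status: Status;
  dependsOn: number[]; // ids that must be completed first
}

// A task may start only once all of its prerequisites are completed.
function nextRunnable(tasks: Task[]): Task | undefined {
  const done = new Set(
    tasks.filter(t => t.status === "completed").map(t => t.id)
  );
  return tasks.find(
    t => t.status === "pending" && t.dependsOn.every(d => done.has(d))
  );
}
```

Applied to the 12-task plan, this is why core utilities (no dependencies) ran first, base components next, and enhanced components last.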

4. Real-world validation through actual usage
This wasn’t an academic exercise: these apps run live on my consulting site. The design system had to work for real users.

The Business Impact

The technical success was obvious—unified components, reduced duplication, maintainable patterns. But the business impact was deeper.

My Lab tools now demonstrate systematic technical thinking to potential consulting clients. When prospects see the Finance Tracker, Thai Flashcards, and MindWave visualizer working smoothly with consistent UI patterns, they’re seeing proof that I can help them avoid the “every change breaks something else” problem described in my hero copy.

The design system project became both solution and demonstration of the systematic approach I teach clients.

What This Reveals About AI Workflows

Multi-agent approaches excel when cognitive tasks can be cleanly separated. Analysis and implementation require different types of thinking. Splitting them across specialized agents produced better results than any single LLM attempting both.

Constraint-heavy system prompts create better specifications. The surgical prompt prevented the planning agent from taking shortcuts or deferring complexity to implementation time.

Human orchestration scales with validation, not execution. My effort scaled with plan quality, not implementation complexity. The two-minute approval cycle was possible because the specifications were immediately actionable.

Real-world constraints force better architecture. Working on production code with actual users created quality pressure that theoretical exercises lack.

The Template for Complex Technical Projects

This workflow pattern could transform how complex technical work gets done:

  1. Constrained planning agent - Force concrete specifications, no theoretical solutions
  2. Human validation gate - Quick approval of comprehensive specifications
  3. Systematic execution agent - Methodical implementation with progress tracking
  4. Real-world validation - Deploy and verify with actual usage
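Under assumed agent interfaces (nothing here is a real API), the first three steps of the template can be sketched as a pipeline with a human gate between planning and execution:

```typescript
// Hypothetical types: the post describes the roles, not a concrete API.
interface Plan {
  tasks: string[];
  concrete: boolean; // did planning produce actionable, verified specs?
}

type PlanningAgent = (codebase: string) => Plan;
type HumanGate = (plan: Plan) => boolean;       // quick approval, not execution
type ExecutionAgent = (plan: Plan) => string[]; // log of completed tasks

function runWorkflow(
  codebase: string,
  plan: PlanningAgent,
  approve: HumanGate,
  execute: ExecutionAgent
): string[] {
  const spec = plan(codebase);
  // The gate rejects vague plans before expensive implementation begins.
  if (!spec.concrete || !approve(spec)) {
    throw new Error("Plan rejected: specifications must be concrete");
  }
  return execute(spec); // step 4, real-world validation, happens post-deploy
}
```

The design choice worth noting: the human's effort lives entirely in `approve`, which is why orchestration scales with plan quality rather than implementation size.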

The key insight: excellent context and systematic division of cognitive labor can make AI workflows far more effective than trying to do everything in one massive prompt.

What’s Next

I’m already thinking about applying this pattern to other complex projects. API refactoring, database migrations, testing strategy implementation—any technical work that benefits from deep analysis followed by systematic execution.

The question isn’t whether AI can handle complex technical projects. It’s whether we’re organizing AI workflows to take advantage of what different types of models do best.

And judging by the clean, functioning design system now running in production on my site, the multi-agent approach is worth serious attention.
