Development Plan v0.1

AI Knowledge Sync System

A strategy to keep AI contextually aware of code changes across large monorepos and multi-repo environments, using AST parsing, commit hooks, a structured KNOWLEDGE.md, and MCP serving.

Development Phases: 4
Estimated Timeline: ~10 wk
Core Layers: 3
Repos Supported: ∞
Phased Roadmap
Phase 01
Foundation: Single Repo Proof of Concept
Wk 1–2
Start with one repo. Build the AST parser pipeline that extracts module maps, public APIs, and exports. Generate a static KNOWLEDGE.md on demand. Validate that Claude can answer questions about the codebase using this file alone.
AST parser (JS/TS/Python)
KNOWLEDGE.md schema design
manual generation script
Claude context test
TRD/PRD link format
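
As a rough illustration of the extraction step, the sketch below uses Python's standard-library ast module instead of tree-sitter and handles Python source only. The function name extract_public_api, the "leading underscore means private" rule, and the shape of the returned dictionary are assumptions made for this example, not part of the plan.

```python
import ast


def extract_public_api(source: str) -> dict:
    """Collect top-level public functions and classes from Python source.

    Assumption: a name is public when it does not start with an underscore.
    """
    tree = ast.parse(source)
    api = {"functions": [], "classes": []}
    for node in tree.body:  # top-level statements only, no nested definitions
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                # Only plain positional parameters are listed in this sketch.
                params = ", ".join(arg.arg for arg in node.args.args)
                api["functions"].append(f"{node.name}({params})")
        elif isinstance(node, ast.ClassDef):
            if not node.name.startswith("_"):
                api["classes"].append(node.name)
    return api


if __name__ == "__main__":
    sample = "def load(path, mode):\n    pass\n\nclass Parser:\n    pass\n"
    print(extract_public_api(sample))
```

Running this against a handful of real modules is a cheap way to inspect output quality before any automation exists.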
Phase 02
Automation: Commit-triggered Updates
Wk 3–5
Hook into CI/CD (GitHub Actions, GitLab CI, etc.). On each push, detect changed files, re-parse only those modules, diff against the previous KNOWLEDGE.md, and do an incremental LLM-assisted update. Avoid full re-reads; only process what changed.
git diff detector
incremental AST re-parse
CI/CD hook integration
LLM diff summarizer
KNOWLEDGE.md versioning
changelog generation
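
One possible shape for the git diff detector is sketched below. It takes the text produced by `git diff --name-status` between two commits and decides which source files need re-parsing and which entries should be dropped. The function name modules_to_reparse, the suffix list, and the returned dictionary are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumption: only these file types feed the knowledge file.
SOURCE_SUFFIXES = {".py", ".js", ".ts"}


def modules_to_reparse(name_status: str) -> dict:
    """Split `git diff --name-status` output into files to re-parse and to remove."""
    reparse, remove = set(), set()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        code = status[0]  # "R100" and "C75" carry a similarity score after the letter
        if code == "D":
            gone, fresh = paths[:1], []
        elif code == "R":
            gone, fresh = paths[:1], paths[1:2]  # old path removed, new path parsed
        else:
            gone, fresh = [], paths[-1:]  # A, M, C and others: parse the current path
        remove.update(p for p in gone if PurePosixPath(p).suffix in SOURCE_SUFFIXES)
        reparse.update(p for p in fresh if PurePosixPath(p).suffix in SOURCE_SUFFIXES)
    return {"reparse": sorted(reparse), "remove": sorted(remove)}


if __name__ == "__main__":
    sample = "M\tsrc/api/users.py\nD\tsrc/old.py\nR100\tsrc/a.ts\tsrc/b.ts\n"
    print(modules_to_reparse(sample))
```

In CI the input would come from running the git command against the previous and current commit; keeping the parsing in a pure function makes it testable without a repository.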
Phase 03
Scale: Multi-Repo + Cross-Repo Index
Wk 6–8
Extend to N repos using a manifest file. Build a GLOBAL_KNOWLEDGE.md that aggregates cross-repo API contracts, shared interfaces, and inter-service dependencies. Enable impact analysis: "if I change X in repo-a, what breaks in repo-b?"
repos.yaml manifest
global aggregator
interface contract tracker
cross-repo dep graph
breaking change detection
metadata tagging (repo/module/date)
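
The impact-analysis question can be answered with a reverse walk over the cross-repo dependency graph. The sketch below assumes the graph is already available as a plain dictionary mapping each "repo/module" to the modules it depends on; the function name impacted_by and the identifiers in the example are hypothetical.

```python
from collections import defaultdict, deque


def impacted_by(dependencies: dict, changed: str) -> list:
    """Return every node that depends, directly or transitively, on `changed`."""
    # Invert the graph: provider -> set of consumers.
    dependents = defaultdict(set)
    for consumer, providers in dependencies.items():
        for provider in providers:
            dependents[provider].add(consumer)

    seen, queue = set(), deque([changed])
    while queue:  # breadth-first walk over the inverted edges
        current = queue.popleft()
        for consumer in dependents[current]:
            if consumer not in seen and consumer != changed:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


if __name__ == "__main__":
    graph = {
        "repo-b/billing": ["repo-a/users"],
        "repo-c/reports": ["repo-b/billing"],
    }
    print(impacted_by(graph, "repo-a/users"))
```

The walk tolerates cycles because each node is visited at most once.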
Phase 04
Intelligence: MCP Server + Live Query
Wk 9–10
Build the MCP server layer that lets Claude query knowledge on-demand rather than relying on static context injection. Support queries by repo, module, date range, or change type. Optionally connect to Notion/Confluence for TRD/PRD enrichment.
MCP server scaffold
query API design
repo federation
date/module filters
Notion/Confluence connector
semantic search (optional)
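
Independently of how the MCP server itself is wired up, the query filters described above reduce to a small function over stored entries. The sketch below shows only that filtering core; the KnowledgeEntry fields, the change-type labels, and the query_entries signature are assumptions, and no MCP SDK calls are shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class KnowledgeEntry:
    repo: str
    module: str
    changed_on: date
    change_type: str  # hypothetical labels such as "feature", "fix", "breaking"
    summary: str


def query_entries(
    entries: Iterable[KnowledgeEntry],
    repo: Optional[str] = None,
    module: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
    change_type: Optional[str] = None,
) -> List[KnowledgeEntry]:
    """Return matching entries, newest first. A filter left as None is ignored."""
    matches = []
    for entry in entries:
        if repo is not None and entry.repo != repo:
            continue
        if module is not None and not entry.module.startswith(module):
            continue  # prefix match, so "src/auth" also covers "src/auth/tokens"
        if since is not None and entry.changed_on < since:
            continue
        if until is not None and entry.changed_on > until:
            continue
        if change_type is not None and entry.change_type != change_type:
            continue
        matches.append(entry)
    return sorted(matches, key=lambda e: e.changed_on, reverse=True)
```

A server tool handler would parse the incoming arguments, call a function like this, and return the summaries, so the model receives only the slice it asked for.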
| Component | Recommended | Multi-Repo | Incremental | Notes |
|---|---|---|---|---|
| AST Parser | tree-sitter (multi-lang) | YES | YES | Supports JS, TS, Python, Go, etc. |
| Knowledge Format | Structured Markdown (KNOWLEDGE.md) | YES | YES | Human-readable + AI-injectable |
| CI Hook | GitHub Actions / GitLab CI | YES | YES | Per-repo, triggers on push |
| LLM Summarizer | Claude Haiku (fast + cheap) | YES | YES | Only process changed files |
| Serving Layer | MCP Server (custom) | YES | PARTIAL | Phase 4: query on demand |
| Vector Search | pgvector / Chroma (optional) | MAYBE | MAYBE | Only if scale demands it |
| Doc Enrichment | Notion / Confluence API | YES | MANUAL | Link TRD/PRD to modules |
Risks & Mitigations
● HIGH RISK
Context window overflow
Large repos with many modules will generate massive KNOWLEDGE.md files that exceed context limits. Mitigation: use MCP for on-demand fetching instead of full injection. Keep per-module summaries short.
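
To make the mitigation concrete, the sketch below picks only the requested module summaries and stops adding them once a size budget is reached. Measuring the budget in characters is a simplifying assumption (a real implementation would likely count tokens), and the function name select_summaries is hypothetical.

```python
def select_summaries(summaries: dict, requested: list, max_chars: int):
    """Return (context_text, skipped_names) for the requested modules.

    Modules are added in the order requested; any summary that would push the
    total past max_chars is skipped and reported instead of being truncated.
    """
    parts, skipped, used = [], [], 0
    for name in requested:
        text = summaries.get(name)
        if text is None:
            continue  # unknown module: nothing to inject
        block = f"## {name}\n{text}\n"
        if used + len(block) > max_chars:
            skipped.append(name)
            continue
        parts.append(block)
        used += len(block)
    return "".join(parts), skipped
```

Reporting skipped modules lets the caller fetch them in a follow-up request instead of silently losing them.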
● HIGH RISK
Knowledge staleness in fast-moving repos
If CI is slow or skipped, knowledge drifts from reality. Mitigation: add a "freshness" timestamp to each module entry and warn Claude when data is older than X hours.
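
A freshness check of this kind can be very small. The sketch below assumes each module entry carries a timezone-aware last-updated timestamp and leaves the threshold as a parameter, since the plan does not fix a value for X; the name stale_modules is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_modules(last_updated: dict, now: datetime, max_age: timedelta) -> list:
    """Return the names of modules whose knowledge is older than max_age."""
    return sorted(
        name for name, stamp in last_updated.items() if now - stamp > max_age
    )


if __name__ == "__main__":
    now = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)
    stamps = {
        "src/auth": datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc),
        "src/billing": datetime(2024, 2, 27, 9, 0, tzinfo=timezone.utc),
    }
    # The 24-hour threshold is an arbitrary value chosen for the example.
    print(stale_modules(stamps, now, timedelta(hours=24)))
```

The returned names could be turned into a warning line prepended to whatever context is handed to Claude.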
● MEDIUM RISK
Multi-language AST complexity
Supporting JS, Python, Go, Java etc. multiplies parser complexity. Mitigation: start with your dominant language only. Use tree-sitter for multi-lang once validated.
● MEDIUM RISK
Cross-repo dependency drift
Shared interfaces change in one repo but knowledge in another repo doesn't update. Mitigation: build a central manifest and trigger downstream knowledge updates on interface changes.
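
Deciding when a downstream update should fire requires comparing two snapshots of an interface. The sketch below assumes each snapshot is a dictionary mapping an exported name to a signature string, and treats any removal or signature change as breaking; both the representation and the name interface_changes are assumptions.

```python
def interface_changes(old: dict, new: dict) -> dict:
    """Compare two {exported_name: signature} snapshots of a shared interface."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return {
        "removed": removed,
        "changed": changed,
        "added": added,
        # Assumption: additions are safe, removals and signature edits are not.
        "breaking": bool(removed or changed),
    }


if __name__ == "__main__":
    before = {"get_user": "get_user(id)", "list_users": "list_users()"}
    after = {"get_user": "get_user(id, tenant)", "create_user": "create_user(data)"}
    print(interface_changes(before, after))
```

When the "breaking" flag is set, the central manifest can be consulted to find which repos consume the interface and need their knowledge refreshed.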
● LOW RISK
LLM summarization cost
Running an LLM on every commit could get expensive. Mitigation: use Haiku for summaries, only process changed files/modules, and debounce rapid commits into batches.
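
Debouncing can be sketched as grouping commits that arrive close together, so that each group triggers one summarization call. The example assumes commits are given as (unix_timestamp, sha) pairs and that the quiet window is configurable; the name batch_commits is hypothetical.

```python
def batch_commits(commits: list, quiet_seconds: int) -> list:
    """Group (timestamp, sha) pairs into batches of SHAs.

    A commit joins the current batch when it arrives within quiet_seconds of
    the previous commit; otherwise it starts a new batch.
    """
    batches = []
    previous_ts = None
    for ts, sha in sorted(commits):
        if previous_ts is not None and ts - previous_ts <= quiet_seconds:
            batches[-1].append(sha)
        else:
            batches.append([sha])
        previous_ts = ts
    return batches


if __name__ == "__main__":
    history = [(0, "a1"), (30, "b2"), (400, "c3"), (410, "d4")]
    print(batch_commits(history, quiet_seconds=60))
```

Each batch would then be summarized once, using the diff between the first commit's parent and the last commit in the batch.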
● LOW RISK
Developer adoption
Devs may ignore or overwrite KNOWLEDGE.md. Mitigation: make it fully auto-generated, read-only, and visually useful (e.g. rendered in GitHub as the module's live doc).

▸ Where To Start Right Now

  1. Pick ONE repo, ONE language. Design the KNOWLEDGE.md schema: decide what sections it needs (module map, API surface, recent changes, TRD links).
  2. Write the AST parser script locally. Run it against the repo and inspect output quality before automating anything.
  3. Test the output with Claude directly: paste the KNOWLEDGE.md and ask questions about the codebase. Validate usefulness before building infrastructure.
  4. Only after validation: wire up the CI hook to auto-regenerate on commit. Incremental updates come after full generation works reliably.
  5. Plan multi-repo expansion only after single-repo is stable. Define your repos.yaml manifest format and cross-repo interface contracts.
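
For step 1, it may help to pin the schema down as a renderer early on, so the section layout is explicit and easy to change. The sketch below emits the four sections named in that step; the heading text, argument shapes, and the function name render_knowledge are assumptions, and the URL in the example is a placeholder.

```python
def render_knowledge(repo, module_map, api_surface, recent_changes, trd_links):
    """Render a KNOWLEDGE.md skeleton with four fixed sections."""
    lines = [f"# KNOWLEDGE: {repo}", ""]
    lines.append("## Module map")
    lines += [f"- {path}: {purpose}" for path, purpose in sorted(module_map.items())]
    lines += ["", "## API surface"]
    lines += [f"- {signature}" for signature in api_surface]
    lines += ["", "## Recent changes"]
    lines += [f"- {change}" for change in recent_changes]
    lines += ["", "## TRD links"]
    lines += [f"- {title}: {url}" for title, url in sorted(trd_links.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_knowledge(
        "demo-repo",
        {"src/auth.py": "login and session handling"},
        ["login(user, password)"],
        ["2024-01-01: added login"],
        {"Auth TRD": "https://example.com/trd/auth"},  # placeholder link
    ))
```

Pasting the rendered output into a Claude conversation is then enough to carry out the validation in step 3.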