Large codebase guide #8932
base: main
# Set up large, complex codebases for AI pair programming
This guide shows you how to scale context engineering for large, complex codebases with thousands of files, multiple teams, and intricate architectural dependencies. Building on the [context engineering guide](/docs/copilot/guides/context-engineering-guide.md), it covers advanced strategies for managing AI context in brownfield codebases, where traditional "vibe coding" approaches break down.
Some nitpicks on the terminology and whether people will know them:
- "brownfield" is a term that I only learnt at Microsoft, and it always gives me weird imagery. Would "existing" codebases be just as descriptive? Or do most people know that term?
- It always feels a bit funny to say "traditional" vibe coding, as it's too young to have traditions, and there's so much disagreement about what it means.
- "Context engineering" is a hot term right now for AI app developers, but I don't know if it will make immediate sense to developers who haven't built AI agents/RAG systems yet. I really like the title of the article as I think that makes it super clear what you're describing, just have some concerns about whether people will be thrown off by not connecting "context engineering" to software development.
- AFAIK, brownfield is a common term in DevOps spaces, but we can make sure it is explained where it's used.
- Agreed; it can be reworded to highlight the scale of control and oversight.
- We have another guide that explains it, but it could use an aside here for sure.
See also below; I'm not sure brownfield codebases should be part of this guide. They have their own particular problem space, which is broader than just the size of the codebase.
I'd create a separate guide for dealing with existing / legacy codebases. That guide would then likely reference the context engineering and this large codebases guide.
### Context inheritance patterns
VS Code combines all applicable instruction files automatically:
Is it concatenated in this order? That was a question I got from developers, and I ended up going into Chat Debug View to try to answer their question. If we can be clear about the order, developers will appreciate it.
It's not. There is no priority order or callout in the instructions, and each model has slightly different best practices (for example, mentioning important things at the top of the system prompt and again at the end).
Once we make it clearer how we instruct the model on what takes precedence (for example, AGENTS.md in the root vs. in subfolders vs. chat modes), we can share that with users.
- **Repository-wide** (`.github/copilot-instructions.md`): Core architecture, shared conventions
- **Team-level** (`team-*.instructions.md`): Tech stack specifics, team workflows
Is VS Code actually looking at whether the filename has "team" versus "module" in it?
Good callout; they are made-up namespaces, as it's a very flexible system.
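As a sketch, a team-level file (hypothetically named `.github/instructions/team-payments.instructions.md`) might look like the following. The folder, glob, and conventions are invented for illustration. As the reply above says, the `team-` prefix is only a naming convention; what scopes the file is its `applyTo` pattern.

```markdown
---
applyTo: "services/payments/**"
---
<!-- Hypothetical team-level instructions; applyTo limits them to files under services/payments/. -->
# Payments team conventions

- Represent monetary amounts as integer minor units (for example, cents), never as floating-point numbers.
- Put new endpoints behind the team's existing feature-flag mechanism.
- Ask before changing code outside `services/payments/`.
```

Because no precedence order is guaranteed between combined files (see the thread above), it is safer to keep team-level rules additive rather than contradicting the repository-wide file.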
```
Guidelines for implementing logging, monitoring, and observability features...
```
The AI agent can automatically load this instruction file when it detects the conversation involves logging, monitoring, or observability concepts.

### Context scoping strategies
Newline missing?
- `.github/copilot-instructions.md` for repository-wide context
- Focused `.instructions.md` files with specific `applyTo` patterns for subsystems
- Reference documentation using Markdown links
People always ask if Markdown links are auto-concatenated. I think it'd help to clarify somewhere that links are NOT auto-concatenated, but Copilot will be able to fetch them when needed.
Good point. In some files, like custom-instructions.md, they are auto-included, but for agent mode we usually aim for tool-driven file reads.
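A focused subsystem file could combine an `applyTo` pattern with a link to longer reference documentation. This is a sketch: the file name `.github/instructions/billing-api.instructions.md`, the glob, the rules, and the linked document are hypothetical. Per the thread above, treat the link as a pointer the agent can read with a file tool when it needs the details, not as text that is guaranteed to be concatenated into the prompt.

```markdown
---
applyTo: "src/billing/**/*.ts"
---
<!-- Hypothetical subsystem instructions; the relative link resolves from .github/instructions/ to docs/billing/. -->
# Billing module

- Keep request handlers thin; put business rules in the service layer.
- For the full error-handling contract, see [Billing API design](../../docs/billing/api-design.md).
```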
### Role-based chat modes
Different roles need different AI personas and tool access. Create [custom chat modes](/docs/copilot/customization/custom-chat-modes.md) with specific instructions and tool sets:
When you say "roles" here, I'm not sure if you're referring to the job roles of engineers, or if you mean that you're giving GitHub Copilot a job role. Might help to clarify? "Each member of your team can use a custom chat mode that personalizes the persona of GitHub Copilot and grants specific toolsets. For example, a frontend developer can use a 'frontend' mode with access to tools like Playwright and Figma."
Don't know if that's clearer.
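For example, a frontend developer might use a mode like the following sketch, stored as a hypothetical `.github/chatmodes/frontend.chatmode.md`. The description and instructions are illustrative, and the tool names are assumptions: available names vary by VS Code version, extensions, and MCP servers, so confirm them in the Chat view's tool picker before copying this.

```markdown
---
description: Frontend work on the web client (hypothetical example)
tools: ['codebase', 'search', 'usages', 'fetch']
---
<!-- Hypothetical role-based mode: a narrow tool list keeps the agent focused on frontend tasks. -->
# Frontend mode

You are assisting a frontend developer on the web client.

- Limit changes to files under `web/`; ask before editing shared packages.
- Follow the component patterns already used in neighboring files.
- When a change affects an API contract, summarize the impact instead of editing backend code.
```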
<!-- TODO: Add complete example chat mode files with tool configurations for each role -->
### Team-specific workflows
This heading sounds kind of like role-based workflows, given there's usually a "frontend" team.
Create chat modes that encode team workflows:
Maybe something like: "You can also create chat modes that are specific to common project phases and team workflows. For example, ..."
These files can be referenced in your chat mode instructions to provide persistent context.
<!-- TODO: Add workflow for when and how to update memory files during development -->
Add a prompt for creating them, and instructions that keep them updated.
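For example, a mode's instructions could point at the memory files with ordinary Markdown links. This sketch assumes the `docs/ai/` layout described in this guide and a mode file stored in `.github/chatmodes/`, so the relative paths climb two levels; the section title and wording are illustrative.

```markdown
<!-- Hypothetical excerpt from the body of a chat mode file in .github/chatmodes/ -->
## Project memory

Before planning changes, read:

- [Architectural decisions](../../docs/ai/architectural-decisions.md)
- [Integration patterns](../../docs/ai/integration-patterns.md)
- [Common pitfalls](../../docs/ai/common-pitfalls.md)

If you make a decision that future sessions need to know about, propose an update to the relevant file.
```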
In large codebases, context windows fill up quickly. Implement systematic context compaction:
#### Memory file patterns
We might want to remove that; it's less related to large codebases.
- **Performance mode**: Application + infrastructure optimization
- **Migration mode**: Coordinated changes across multiple services
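A workflow mode such as the migration mode listed above could encode the team's steps directly in its instructions. This is a minimal sketch: the file name `.github/chatmodes/migration.chatmode.md`, the steps, and the tool names are assumptions to adapt to your own process and VS Code version.

```markdown
---
description: Plan and apply a change that spans multiple services (hypothetical example)
tools: ['codebase', 'search', 'usages', 'editFiles']
---
<!-- Hypothetical workflow mode: the numbered steps encode one team's migration process. -->
# Migration mode

Follow these steps in order and stop for confirmation after each one:

1. List every service that consumes the interface being changed, with the files involved.
2. Propose an order of changes that keeps each service working during the rollout.
3. Apply the changes one service at a time.
4. Summarize what changed and what still needs manual verification.
```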
## Context compaction techniques
Is there a less technical term for describing these techniques?
"Context optimization techniques"?
```
docs/ai/
  current-tasks.md            # Active work and blockers
  architectural-decisions.md  # Key design choices and rationale
  integration-patterns.md     # How services communicate
  common-pitfalls.md          # Frequent mistakes and solutions
```

`current-tasks.md` is interesting, as most people use task trackers for this. I keep this file only locally in a branch, for TODOs on the current branch, and then delete it before opening the PR, once the TODOs are complete. May want to clarify? People may be surprised by the suggestion to store tasks in the repo.
Would you reference these from copilot-instructions.md?
Ah, or would you reference them from the chat modes that you mention after this section?
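To keep these files cheap to load, each entry can stay short and structured. The following is a hypothetical entry for `architectural-decisions.md`; the field names and the decision itself are placeholders, not a required format.

```markdown
# Architectural decisions

<!-- Hypothetical content: one short entry per decision, newest first. -->
## Use asynchronous events between the order and inventory services

- **Status**: Accepted
- **Context**: Synchronous calls coupled the deployments of the two services.
- **Decision**: Publish order events to a queue; the inventory service consumes them.
- **Consequences**: Inventory updates are eventually consistent, so event handlers must be idempotent.
```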
### Documentation and knowledge preservation
Use AI to document tribal knowledge before it's lost:
Suggested change: "Use AI to document institutional knowledge before it's lost:"
3. Specific module patterns
4. Implementation details
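A one-off chat prompt for this kind of documentation pass could look like the following sketch. The folder `src/ingest/` and the output location are hypothetical, and the bullets mirror only the module-pattern and implementation-detail items in the list above.

```markdown
<!-- Hypothetical prompt to paste into chat; replace src/ingest/ with a real folder. -->
Document how the code in `src/ingest/` works, for a developer who is new to it.

Cover at least:

- Patterns that are specific to this module, and why they exist.
- Implementation details that are easy to get wrong.

Write the result as a Markdown file under `docs/ai/`, and mark anything you inferred instead of finding it stated in the code or comments.
```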
## Brownfield codebase strategies
Should this be a separate guide? Brownfield does not necessarily equate to a large codebase, and the problem space for existing codebases is broader than the size of the codebase.
- **Domain expert modes**: Capture and share specialized knowledge
- **Cross-team modes**: Facilitate knowledge sharing between teams
## Context window optimization
Should this be grouped with the section on context compacting? They're all about optimizing the context size.
<!-- TODO: Add step-by-step guide for organizing existing documentation into instruction files -->
### Context inheritance patterns
This section feels like it overlaps with the previous one. Should they be combined? The main topics seem to be:
- Multiple instructions files per topic/team/module/concept
- How they are applied
## Advanced chat mode patterns
Just "Chat mode patterns"?
Suggest adding a quick intro sentence about what chat modes are and why they're relevant for large codebases.
## Context compaction techniques
Add a quick intro about the problems related to context when working with a large codebase.
#### Chat mode delegation
Use specialized chat modes for context-heavy tasks:
By using specialized chat modes, how does this compact the context? Or does the user have to do something extra?
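One possible reading of this pattern, sketched here as an assumption rather than built-in behavior: a read-only research mode does the wide exploration and ends with a compact summary, and a fresh session starts from that summary instead of the raw search history. The file name `.github/chatmodes/research.chatmode.md`, the tool names, and the ten-bullet limit are hypothetical.

```markdown
---
description: Investigate a question across the codebase and return a compact summary (hypothetical example)
tools: ['codebase', 'search', 'usages']
---
<!-- Hypothetical research mode: read-only tools, output sized to seed a new chat session. -->
# Research mode

Investigate the question without editing files.

- Search broadly, but report only what is needed to act on the answer.
- End with a summary of at most ten bullet points: relevant files, key symbols, and open questions.
- Do not paste large code excerpts; reference files by path instead.
```

The user would then open a new chat in whichever mode does the implementation and start from that summary, so the exploration itself does not occupy that session's context window.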