Context Engineering & Management
Prompting gets you started. Context engineering keeps you going.
Module Overview
Module 4 is about keeping AI output good across long sessions, not just the first message. Week 2 is where context pressure gets serious — loading a full BRD, PRD, and reference architecture into one session to produce the Architecture Document and Dev Tasks. You learn what to load and when, how to recognize and recover a degraded or poisoned session, and how to use subagents to keep the main context clean.
| At a glance | |
|---|---|
| Covers | Relevant-over-abundant context loading; continue vs. compact vs. start-fresh; context poisoning and recovery; subagents as context isolation; Architecture / ADR / Dev Tasks quality bars |
| When it runs | Week 2 Monday (in-person) |
| Builds on | Module 3 (prompting) and the approved Week 1 BRD + PRD |
| Leads into | Week 2 production of the Architecture Document, ADRs, and Dev Tasks |
What you'll produce
Across Week 2: an approved Architecture Document, Architecture Decision Records (one per epic), and Dev Tasks (1–4-hour atomic tasks traceable to PRD user stories) — signed off at the Architecture Review + Dev Handoff gate.
The Discipline Behind the Prompt
A great prompt produces great output in message 1. Context engineering produces great output in message 50. In Week 2, Architects run multi-hour sessions loading the BRD, PRD, reference architectures, and ADR drafts at once; Tech Leads generate Dev Tasks across the full PRD and Architecture Document. These are the heaviest context loads in the training.
Builds on Module 3
Module 3 taught you to write a strong prompt. Module 4 keeps that quality alive across a long session. If output degrades mid-session, the first question is context (this module); the second is prompt craft (Module 3). Keep both handbooks open during Week 2.
The cycle applied to the session itself
Evaluate: assess session health — is quality degrading? Is the window filling? Is history still relevant?
Plan: decide the strategy — continue, compact, or start fresh.
Apply: execute it — load the right context, compact and continue, or open a fresh session.
Validate: confirm output quality recovered. If not, reassess and apply again.
Part 1 — What Is Context
Every session runs inside a context window — fixed-size working memory holding your messages, the AI’s responses, loaded files, and the system prompt (including AGENTS.md). When it fills up, older content gets pushed out or compressed. The AI doesn’t announce this — it just starts producing worse output.
The context window is a whiteboard, not a hard drive
When the whiteboard fills, you erase the oldest notes to make room. The AI works with whatever’s still on the board. Your job: keep the most relevant information on the board at all times. Irrelevant context is as bad as missing context — it crowds out what matters.
What Fills the Window
| Source | Notes |
|---|---|
| System prompt / project rules | AGENTS.md and tool-specific rules — loaded automatically, counts toward the window |
| Files you load | BRD, PRD, Architecture, code, ADR drafts — every file adds to the count |
| Conversation history | Every message and response — grows with each exchange |
| Tool outputs | Filesystem reads, GitHub searches, web results — each one adds context |
| The AI’s own reasoning | Chain-of-thought adds up fast in long sessions |
Token Economics by Tool
Context is measured in tokens — roughly 0.75 words per token (1,000 tokens ≈ 750 words). Each model has a maximum window.
| Tool | Approx. window | Good for | Watch out for |
|---|---|---|---|
| Claude Code (Opus 4.7 / Sonnet 4.6) | 200K standard; 1M on Max / Team / Enterprise | Multi-file dev, long coding, agentic tasks | Auto-compacts near the limit — watch for quality drops before that point |
| OpenCode (model-dependent) | Set by the chosen model — e.g. Big Pickle 200K, or a 1M+ provider | Same dev/agentic work as Claude Code; pick the model to fit the job | Window is the model’s, not the tool’s — Big Pickle degrades well before 200K (~50–70K) |
| Gemini (Workspace, Gemini 2.5 Pro) | 1M+ tokens | Long-context BRD/PRD/Architecture analysis, multi-doc synthesis | No auto-compact — manage manually; watch for drift in long threads |
| Cursor (GPT-4o / Sonnet 4.6) | 128K–200K by model | Code-focused, file-grounded sessions | Smaller window — be deliberate about which files are open |
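To make the token arithmetic concrete, here is a minimal sketch that turns a word count into a rough token estimate using the 0.75 words-per-token rule of thumb above, then reports how much of a window the loaded files would take. The function names and the stand-in documents are illustrative assumptions; real tokenizers vary by model, so treat the result as a ballpark figure only.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text: 1,000 tokens ≈ 750 words


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def window_usage(token_counts: list[int], window_size: int) -> float:
    """Fraction of the window consumed by everything loaded so far."""
    return sum(token_counts) / window_size


# Stand-in documents (hypothetical sizes): a 15,000-word BRD and a 9,000-word PRD.
brd = "word " * 15_000
prd = "word " * 9_000
loaded = [estimate_tokens(brd), estimate_tokens(prd)]

print(loaded)  # [20000, 12000]
print(f"{window_usage(loaded, 200_000):.0%} of a 200K window")  # 16% of a 200K window
```

Conversation history, tool outputs, and the system prompt add to this figure, so the files alone are a lower bound on what the session is carrying.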
Part 2 — Loading Context Deliberately
Loading is a design decision. Before any significant session, answer three questions: what does the AI need to know, what’s the minimum that covers it, and in what order should it load?
Relevant beats abundant
Load the specific sections, not the whole document. Working on auth architecture? Load the auth-related BRD requirements and PRD stories — not the full 50-page BRD.
Load in dependency order: upstream artifacts first (BRD before PRD, PRD before Architecture). The AI builds understanding layer by layer.
State what you loaded and why: “I’m loading X, Y, Z to work on [task].” This orients the AI and gives you a checkpoint to return to.
Reload when sessions resume. The AI remembers nothing from yesterday.
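The loading rules above can be sketched as a small helper: sort the sections you picked into dependency order, then produce the opening statement. This is an illustration only. The ranking table, the placement of ADR drafts after Architecture, and the section labels are assumptions, not part of any tool.

```python
# Upstream artifacts first: BRD before PRD, PRD before Architecture.
# Ranking ADR drafts after Architecture is an assumption for this sketch.
LOAD_ORDER = {"BRD": 0, "PRD": 1, "Architecture": 2, "ADR": 3}


def plan_load(
    sections: list[tuple[str, str]], task: str
) -> tuple[list[tuple[str, str]], str]:
    """Sort (artifact, section) pairs into dependency order and build the opening statement."""
    ordered = sorted(sections, key=lambda pair: LOAD_ORDER[pair[0]])
    names = ", ".join(f"{artifact} {section}" for artifact, section in ordered)
    return ordered, f"I'm loading {names} to work on {task}."


# Hypothetical selection for an auth architecture session.
picked = [
    ("PRD", "stories US-05 to US-08"),
    ("Architecture", "auth section"),
    ("BRD", "auth requirements"),
]
ordered, statement = plan_load(picked, "the auth architecture")
print(statement)
```

Pasting the statement as the first line of the session also leaves you a record of exactly what the AI was given, which helps when you reload the next day.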
What Each Role Loads in Week 2
| Role | Load | Don’t over-load |
|---|---|---|
| Solutions Architect (Architecture) | Approved BRD + PRD first; add reference architectures and ADR drafts as needed | Don’t pre-load everything — load per section |
| Tech Lead (Dev Tasks) | Approved PRD + Architecture; for each story, only that story’s section | Not the full PRD at once — one story at a time |
| Solutions Designer (review) | BRD + PRD + early Architecture draft | Tight scope — checking BRD–Architecture consistency |
| Product Mgr / Designer (review) | PRD + Architecture draft | Checking UI/UX survives into architecture |
| Project Manager (coordination) | PRD + quality gates doc | No code or architecture detail needed |
| Developers (review) | Architecture + Dev Tasks draft | Checking buildability from the task alone |
| QA / DevOps (review) | PRD/Architecture with testability / deployability lens | Scoped to the lens, not the whole repo |
Part 3 — The Session Lifecycle
Every session has a lifecycle: clean window → work → window fills → quality degrades. How you manage it decides whether you get good output for 2 hours or 20 minutes. Three strategies, one decision at each inflection point.
| Strategy | What it does | Use when |
|---|---|---|
| Continue | Keep the current session going | Task is clearly related, context still relevant, quality still high, under ~40% of the window |
| Compact | Compress history, preserve key findings | One major sub-task done, starting another; session valuable but getting long |
| Start fresh | End the session; reload only what the next task needs | Corrected the AI 3+ times on the same mistake; output vague/contradictory; switching artifacts |
Manual Compact Pattern (Gemini / OpenCode)
Claude Code compacts with /compact. In Gemini — and in OpenCode when you want a clean handoff between sessions — do it manually:
1. At the end of a section, ask the AI:
"Summarize the decisions made so far and their rationale,
plus any open questions. Max 10 bullets."
2. Copy that summary.
3. Open a new session.
4. Paste the summary as the first message.
5. Continue the work from there.
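As a sketch of steps 2 to 4, the helper below takes the summary you copied and turns it into the first message of the new session, refusing a summary that runs past the 10-bullet cap. The function name and the header wording are assumptions made for illustration.

```python
MAX_BULLETS = 10

# Step 1 prompt, quoted from the pattern above.
COMPACT_PROMPT = (
    "Summarize the decisions made so far and their rationale, "
    "plus any open questions. Max 10 bullets."
)


def build_first_message(summary: str) -> str:
    """Turn the AI's bullet summary into the first message of the new session."""
    bullets = [line.strip() for line in summary.splitlines() if line.strip()]
    if not bullets:
        raise ValueError("empty summary: nothing to carry forward")
    if len(bullets) > MAX_BULLETS:
        raise ValueError(
            f"{len(bullets)} bullets: ask the AI to tighten it to {MAX_BULLETS}"
        )
    # Header wording is hypothetical; any clear framing line works.
    header = "Summary of decisions and open questions from the previous session:"
    return "\n".join([header, *bullets])


print(build_first_message("- Stateless JWT auth, 8-hour expiry\n- Open: revocation approach"))
```

The cap matters more than the code: a summary that keeps growing carries the old session's noise into the new one.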
Decision Guide
Ask these three questions in order:
1. Is output quality still high? No → start fresh
2. Is the next task closely related? No → start fresh
3. Is the session history still relevant? Yes → continue. Partially → compact.
When in doubt: start fresh. Cost = 5 minutes to reload. Cost of pushing through a degraded session = a bad artifact at Friday’s gate.
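The decision guide reads naturally as a short function. In this minimal sketch the names are illustrative, and one branch is an assumption: the guide does not say what to do when the history is not relevant at all, so the code falls back to the "when in doubt" rule and starts fresh.

```python
from enum import Enum


class Strategy(Enum):
    CONTINUE = "continue"
    COMPACT = "compact"
    START_FRESH = "start fresh"


def choose_strategy(
    quality_high: bool, next_task_related: bool, history_relevance: str
) -> Strategy:
    """Apply the three questions in order.

    history_relevance is "yes" or "partially"; any other value is treated
    as not relevant (an assumption, following "when in doubt: start fresh").
    """
    if not quality_high:
        return Strategy.START_FRESH
    if not next_task_related:
        return Strategy.START_FRESH
    if history_relevance == "yes":
        return Strategy.CONTINUE
    if history_relevance == "partially":
        return Strategy.COMPACT
    return Strategy.START_FRESH


print(choose_strategy(True, True, "partially").value)  # compact
```

The order of the checks is the point: quality is tested first, so a degraded session never reaches the "continue" branch no matter how relevant its history looks.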
Part 4 — Context Poisoning
Poisoning is a specific failure: accumulated history contains contradictions or conflicting instructions the AI can’t resolve cleanly. The output is inconsistent, hedging, or quietly wrong — different from length-based degradation.
How It Happens
| Cause | Example |
|---|---|
| Flip-flopping on decisions | “Use Redis for sessions” (msg 5), then “no sessions — use JWTs” (msg 15). Both are now in context. |
| Partial corrections | You correct the validation approach; the AI reverts next message because the original is still in context. |
| Cumulative scope drift | Starts as “generate Dev Tasks for US-05,” drifts to “redesign auth” by message 30. |
| Contradictory loaded files | A BRD that says “no sessions” loaded alongside a PRD draft that assumes session auth. |
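One way to catch flip-flopping early is to keep a short decision log by hand and check it for topics decided more than one way. The sketch below assumes such a log exists; the tuple layout, the topic names, and the "validation" entry are hypothetical, and nothing here is a feature of any AI tool.

```python
from collections import defaultdict


def find_conflicts(
    decisions: list[tuple[int, str, str]]
) -> dict[str, list[tuple[int, str]]]:
    """Return topics that were decided more than one way.

    Each decision is (message_number, topic, choice).
    """
    by_topic: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for message_number, topic, choice in decisions:
        by_topic[topic].append((message_number, choice))
    return {
        topic: entries
        for topic, entries in by_topic.items()
        if len({choice for _, choice in entries}) > 1
    }


log = [
    (5, "session storage", "Redis sessions"),
    (15, "session storage", "stateless JWTs"),
    (8, "validation", "schema-based"),
]
print(find_conflicts(log))
# {'session storage': [(5, 'Redis sessions'), (15, 'stateless JWTs')]}
```

Any topic this returns is a candidate for poisoning: both choices are still in the session history, whichever one you currently mean.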
Signs of a Poisoned Context
- The AI contradicts something it said 10 messages ago without acknowledging it
- Output quality varies wildly between adjacent messages
- The AI hedges: “depending on your earlier decision about X” when X was settled long ago
- You feel like you’re arguing with the AI rather than collaborating
- After a correction, the AI reverts to the wrong approach next message
Main exercise: the poisoned-context transcript
Below is a short, made-up session that starts useful and goes poisoned. Read it, then answer the questions underneath. The same pattern shows up in real Week 2 architecture sessions.
[1] USER: Designing session storage for the LMS API. ~1,000
concurrent users, 8-hour sessions. What are my options?
[2] AI: Three options: (a) stateless JWTs, (b) server-side
sessions in Redis, (c) DB-backed sessions. For your scale,
stateless JWTs are simplest — no extra infra.
[3] USER: Good. Let's go stateless JWT, no server state.
[4] AI: Agreed. JWT with an 8-hour expiry, signed server-side,
validated on each request. No session store needed.
[5] USER: Now add the ability to force-logout a user instantly
when HR deactivates their account.
[6] AI: Sure — we'll keep a Redis set of revoked sessions and
check it on every request, so we can revoke immediately.
[7] USER: Wait, I thought we were stateless? Now there's Redis?
[8] AI: You're right, let's stay stateless. We can use short
5-minute JWTs and a refresh token stored in Redis to allow
revocation. That keeps it stateless while supporting logout.
Diagnose it
Identify the exact message where the contradiction is injected (the point the design stops being internally consistent).
Name the contradiction: what two things is the AI now trying to satisfy at once?
Explain why the message [7] correction didn’t fix it — why does [8] still carry the conflict forward?
Write the recovery plan: what exactly would you load into a fresh session, and what would you state up front, to get a clean answer?
There is no in-session recovery
Notice that [8] doesn’t resolve the conflict — it tries to honor both “stateless” and “instant revocation via a store,” and ends up describing a stateful design while calling it stateless. Accumulated contradictions don’t un-contradict themselves. The only fix is a fresh session: load only the artifacts reflecting the current correct decision, state that decision explicitly at the start, and don’t carry the conflicting history forward.
Part 5 — Subagents as Context Isolation
Module 3 framed agents as a reusability layer. Module 4 adds a second dimension: an agent runs with its own context, separate from the main session. It gets only what its task needs, returns its output, and the main session’s context stays clean.
Why isolation matters
You’re 30 messages into an architecture session and want a quick code review. Paste the file into the main session and you add review noise to an architecture context — both tasks suffer. Instead, invoke the code-reviewer agent: it runs in isolation with only the file and AGENTS.md, returns findings, and your architecture session continues unaffected.
| Tool | Isolation mechanism | Limitation |
|---|---|---|
| Claude Code | Subagents in .claude/agents/; auto-delegated by description; isolated window | Auto-delegation needs a clear description in frontmatter |
| OpenCode | Built-in subagents (General, Explore, Scout) plus custom ones in .opencode/agents/; auto-invoked by description or @-mentioned; each runs its own context | Auto-invocation needs a clear description; subagent inherits the parent’s model unless overridden |
| Gemini (Workspace) | Open a new chat session for the isolated task | Entirely manual; easy to forget to start fresh |
| Cursor | New Composer session or custom mode, scoped to the relevant files | Smaller window; file scoping is critical |
When to isolate — and when not to
Isolate when: the sub-task is clearly scoped and unrelated; you want a second opinion not influenced by the session’s decisions; the sub-task is a different artifact type.
Don’t isolate when: the sub-task needs the main session’s decisions — instead, carry only the relevant decisions as a brief summary. The goal is clean, focused context, not fragmented context.
Part 6 — Week 2 Quality Bars
Module 2 brought the BRD and PRD bars. Module 4 brings this week’s three: Architecture Document, ADRs, and Dev Tasks. Same pattern — self-check first, peer review mid-week, gate review Friday.
Where the artifacts live — and where they don’t
Artifacts sit in a local docs/ folder, organized by type. They are NOT committed to the code repository — keep docs/ out of the repo (add it to .gitignore). The code repo holds code; the artifact set is delivery documentation that lives alongside it, not inside it.
ADRs and Dev Tasks are segregated by epic — not single monolithic files. Typical layout:
docs/brd/ — BRD-<project>-v1.0.md
docs/prd/ — PRD-<project>-v1.0.md
docs/arch-docs/ — ARCH-<project>-v1.0.md + one ADR per epic (ADR-<project>-E001…)
docs/dev-tasks/ — one CSV per epic (epic-0-...-tasks.csv, epic-1-...-tasks.csv)
docs/generated-figma/ — design output
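The layout above can be checked mechanically. This sketch matches file paths against one pattern per folder and lists anything that does not fit. The regular expressions are an interpretation of the naming examples in this module (for instance, allowing versions other than v1.0 is an assumption), and the project name "lms" is a placeholder.

```python
import re

# One pattern per folder in the typical layout; details are assumptions.
LAYOUT_PATTERNS = [
    re.compile(r"docs/brd/BRD-[\w-]+-v\d+\.\d+\.md"),
    re.compile(r"docs/prd/PRD-[\w-]+-v\d+\.\d+\.md"),
    re.compile(r"docs/arch-docs/ARCH-[\w-]+-v\d+\.\d+\.md"),
    re.compile(r"docs/arch-docs/ADR-[\w-]+-E\d{3}(-[\w-]+)?\.md"),
    re.compile(r"docs/dev-tasks/epic-\d+-[a-z0-9-]+-tasks\.csv"),
    re.compile(r"docs/generated-figma/.+"),
]


def misplaced(paths: list[str]) -> list[str]:
    """Return the paths that match none of the expected layout patterns."""
    return [
        path
        for path in paths
        if not any(pattern.fullmatch(path) for pattern in LAYOUT_PATTERNS)
    ]


paths = [
    "docs/brd/BRD-lms-v1.0.md",
    "docs/arch-docs/ADR-lms-E001-Authentication-Access-Control.md",
    "docs/dev-tasks/epic-1-auth-access-control-tasks.csv",
    "docs/dev-tasks/all-tasks.csv",  # monolithic file, not per epic
]
print(misplaced(paths))  # ['docs/dev-tasks/all-tasks.csv']
```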
Architecture Document
Translates the approved PRD into a technical design. Answers “how do we build this?”
Checklist
- Every PRD user story traces to at least one architectural component
- Every technology choice is justified inline or has an ADR
- Non-functional requirements from the PRD are addressed explicitly, not assumed
- The deployment topology is specific enough for DevOps to begin IaC
- Security decisions are documented (auth, secrets management, access control)
- No BRD constraint is violated (timeline, budget, data residency, tech lock-ins)
- Open questions are explicitly listed, not hidden in narrative
- The document is internally consistent — no section contradicts another
Common failure modes
Component names that map to no PRD user story (scope creep via architecture).
Technology choices with no rationale or ADR pointer.
NFRs in the PRD not addressed in the architecture (“1000 concurrent users” — how?).
Deployment section left vague (“will deploy to AWS” without specifics).
Internal contradictions (component stateless in one section, stateful in another).
Architecture Decision Records (ADRs, one per epic)
Document significant technical decisions — options, trade-offs, rationale. They make decisions explicable when the team or requirements change. ADRs are segregated by epic: one ADR file per epic under docs/arch-docs/ (e.g. ADR-<project>-E001-Authentication-Access-Control.md), sitting alongside the main Architecture Document.
Checklist
- At least 2 options documented with concrete pros and cons — not strawmen
- Rationale references specific BRD or PRD constraints that drove the decision
- Consequences are honest — both positive and negative
- The decision is traceable — the Architecture Document references the ADR
- Status is set correctly (“accepted” once the decision is final, not left at “proposed”)
Common failure modes
Only the chosen option documented — the ADR becomes post-hoc justification.
Rationale says “industry best practice” or “team preference” without specifics.
Consequences missing or only positive.
ADR exists but isn’t referenced from the Architecture Document.
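Most of the ADR checklist can be expressed as a quick lint over a structured record. The sketch below is one possible shape, not a standard ADR schema: the dataclass fields are assumptions, and the rationale check is a crude text match for "BRD", "PRD", or a story ID such as US-05. It cannot judge whether an option is a strawman, and it does not check that the Architecture Document references the ADR.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)


@dataclass
class Adr:
    title: str
    status: str
    options: list[Option]
    rationale: str
    positive_consequences: list[str]
    negative_consequences: list[str]


def lint_adr(adr: Adr, decision_is_final: bool = True) -> list[str]:
    """Return checklist violations; an empty list means the lint found none."""
    problems = []
    if len(adr.options) < 2:
        problems.append("fewer than 2 options documented")
    for option in adr.options:
        if not option.pros or not option.cons:
            problems.append(f"option '{option.name}' lacks pros or cons")
    if not re.search(r"\b(BRD|PRD|US-\d+)\b", adr.rationale):
        problems.append("rationale cites no BRD or PRD constraint")
    if not adr.positive_consequences or not adr.negative_consequences:
        problems.append("consequences must list both positive and negative")
    if decision_is_final and adr.status.lower() != "accepted":
        problems.append(f"status is '{adr.status}', expected 'accepted'")
    return problems


# Hypothetical draft that shows the common failure modes.
draft = Adr(
    title="Session handling",
    status="proposed",
    options=[Option("Stateless JWT", pros=["no extra infra"])],
    rationale="Industry best practice.",
    positive_consequences=["simple to operate"],
    negative_consequences=[],
)
for problem in lint_adr(draft):
    print(problem)
```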
Dev Tasks (CSV, segregated by epic)
Breaks the PRD and Architecture into developer-sized tasks — atomic, traceable, estimated in hours. This is what developers pick up in Week 3. Generated by the task-planner agent (https://ultimate-ai-agents.stratpoint.io/browser). Dev Tasks are segregated by epic: one CSV per epic under docs/dev-tasks/ (e.g. epic-0-project-setup-tasks.csv, epic-1-auth-access-control-tasks.csv). Each CSV opens with an Epic row (issue_type: Epic, estimate 0), then its tasks.
Checklist
- Every task has a unique ID, PRD reference, and acceptance criteria
- Every task is atomic and estimated in hours (1–4 hours each); oversized tasks are broken down further
- Testing tasks (unit, integration) are developer-owned — role is Backend/Frontend/Fullstack, not QA
- Tasks are segregated by epic — one CSV per epic, each opening with an Epic row
- Every PRD user story is covered by at least one task (no orphans)
- Dependencies are explicit — a developer can self-sequence from the document alone
- No task requires a decision not already made in Architecture or ADRs
- Acceptance criteria are testable — done/not-done is unambiguous
- Ordered logically — foundation before features, auth before protected routes
Common failure modes
Oversized, non-atomic tasks (“implement the full auth system” — should be several tasks).
Missing PRD reference — can’t trace back to a user story.
Acceptance criteria copied verbatim from the PRD without implementation refinement.
Implicit dependencies (a PATCH task ordered before the GET task it depends on, with the dependency never stated).
References to decisions not yet made (“use the caching strategy TBD in ADR-03”).
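Several of the Dev Tasks checks are mechanical enough to script against a per-epic CSV. This sketch uses the column names from the schema given in the Apply prompt later in this module. Two things are assumptions: dependencies are semicolon-separated task IDs, and a dependency must appear earlier in the same file. The task IDs and sample rows are made up.

```python
import csv
import io

COLUMNS = [
    "task_id", "issue_type", "summary", "user_story", "description",
    "acceptance_criteria", "priority", "estimate", "story_points", "role",
    "assignee", "status", "labels", "epic_link", "dependencies",
]


def cell(row: dict, name: str) -> str:
    """Read a cell as a stripped string, treating missing values as empty."""
    return (row.get(name) or "").strip()


def hours(value: str):
    """Parse an hour estimate; return None when it is not a number."""
    try:
        return float(value)
    except ValueError:
        return None


def check_dev_tasks(csv_text: str) -> list[str]:
    """Return checklist violations found in one epic's CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["file has no rows"]
    problems = []
    if cell(rows[0], "issue_type") != "Epic" or hours(cell(rows[0], "estimate")) != 0:
        problems.append("first row must be an Epic row with estimate 0")
    seen = set()
    for row in rows:
        task_id = cell(row, "task_id")
        if task_id in seen:
            problems.append(f"{task_id}: duplicate task_id")
        if cell(row, "issue_type") != "Epic":
            if not cell(row, "user_story"):
                problems.append(f"{task_id}: no PRD user story reference")
            if not cell(row, "acceptance_criteria"):
                problems.append(f"{task_id}: no acceptance criteria")
            estimate = hours(cell(row, "estimate"))
            if estimate is None or not 1 <= estimate <= 4:
                problems.append(f"{task_id}: estimate should be 1 to 4 hours")
            # Assumed format: semicolon-separated IDs of tasks defined earlier.
            for dep in cell(row, "dependencies").split(";"):
                dep = dep.strip()
                if dep and dep not in seen:
                    problems.append(f"{task_id}: depends on {dep}, not defined earlier")
        seen.add(task_id)
    return problems


def to_csv(rows: list[dict]) -> str:
    """Build CSV text with the full schema, leaving unspecified cells blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = to_csv([
    {"task_id": "E1", "issue_type": "Epic", "summary": "Auth and access control", "estimate": "0"},
    {"task_id": "E1-T01", "issue_type": "Task", "user_story": "US-05",
     "acceptance_criteria": "POST /login returns 200 with a token", "estimate": "3"},
    {"task_id": "E1-T02", "issue_type": "Task", "acceptance_criteria": "Returns 401 without a token",
     "estimate": "6", "dependencies": "E1-T03"},
])
for problem in check_dev_tasks(sample):
    print(problem)
```

A script like this only covers the countable items. Whether acceptance criteria are testable, or whether a task quietly needs an unmade decision, still takes a human read.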
Part 7 — Producing Your Week 2 Artifact
The Evaluate → Plan → Apply → Validate cycle, with context discipline baked in. Sequences below are for the Architecture Document (Solutions Architect) and Dev Tasks (Tech Lead). Reviewers produce notes — see the role table.
Using the agents — set up a Project or Gem (no CLI needed)
On web chatbots you don’t invoke an agent from the browser — you create a persistent container that holds its system prompt, then converse with that:
• Claude.ai: create a Project, paste the agent into the Project’s custom instructions (source: agents/claude-projects/).
• Gemini: Gems → Create, paste the agent into the System Instruction field (source: agents/gemini/).
• ChatGPT: create a Custom GPT, paste into its Instructions (source: agents/chatgpt/).
The agent browser (index.html) is just the copy source — browse, pick your platform, copy. The Project / Gem / Custom GPT is what actually runs the prompt, and it persists, so you set it up once and reuse it all week.
CLI (Claude Code): install once, then invoke by name with @task-planner or @solution-architect. Same agent, same output — only the mechanism differs.
Lean on Module 3
These are still prompts. When a section comes back thin or drifts, apply the Module 3 craft — structure, chain of thought for decisions, least-to-most for long documents — before assuming the inputs are at fault.
How to read these sequences
The colored boxes are prompt text — the words you send the AI. The plain text above each sequence (“Before you start”) is setup you do yourself: creating the Project/Gem and loading files. Do the setup first, then send the boxes one at a time, reading the output before sending the next. Don’t paste the setup steps into the chat.
Architecture Document + ADRs (Solutions Architect)
Before you start (setup — do this yourself, don’t paste it)
1. Set up the agent. Load solution-architect as your system prompt — a Claude.ai Project or Gemini Gem with the agent pasted in (web), or @solution-architect (Claude Code CLI). See the setup callout above.
2. Load context (Gemini handles the long context well). Approved BRD + PRD, plus the reference Architecture Document in docs/arch-docs/ as your quality bar.
3. Then send the prompts below, one at a time.
Send — Evaluate
[EVALUATE]
Before drafting anything, analyze the loaded BRD and PRD
and list:
1. Every PRD user story that needs an architectural component
2. The epics, and the significant decision each one will
need captured as an ADR
3. The NFRs the architecture must satisfy
4. Any BRD constraints the architecture must not violate
Don’t write the document yet — just the analysis.
Read the analysis. If it missed an epic or NFR, your context was thin — fix the load before continuing.
Send — Plan
[PLAN]
Plan the Architecture Document structure and the ADR list
before writing. Confirm:
- Every PRD story will map to a component
- Each epic gets one ADR with ≥2 real options
- NFRs get explicit treatment, not assumptions
Don’t write the document yet — plan only.
Review the plan against the spec. If the session is getting long, compact before drafting: ask for a summary of decisions so far, then start a fresh session with that summary.
Send — Apply
[APPLY]
Plan approved. Draft the Architecture Document section by
section. For each epic, also draft one ADR with: title,
status, context, ≥2 options with pros/cons, decision,
rationale (cite the BRD/PRD), and honest consequences.
Name the ADRs per epic: ADR-<project>-E001, E002, …
Work one section per turn — don’t dump the whole document
at once.
Save the document and ADRs into docs/arch-docs/. If you need an unrelated sub-task mid-session (e.g. a quick code review), run it in a separate session / subagent so this one’s context stays clean.
Send — Validate
[VALIDATE]
Check this draft against the Architecture Document and ADR
checklists. For each item, answer PASS or FAIL with the
specific gap. Then run the traceability check: every PRD
story → a component, every tech choice → a rationale or ADR.
List every FAIL.
Fix every FAIL yourself (or with follow-up prompts) before declaring ready for peer review.
Dev Tasks (Tech Lead)
Before you start (setup — do this yourself, don’t paste it)
1. Set up the agent. Load task-planner as your system prompt — a Claude.ai Project or Gemini Gem with the agent pasted in (web), or @task-planner (Claude Code CLI).
2. Load context — one epic at a time. Only that epic’s PRD stories and the relevant Architecture/ADR section, plus an existing per-epic CSV as the schema reference. This is the key context move: don’t load the full PRD at once.
3. Then send the prompts below, one at a time.
Send — Evaluate
[EVALUATE]
For the loaded epic, before breaking anything down, list:
1. The user stories in this epic that need tasks
2. The Architecture components / ADRs each will rely on
3. Any decision still open that would block a task
Don’t generate tasks yet — just the analysis.
If item 3 of the analysis surfaces an open decision, resolve it (or note it) before generating tasks: an unresolved decision becomes a blocked task in Week 3.
Send — Plan
[PLAN]
Plan this epic’s task breakdown before generating the CSV:
- An Epic row (issue_type: Epic), then the atomic tasks
- The dependencies between them
- Which Architecture component / ADR each task relies on
Confirm no task needs a decision not yet made.
Review the breakdown. Are the tasks atomic? Are dependencies in a buildable order? Push back here, before the CSV is generated.
Send — Apply
[APPLY]
Plan approved. Generate this epic’s Dev Tasks as CSV rows
matching the schema: task_id, issue_type, summary,
user_story, description, acceptance_criteria, priority,
estimate, story_points, role, assignee, status, labels,
epic_link, dependencies.
Lead with the Epic row, then its tasks.
Save as docs/dev-tasks/epic-N-<slug>-tasks.csv. Then start a fresh session for the next epic — don’t accumulate every epic in one context.
Send — Validate
[VALIDATE]
Check every generated row against the Dev Tasks checklist:
- Each task atomic + hour-estimated, with a unique ID and
PRD + Architecture references?
- Every user story in this epic covered? Dependencies
explicit?
- Any task referencing a decision not yet made?
Then trace: every task → user story → BRD requirement.
List every FAIL.
Fix every FAIL before peer review. Run this same check across all epics before Friday’s gate.
Week 2 Deliverable by Role
Same week, different artifact per role. Architecture and Dev Tasks owners use the sequences in Part 7; reviewers produce notes in docs/reviews/.
| Role | Deliverable | Saved to |
|---|---|---|
| Solutions Architect | Architecture Document + one ADR per epic. Owns them. | docs/arch-docs/ (ARCH-... + ADR-...-E0NN) |
| Tech Lead | Dev Tasks — one CSV per epic, covering all PRD user stories. Owns them. | docs/dev-tasks/epic-N-...-tasks.csv |
| Solutions Designer | Architecture ↔ BRD consistency review — flag constraint violations as risks. | docs/reviews/week2-sd-review.md |
| Product Mgr / Designer | Architecture ↔ PRD alignment review — does UI/UX survive? | docs/reviews/week2-product-review.md |
| Project Manager | Handoff readiness check — sequencing, dependencies, blocking open decisions. | docs/reviews/week2-pm-review.md |
| Developers | Dev Tasks buildability review — enough to build each task? | docs/reviews/week2-dev-review.md |
| QA / DevOps | Architecture lens review: testability / deployability. | docs/reviews/week2-qa-review.md or docs/reviews/week2-devops-review.md |
Week 2 Traceability Check
Run this before the handoff, not at it — a broken link found at the gate is a slipped handoff; found earlier, it’s a quick fix.
- Every PRD user story (US-XX) is addressed by at least one Architecture component
- Every Architecture technology choice has a rationale inline or an ADR
- Every Dev Task references a PRD user story and an Architecture section
- Every PRD user story is covered by at least one Dev Task
- No Dev Task references a decision not yet made in Architecture or ADRs
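The story-level links in this check can be verified with a few set operations. The sketch below assumes you can list story IDs, the stories each component addresses, and the story and Architecture section each task cites; those data shapes and the sample IDs are hypothetical. It does not cover the technology-rationale or unmade-decision items, which need a read of the documents themselves.

```python
def traceability_gaps(
    stories: set[str],
    components: dict[str, set[str]],
    tasks: dict[str, tuple[str, str]],
) -> dict[str, list[str]]:
    """Find broken PRD → Architecture → Dev Tasks links.

    components maps a component name to the story IDs it addresses.
    tasks maps a task ID to (story ID, Architecture section).
    """
    addressed = set().union(*components.values())
    covered = {story for story, _ in tasks.values()}
    return {
        "stories_without_component": sorted(stories - addressed),
        "stories_without_task": sorted(stories - covered),
        "tasks_missing_reference": sorted(
            task_id
            for task_id, (story, section) in tasks.items()
            if story not in stories or not section
        ),
    }


stories = {"US-01", "US-02", "US-03"}
components = {"Auth service": {"US-01"}, "Course catalog": {"US-02"}}
tasks = {
    "E1-T01": ("US-01", "Auth service"),
    "E1-T02": ("US-09", "Auth service"),  # cites a story that is not in the PRD
    "E2-T01": ("US-02", ""),              # no Architecture section cited
}
print(traceability_gaps(stories, components, tasks))
# {'stories_without_component': ['US-03'], 'stories_without_task': ['US-03'],
#  'tasks_missing_reference': ['E1-T02', 'E2-T01']}
```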
Contribute back
Found an effective compacting pattern? Document it in knowledge/patterns/. Built a useful architecture or Dev Tasks agent? Commit it to knowledge/agents/{role}/. A prompt that reliably generates good Architecture sections from PRD input goes in knowledge/prompts/solutions-architect/.
Self-Check
- I can explain what the context window is and what fills it up.
- I know which files to load (and which not to) for my Week 2 work.
- I can identify the signs of context degradation and poisoning.
- I analyzed the poisoned-context transcript and can explain where it went wrong.
- I know when to continue, compact, or start fresh — and can apply the decision guide.
- I understand how subagents and their cross-tool equivalents provide context isolation.
- I can run the Architecture Document, ADR, and Dev Tasks checklists against a real artifact.
- I know the Week 2 traceability check (PRD → Architecture → Dev Tasks).
- I’ve identified my Week 2 deliverable and know what good looks like.