AI Foundational Training

Module 4 · Week 2 Mon

Context Engineering & Management

Prompting gets you started. Context engineering keeps you going.

Module Overview

Module 4 is about keeping AI output good across long sessions, not just the first message. Week 2 is where context pressure gets serious — loading a full BRD, PRD, and reference architecture into one session to produce the Architecture Document and Dev Tasks. You learn what to load and when, how to recognize and recover a degraded or poisoned session, and how to use subagents to keep the main context clean.

At a glance
Covers	Relevant-over-abundant context loading; continue vs. compact vs. start-fresh; context poisoning and recovery; subagents as context isolation; Architecture / ADR / Dev Tasks quality bars
When it runs	Week 2 Monday (in-person)
Builds on	Module 3 (prompting) and the approved Week 1 BRD + PRD
Leads into	Week 2 production of the Architecture Document, ADRs, and Dev Tasks

What you'll produce

Across Week 2: an approved Architecture Document, Architecture Decision Records (one per epic), and Dev Tasks (1–4-hour atomic tasks traceable to PRD user stories) — signed off at the Architecture Review + Dev Handoff gate.

The Discipline Behind the Prompt

A great prompt produces great output in message 1. Context engineering produces great output in message 50. That difference matters most in Week 2: Architects run multi-hour sessions with the BRD, PRD, reference architectures, and ADR drafts loaded at once, and Tech Leads generate Dev Tasks across the full PRD and Architecture Document. These are the heaviest context loads in the training.

Builds on Module 3

Module 3 taught you to write a strong prompt. Module 4 keeps that quality alive across a long session. If output degrades mid-session, the first question is context (this module); the second is prompt craft (Module 3). Keep both handbooks open during Week 2.

The cycle applied to the session itself

Evaluate: assess session health — is quality degrading? Is the window filling? Is history still relevant?

Plan: decide the strategy — continue, compact, or start fresh.

Apply: execute it — load the right context, compact and continue, or open a fresh session.

Validate: confirm output quality recovered. If not, reassess and apply again.

Part 1 — What Is Context

Every session runs inside a context window — fixed-size working memory holding your messages, the AI’s responses, loaded files, and the system prompt (including AGENTS.md). When it fills up, older content gets pushed out or compressed. The AI doesn’t announce this — it just starts producing worse output.

The context window is a whiteboard, not a hard drive

When the whiteboard fills, you erase the oldest notes to make room. The AI works with whatever’s still on the board. Your job: keep the most relevant information on the board at all times. Irrelevant context is as bad as missing context — it crowds out what matters.
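The whiteboard analogy can be sketched as a toy model. This is an illustration only, not how any particular tool manages its window: the Whiteboard class, its max_words budget, and the use of word count as a stand-in for tokens are all assumptions made for the example.

```python
from collections import deque


class Whiteboard:
    """Toy context window: word count stands in for tokens (an assumption)."""

    def __init__(self, max_words):
        self.max_words = max_words
        self.notes = deque()

    def used(self):
        return sum(len(note.split()) for note in self.notes)

    def add(self, note):
        """Write a note; return the oldest notes erased to make room."""
        self.notes.append(note)
        erased = []
        # Erase from the oldest end until the board fits again.
        while self.used() > self.max_words and len(self.notes) > 1:
            erased.append(self.notes.popleft())
        return erased


board = Whiteboard(max_words=10)
board.add("Decision: stateless JWT auth")          # 4 words
board.add("Pasted log output, unrelated to auth")  # 6 words, board is now full
print(board.add("More unrelated noise"))  # ['Decision: stateless JWT auth']
```

In this sketch the early decision is the first thing lost, and it is lost to content that did not need to be on the board at all.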

What Fills the Window

Source	Notes
System prompt / project rules	AGENTS.md and tool-specific rules — loaded automatically, counts toward the window
Files you load	BRD, PRD, Architecture, code, ADR drafts — every file adds to the count
Conversation history	Every message and response — grows with each exchange
Tool outputs	Filesystem reads, GitHub searches, web results — each one adds context
The AI’s own reasoning	Chain-of-thought adds up fast in long sessions

Token Economics by Tool

Context is measured in tokens — roughly 0.75 words per token (1,000 tokens ≈ 750 words). Each model has a maximum window.

Tool	Approx. window	Good for	Watch out for
Claude Code (Opus 4.7 / Sonnet 4.6)	200K standard; 1M on Max / Team / Enterprise	Multi-file dev, long coding, agentic tasks	Auto-compacts near the limit — watch for quality drops before that point
OpenCode (model-dependent)	Set by the chosen model — e.g. Big Pickle 200K, or a 1M+ provider	Same dev/agentic work as Claude Code; pick the model to fit the job	Window is the model’s, not the tool’s — Big Pickle degrades well before 200K (~50–70K)
Gemini (Workspace, Gemini 2.5 Pro)	1M+ tokens	Long-context BRD/PRD/Architecture analysis, multi-doc synthesis	No auto-compact — manage manually; watch for drift in long threads
Cursor (GPT-4o / Sonnet 4.6)	128K–200K by model	Code-focused, file-grounded sessions	Smaller window — be deliberate about which files are open
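As a rough sizing aid, the 0.75 words-per-token rule of thumb can be turned into a quick estimate of how much of a window a document will take. This is a sketch: real tokenizers vary by model and by content, and the 30,000-word document below is a hypothetical size, not one of the course artifacts.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this section; real tokenizers vary


def estimate_tokens(word_count: int) -> int:
    """Approximate token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def window_share(word_count: int, window_tokens: int) -> float:
    """Fraction of the context window the text would occupy."""
    return estimate_tokens(word_count) / window_tokens


doc_words = 30_000  # hypothetical document size
print(estimate_tokens(doc_words))                 # 40000
print(f"{window_share(doc_words, 200_000):.0%}")  # 20%
```

The same document takes a fifth of a 200K window before any conversation, tool output, or system prompt is counted.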

Part 2 — Loading Context Deliberately

Loading is a design decision. Before any significant session, answer three questions: what does the AI need to know, what’s the minimum that covers it, and in what order should it load?

Relevant beats abundant

Load the specific sections, not the whole document. Working on auth architecture? Load the auth-related BRD requirements and PRD stories — not the full 50-page BRD.

Load in dependency order: upstream artifacts first (BRD before PRD, PRD before Architecture). The AI builds understanding layer by layer.

State what you loaded and why: “I’m loading X, Y, Z to work on [task].” Orients the AI and gives you a checkpoint.

Reload when sessions resume. The AI remembers nothing from yesterday.
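The dependency-order rule can be sketched with a small dependency map. The map below is an assumption drawn from the artifact chain in this module (BRD, then PRD, then Architecture, then Dev Tasks); the function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each artifact maps to the upstream artifacts it depends on (assumed chain).
DEPENDS_ON = {
    "BRD": set(),
    "PRD": {"BRD"},
    "Architecture": {"PRD"},
    "Dev Tasks": {"PRD", "Architecture"},
}


def load_order(needed: set[str]) -> list[str]:
    """Return the needed artifacts with upstream artifacts first."""
    full_order = list(TopologicalSorter(DEPENDS_ON).static_order())
    return [name for name in full_order if name in needed]


def loading_statement(needed: set[str], task: str) -> str:
    """Build the 'what I loaded and why' message for the session."""
    return f"I'm loading {', '.join(load_order(needed))} to work on {task}."


print(load_order({"Architecture", "BRD", "PRD"}))
# ['BRD', 'PRD', 'Architecture']
print(loading_statement({"PRD", "BRD"}, "auth architecture"))
# I'm loading BRD, PRD to work on auth architecture.
```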

What Each Role Loads in Week 2

Role	Load	Don’t over-load
Solutions Architect (Architecture)	Approved BRD + PRD first; add reference architectures and ADR drafts as needed	Don’t pre-load everything — load per section
Tech Lead (Dev Tasks)	Approved PRD + Architecture; for each story, only that story’s section	Not the full PRD at once — one story at a time
Solutions Designer (review)	BRD + PRD + early Architecture draft	Tight scope — checking BRD–Architecture consistency
Product Mgr / Designer (review)	PRD + Architecture draft	Checking UI/UX survives into architecture
Project Manager (coordination)	PRD + quality gates doc	No code or architecture detail needed
Developers (review)	Architecture + Dev Tasks draft	Checking buildability from the task alone
QA / DevOps (review)	PRD/Architecture with testability / deployability lens	Scoped to the lens, not the whole repo

Part 3 — The Session Lifecycle

Every session has a lifecycle: clean window → work → window fills → quality degrades. How you manage it decides whether you get good output for 2 hours or 20 minutes. Three strategies, one decision at each inflection point.

Strategy	What it does	Use when
Continue	Keep the current session going	Next task is closely related, context still relevant, quality still high, under ~40% of the window
Compact	Compress history, preserve key findings	One major sub-task done, starting another; session valuable but getting long
Start fresh	End the session; reload only what the next task needs	Corrected the AI 3+ times on the same mistake; output vague/contradictory; switching artifacts

Manual Compact Pattern (Gemini / OpenCode)

Claude Code compacts with /compact. In Gemini — and in OpenCode when you want a clean handoff between sessions — do it manually:

1. At the end of a section, ask the AI:
   "Summarize the decisions made so far and their rationale,
    plus any open questions. Max 10 bullets."
2. Copy that summary.
3. Open a new session.
4. Paste the summary as the first message.
5. Continue the work from there.
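The steps above can be sketched as a small helper for people who script their handoffs. The prompt wording comes from step 1; the handoff_message function, its "Next task" line, and the bullet-count check are hypothetical additions for illustration.

```python
COMPACT_PROMPT = (
    "Summarize the decisions made so far and their rationale, "
    "plus any open questions. Max 10 bullets."
)


def handoff_message(summary: str, next_task: str, max_bullets: int = 10) -> str:
    """Build the first message of the fresh session from the copied summary."""
    bullets = [line.strip() for line in summary.splitlines() if line.strip()]
    if len(bullets) > max_bullets:
        # A summary longer than requested tends to carry stale history forward.
        raise ValueError(f"Summary has {len(bullets)} bullets; ask for a tighter one.")
    return (
        "Summary of the previous session (decisions, rationale, open questions):\n"
        + "\n".join(bullets)
        + f"\n\nContinue from here. Next task: {next_task}"
    )


summary = "- Chose stateless JWT\n- Open: how to revoke tokens"
print(handoff_message(summary, "draft the auth section"))
```

Rejecting an over-long summary is a design choice in this sketch: the point of compacting is to carry decisions forward, not the whole conversation.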

Decision Guide

Ask these three questions in order:

1. Is output quality still high?         No  → start fresh
2. Is the next task closely related?     No  → start fresh
3. Is the session history still relevant?
      Yes       → continue
      Partially → compact
      No        → start fresh
 
When in doubt: start fresh. Cost = 5 minutes to reload.
Cost of pushing through a degraded session = a bad
artifact at Friday’s gate.
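The guide reads naturally as a small function. This is a sketch of the same three questions in the same order; the names choose_strategy and Relevance are hypothetical, and the inputs are your own judgment calls, not anything a tool reports.

```python
from enum import Enum


class Relevance(Enum):
    YES = "yes"
    PARTIALLY = "partially"
    NO = "no"


def choose_strategy(quality_high: bool, next_task_related: bool,
                    history: Relevance) -> str:
    """Apply the three questions in order and return the strategy."""
    if not quality_high:
        return "start fresh"
    if not next_task_related:
        return "start fresh"
    if history is Relevance.YES:
        return "continue"
    if history is Relevance.PARTIALLY:
        return "compact"
    # History no longer relevant: same default as "when in doubt".
    return "start fresh"


print(choose_strategy(True, True, Relevance.PARTIALLY))  # compact
print(choose_strategy(False, True, Relevance.YES))       # start fresh
```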

Part 4 — Context Poisoning

Poisoning is a specific failure: accumulated history contains contradictions or conflicting instructions the AI can’t resolve cleanly. The output is inconsistent, hedging, or quietly wrong — different from length-based degradation.

How It Happens

Cause	Example
Flip-flopping on decisions	“Use Redis for sessions” (msg 5), then “no sessions — use JWTs” (msg 15). Both are now in context.
Partial corrections	You correct the validation approach; the AI reverts next message because the original is still in context.
Cumulative scope drift	Starts as “generate Dev Tasks for US-05,” drifts to “redesign auth” by message 30.
Contradictory loaded files	A BRD that says “no sessions” loaded alongside a PRD draft that assumes session auth.
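One way to catch the first cause early is to keep a short decision log next to the session and check it for topics that were decided more than once. This is a sketch under assumptions: the Decision record and find_flip_flops are hypothetical, and the log is something you maintain by hand, not something the tools produce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    message_no: int
    topic: str
    choice: str


def find_flip_flops(log: list[Decision]) -> dict[str, list[Decision]]:
    """Return the topics that received more than one distinct choice."""
    by_topic = defaultdict(list)
    for decision in log:
        by_topic[decision.topic].append(decision)
    return {
        topic: decisions
        for topic, decisions in by_topic.items()
        if len({d.choice for d in decisions}) > 1
    }


log = [
    Decision(5, "session storage", "Redis for sessions"),
    Decision(8, "task scope", "Dev Tasks for US-05"),
    Decision(15, "session storage", "no sessions, use JWTs"),
]
print(list(find_flip_flops(log)))  # ['session storage']
```

A flagged topic means both versions of the decision are still in the session history, which is the condition described in the table above.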

Signs of a Poisoned Context

The AI contradicts something it said 10 messages ago without acknowledging it
Output quality varies wildly between adjacent messages
The AI hedges: “depending on your earlier decision about X” when X was settled long ago
You feel like you’re arguing with the AI rather than collaborating
After a correction, the AI reverts to the wrong approach next message

Main exercise: the poisoned-context transcript

Below is a short, made-up session that starts useful and goes poisoned. Read it, then answer the questions underneath. The same pattern shows up in real Week 2 architecture sessions.

[1] USER: Designing session storage for the LMS API. ~1,000
    concurrent users, 8-hour sessions. What are my options?
 
[2] AI: Three options: (a) stateless JWTs, (b) server-side
    sessions in Redis, (c) DB-backed sessions. For your scale,
    stateless JWTs are simplest — no extra infra.
 
[3] USER: Good. Let's go stateless JWT, no server state.
 
[4] AI: Agreed. JWT with an 8-hour expiry, signed server-side,
    validated on each request. No session store needed.
 
[5] USER: Now add the ability to force-logout a user instantly
    when HR deactivates their account.
 
[6] AI: Sure — we'll keep a Redis set of revoked sessions and
    check it on every request, so we can revoke immediately.
 
[7] USER: Wait, I thought we were stateless? Now there's Redis?
 
[8] AI: You're right, let's stay stateless. We can use short
    5-minute JWTs and a refresh token stored in Redis to allow
    revocation. That keeps it stateless while supporting logout.

Diagnose it

Identify the exact message where the contradiction is injected (the point the design stops being internally consistent).

Name the contradiction: what two things is the AI now trying to satisfy at once?

Explain why the message [7] correction didn’t fix it — why does [8] still carry the conflict forward?

Write the recovery plan: what exactly would you load into a fresh session, and what would you state up front, to get a clean answer?

There is no in-session recovery

Notice that [8] doesn’t resolve the conflict — it tries to honor both “stateless” and “instant revocation via a store,” and ends up describing a stateful design while calling it stateless. Accumulated contradictions don’t un-contradict themselves. The only fix is a fresh session: load only the artifacts reflecting the current correct decision, state that decision explicitly at the start, and don’t carry the conflicting history forward.

Part 5 — Subagents as Context Isolation

Module 3 framed agents as a reusability layer. Module 4 adds a second dimension: an agent runs with its own context, separate from the main session. It gets only what its task needs, returns its output, and the main session’s context stays clean.

Why isolation matters

You’re 30 messages into an architecture session and want a quick code review. Paste the file into the main session and you add review noise to an architecture context — both tasks suffer. Instead, invoke the code-reviewer agent: it runs in isolation with only the file and AGENTS.md, returns findings, and your architecture session continues unaffected.

Tool	Isolation mechanism	Limitation
Claude Code	Subagents in .claude/agents/; auto-delegated by description; isolated window	Auto-delegation needs a clear description in frontmatter
OpenCode	Built-in subagents (General, Explore, Scout) plus custom ones in .opencode/agents/; auto-invoked by description or @-mentioned; each runs its own context	Auto-invocation needs a clear description; subagent inherits the parent’s model unless overridden
Gemini (Workspace)	Open a new chat session for the isolated task	Entirely manual; easy to forget to start fresh
Cursor	New Composer session or custom mode, scoped to the relevant files	Smaller window; file scoping is critical

When to isolate — and when not to

Isolate when: the sub-task is clearly scoped and unrelated; you want a second opinion not influenced by the session’s decisions; the sub-task is a different artifact type.

Don’t isolate when: the sub-task needs the main session’s decisions — instead, carry only the relevant decisions as a brief summary. The goal is clean, focused context, not fragmented context.
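The isolation idea can be sketched with a toy session object. This is not the API of Claude Code, OpenCode, or any other tool: Session, delegate, and the fake_model stand-in are hypothetical and exist only to show which context each call sees.

```python
class Session:
    """Toy session: the context is just the list of messages seen so far."""

    def __init__(self, model):
        self.model = model  # callable: list of messages -> reply text
        self.context = []

    def send(self, message):
        self.context.append(message)
        reply = self.model(self.context)
        self.context.append(reply)
        return reply

    def delegate(self, task, inputs):
        """Run a sub-task in a fresh context; keep only its result here."""
        sub = Session(self.model)   # new, empty context
        sub.context.extend(inputs)  # only what the sub-task needs
        result = sub.send(task)
        self.context.append(f"[subagent result] {result}")
        return result


def fake_model(context):
    # Stand-in for a real model call (assumption for the example).
    return f"reply based on {len(context)} messages"


main = Session(fake_model)
main.send("Draft the auth section of the architecture")
review = main.delegate("Review this file for bugs", ["<contents of a code file>"])
print(review)             # reply based on 2 messages
print(len(main.context))  # 3
```

The sub-task saw only the file and its own instruction, and the main session gained one line (the result) instead of the whole file plus the review exchange.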

Part 6 — Week 2 Quality Bars

Module 2 brought the BRD and PRD bars. Module 4 brings this week’s three: Architecture Document, ADRs, and Dev Tasks. Same pattern — self-check first, peer review mid-week, gate review Friday.

Where the artifacts live — and where they don’t

Artifacts sit in a local docs/ folder, organized by type. They are NOT committed to the code repository — keep docs/ out of the repo (add it to .gitignore). The code repo holds code; the artifact set is delivery documentation that lives alongside it, not inside it.

ADRs and Dev Tasks are segregated by epic — not single monolithic files. Typical layout:

docs/brd/ — BRD-<project>-v1.0.md

docs/prd/ — PRD-<project>-v1.0.md

docs/arch-docs/ — ARCH-<project>-v1.0.md + one ADR per epic (ADR-<project>-E001…)

docs/dev-tasks/ — one CSV per epic (epic-0-...-tasks.csv, epic-1-...-tasks.csv)

docs/generated-figma/ — design output

Architecture Document

Translates the approved PRD into a technical design. Answers “how do we build this?”

Checklist

Every PRD user story traces to at least one architectural component
Every technology choice is justified inline or has an ADR
Non-functional requirements from the PRD are addressed explicitly, not assumed
The deployment topology is specific enough for DevOps to begin IaC
Security decisions are documented (auth, secrets management, access control)
No BRD constraint is violated (timeline, budget, data residency, tech lock-ins)
Open questions are explicitly listed, not hidden in narrative
The document is internally consistent — no section contradicts another

Common failure modes

Component names that map to no PRD user story (scope creep via architecture).

Technology choices with no rationale or ADR pointer.

NFRs in the PRD not addressed in the architecture (“1000 concurrent users” — how?).

Deployment section left vague (“will deploy to AWS” without specifics).

Internal contradictions (component stateless in one section, stateful in another).

Architecture Decision Records (ADRs, one per epic)

Document significant technical decisions — options, trade-offs, rationale. They make decisions explicable when the team or requirements change. ADRs are segregated by epic: one ADR file per epic under docs/arch-docs/ (e.g. ADR-<project>-E001-Authentication-Access-Control.md), sitting alongside the main Architecture Document.

Checklist

At least 2 options documented with concrete pros and cons — not strawmen
Rationale references specific BRD or PRD constraints that drove the decision
Consequences are honest — both positive and negative
The decision is traceable — the Architecture Document references the ADR
Status is set correctly (accepted, not still “proposed” if final)
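For teams that keep ADR metadata in a structured form, the checklist above can be sketched as a mechanical pre-check. The ADR and Option records and their field names are hypothetical; the check cannot judge whether options are strawmen or consequences are honest, which still needs a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    pros: list[str]
    cons: list[str]


@dataclass
class ADR:
    title: str
    status: str
    options: list[Option]
    rationale_refs: list[str]  # BRD/PRD constraints cited in the rationale
    positive_consequences: list[str]
    negative_consequences: list[str]
    referenced_from_arch_doc: bool


def check_adr(adr: ADR, final: bool = True) -> list[str]:
    """Return the checklist items this ADR fails (empty list means pass)."""
    fails = []
    documented = [o for o in adr.options if o.pros and o.cons]
    if len(documented) < 2:
        fails.append("fewer than 2 options with pros and cons")
    if not adr.rationale_refs:
        fails.append("rationale cites no BRD or PRD constraint")
    if not (adr.positive_consequences and adr.negative_consequences):
        fails.append("consequences must include positive and negative")
    if not adr.referenced_from_arch_doc:
        fails.append("Architecture Document does not reference this ADR")
    if final and adr.status.lower() != "accepted":
        fails.append("status is not 'accepted'")
    return fails


draft = ADR(
    title="Authentication and access control",
    status="proposed",
    options=[Option("Stateless JWT", ["no extra infra"], [])],
    rationale_refs=[],
    positive_consequences=["simple deployment"],
    negative_consequences=[],
    referenced_from_arch_doc=True,
)
for fail in check_adr(draft):
    print("FAIL:", fail)
```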

Common failure modes

Only the chosen option documented — the ADR becomes post-hoc justification.

Rationale says “industry best practice” or “team preference” without specifics.

Consequences missing or only positive.

ADR exists but isn’t referenced from the Architecture Document.

Dev Tasks (CSV, segregated by epic)

Breaks the PRD and Architecture into developer-sized tasks — atomic, traceable, estimated in hours. This is what developers pick up in Week 3. Generated by the task-planner agent (https://ultimate-ai-agents.stratpoint.io/browser). Dev Tasks are segregated by epic: one CSV per epic under docs/dev-tasks/ (e.g. epic-0-project-setup-tasks.csv, epic-1-auth-access-control-tasks.csv). Each CSV opens with an Epic row (issue_type: Epic, estimate 0), then its tasks.

Checklist

Every task has a unique ID, PRD reference, and acceptance criteria
Every task is atomic and hour-estimated — oversized tasks are broken down further
Testing tasks (unit, integration) are developer-owned — role is Backend/Frontend/Fullstack, not QA
Tasks are segregated by epic — one CSV per epic, each opening with an Epic row
Every PRD user story is covered by at least one task (no orphans)
Dependencies are explicit — a developer can self-sequence from the document alone
No task requires a decision not already made in Architecture or ADRs
Acceptance criteria are testable — done/not-done is unambiguous
Ordered logically — foundation before features, auth before protected routes
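Several of these items are mechanical enough to check with a short script before review. The sketch below uses column names from the Dev Tasks CSV schema given in Part 7 and the 1-4 hour task size from the module overview; the check_epic_csv function and the sample rows are hypothetical, and the sample uses only a subset of the columns.

```python
import csv
import io


def check_epic_csv(csv_text: str, min_hours: float = 1.0,
                   max_hours: float = 4.0) -> list[str]:
    """Return checklist failures found in one epic's CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fails = []
    if not rows or rows[0]["issue_type"] != "Epic":
        fails.append("first row is not the Epic row")
    seen = set()
    for row in rows:
        task_id = row["task_id"]
        if task_id in seen:
            fails.append(f"{task_id}: duplicate ID")
        seen.add(task_id)
        if row["issue_type"] == "Epic":
            continue  # the Epic row carries estimate 0 and no story
        if not row["user_story"].strip():
            fails.append(f"{task_id}: missing PRD reference")
        if not row["acceptance_criteria"].strip():
            fails.append(f"{task_id}: missing acceptance criteria")
        hours = float(row["estimate"] or 0)
        if not min_hours <= hours <= max_hours:
            fails.append(
                f"{task_id}: estimate {hours:g}h is outside "
                f"{min_hours:g}-{max_hours:g}h"
            )
    return fails


SAMPLE = """task_id,issue_type,user_story,acceptance_criteria,estimate
E1,Epic,,,0
T-001,Task,US-05,Login returns 401 on a bad password,3
T-002,Task,,Token expires after 8 hours,6
"""
for fail in check_epic_csv(SAMPLE):
    print("FAIL:", fail)
```

A script like this covers IDs, references, and estimates only; whether acceptance criteria are testable or a task is truly atomic still needs a reader.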

Common failure modes

Oversized, non-atomic tasks (“implement the full auth system” — should be several tasks).

Missing PRD reference — can’t trace back to a user story.

Acceptance criteria copied verbatim from the PRD without implementation refinement.

Implicit dependencies (PATCH before the GET it depends on).

References to decisions not yet made (“use the caching strategy TBD in ADR-03”).
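The ordering failure can be checked mechanically once dependencies are written down. The sketch below assumes each task is represented as a pair of its ID and a list of dependency IDs, in file order; that representation and the task IDs are hypothetical. It only catches dependencies that are declared, so an implicit dependency still has to be spotted in review.

```python
def check_dependency_order(tasks: list[tuple[str, list[str]]]) -> list[str]:
    """tasks: (task_id, dependency_ids) pairs in the order they are listed."""
    known = {task_id for task_id, _ in tasks}
    seen = set()
    fails = []
    for task_id, deps in tasks:
        for dep in deps:
            if dep not in known:
                fails.append(f"{task_id}: depends on unknown task {dep}")
            elif dep not in seen:
                fails.append(f"{task_id}: listed before its dependency {dep}")
        seen.add(task_id)
    return fails


tasks = [
    ("T-010", ["T-011"]),  # PATCH endpoint, listed first
    ("T-011", []),         # the GET endpoint it depends on
    ("T-012", ["T-099"]),  # points at a task that does not exist
]
print(check_dependency_order(tasks))
# ['T-010: listed before its dependency T-011', 'T-012: depends on unknown task T-099']
```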

Part 7 — Producing Your Week 2 Artifact

The Evaluate → Plan → Apply → Validate cycle, with context discipline baked in. Sequences below are for the Architecture Document (Solutions Architect) and Dev Tasks (Tech Lead). Reviewers produce notes — see the role table.

Using the agents — set up a Project or Gem (no CLI needed)

On web chatbots you don’t invoke an agent from the browser — you create a persistent container that holds its system prompt, then converse with that:

• Claude.ai: create a Project, paste the agent into the Project’s custom instructions (source: agents/claude-projects/).

• Gemini: Gems → Create, paste the agent into the System Instruction field (source: agents/gemini/).

• ChatGPT: create a Custom GPT, paste into its Instructions (source: agents/chatgpt/).

The agent browser (index.html) is just the copy source — browse, pick your platform, copy. The Project / Gem / Custom GPT is what actually runs the prompt, and it persists, so you set it up once and reuse it all week.

CLI (Claude Code): install once, then invoke by name with @task-planner or @solution-architect. Same agent, same output — only the mechanism differs.

Lean on Module 3

These are still prompts. When a section comes back thin or drifts, apply the Module 3 craft — structure, chain of thought for decisions, least-to-most for long documents — before assuming the inputs are at fault.

How to read these sequences

The labeled blocks ([EVALUATE], [PLAN], [APPLY], [VALIDATE]) are prompt text: the words you send the AI. The plain text above each sequence (“Before you start”) is setup you do yourself: creating the Project/Gem and loading files. Do the setup first, then send the prompts one at a time, reading the output before sending the next. Don’t paste the setup steps into the chat.

Architecture Document + ADRs (Solutions Architect)

Before you start (setup — do this yourself, don’t paste it)

1. Set up the agent. Load solution-architect as your system prompt — a Claude.ai Project or Gemini Gem with the agent pasted in (web), or @solution-architect (Claude Code CLI). See the setup callout above.

2. Load context (Gemini handles the long context well). Approved BRD + PRD, plus the reference Architecture Document in docs/arch-docs/ as your quality bar.

3. Then send the prompts below, one at a time.

Send — Evaluate

[EVALUATE]
Before drafting anything, analyze the loaded BRD and PRD
and list:
1. Every PRD user story that needs an architectural component
2. The epics, and the significant decision each one will
   need captured as an ADR
3. The NFRs the architecture must satisfy
4. Any BRD constraints the architecture must not violate
Don’t write the document yet; just the analysis.

Read the analysis. If it missed an epic or NFR, your context was thin — fix the load before continuing.

Send — Plan

[PLAN]
Plan the Architecture Document structure and the ADR list
before writing. Confirm:
- Every PRD story will map to a component
- Each epic gets one ADR with ≥2 real options
- NFRs get explicit treatment, not assumptions
Don’t write the document yet; plan only.

Review the plan against the spec. If the session is getting long, compact before drafting: ask for a summary of decisions so far, then start a fresh session with that summary.

Send — Apply

[APPLY]
Plan approved. Draft the Architecture Document section by
section. For each epic, also draft one ADR with: title,
status, context, ≥2 options with pros/cons, decision,
rationale (cite the BRD/PRD), and honest consequences.
Name the ADRs per epic: ADR-<project>-E001, E002, …
Work one section per turn; don’t dump the whole document
at once.

Save the document and ADRs into docs/arch-docs/. If you need an unrelated sub-task mid-session (e.g. a quick code review), run it in a separate session / subagent so this one’s context stays clean.

Send — Validate

[VALIDATE]
Check this draft against the Architecture Document and ADR
checklists. For each item, answer PASS or FAIL with the
specific gap. Then run the traceability check: every PRD
story → a component, every tech choice → a rationale or ADR.
List every FAIL.

Fix every FAIL yourself (or with follow-up prompts) before declaring ready for peer review.

Dev Tasks (Tech Lead)

Before you start (setup — do this yourself, don’t paste it)

1. Set up the agent. Load task-planner as your system prompt — a Claude.ai Project or Gemini Gem with the agent pasted in (web), or @task-planner (Claude Code CLI).

2. Load context — one epic at a time. Only that epic’s PRD stories and the relevant Architecture/ADR section, plus an existing per-epic CSV as the schema reference. This is the key context move: don’t load the full PRD at once.

3. Then send the prompts below, one at a time.

Send — Evaluate

[EVALUATE]
For the loaded epic, before breaking anything down, list:
1. The user stories in this epic that need tasks
2. The Architecture components / ADRs each will rely on
3. Any decision still open that would block a task
Don’t generate tasks yet; just the analysis.

If item 3 of the analysis surfaces an open decision, resolve it (or note it) before generating tasks: an unresolved decision becomes a blocked task in Week 3.

Send — Plan

[PLAN]
Plan this epic’s task breakdown before generating the CSV:
- An Epic row (issue_type: Epic), then the atomic tasks
- The dependencies between them
- Which Architecture component / ADR each task relies on
Confirm no task needs a decision not yet made.

Review the breakdown. Are the tasks atomic? Are dependencies in a buildable order? Push back here, before the CSV is generated.

Send — Apply

[APPLY]
Plan approved. Generate this epic’s Dev Tasks as CSV rows
matching the schema: task_id, issue_type, summary,
user_story, description, acceptance_criteria, priority,
estimate, story_points, role, assignee, status, labels,
epic_link, dependencies.
Lead with the Epic row, then its tasks.

Save as docs/dev-tasks/epic-N-<slug>-tasks.csv. Then start a fresh session for the next epic — don’t accumulate every epic in one context.

Send — Validate

[VALIDATE]
Check every generated row against the Dev Tasks checklist:
- Each task atomic + hour-estimated, with a unique ID and
  PRD + Architecture references?
- Every user story in this epic covered? Dependencies
  explicit?
- Any task referencing a decision not yet made?
Then trace: every task → user story → BRD requirement.
List every FAIL.

Fix every FAIL before peer review. Run this same check across all epics before Friday’s gate.

Week 2 Deliverable by Role

Same week, different artifact per role. Architecture and Dev Tasks owners use the sequences in Part 7; reviewers produce notes in docs/reviews/.

Role	Deliverable	Saved to
Solutions Architect	Architecture Document + one ADR per epic. Owns them.	docs/arch-docs/ (ARCH-... + ADR-...-E0NN)
Tech Lead	Dev Tasks — one CSV per epic, covering all PRD user stories. Owns them.	docs/dev-tasks/epic-N-...-tasks.csv
Solutions Designer	Architecture ↔ BRD consistency review — flag constraint violations as risks.	docs/reviews/week2-sd-review.md
Product Mgr / Designer	Architecture ↔ PRD alignment review — does UI/UX survive?	docs/reviews/week2-product-review.md
Project Manager	Handoff readiness check — sequencing, dependencies, blocking open decisions.	docs/reviews/week2-pm-review.md
Developers	Dev Tasks buildability review — enough to build each task?	docs/reviews/week2-dev-review.md
QA / DevOps	Architecture lens review: testability / deployability.	docs/reviews/week2-{qa|devops}-review.md

Week 2 Traceability Check

Run this before the handoff, not at it — a broken link found at the gate is a slipped handoff; found earlier, it’s a quick fix.

Every PRD user story (US-XX) is addressed by at least one Architecture component
Every Architecture technology choice has a rationale inline or an ADR
Every Dev Task references a PRD user story and an Architecture section
Every PRD user story is covered by at least one Dev Task
No Dev Task references a decision not yet made in Architecture or ADRs
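The coverage items in this check reduce to set comparisons, which makes them easy to run early and often. The sketch below assumes you can list the PRD story IDs, the stories each Architecture component addresses, and the story and Architecture section each task references; the function name, data shapes, and IDs are hypothetical. It does not cover the rationale or open-decision items, which need reading.

```python
def traceability_gaps(
    stories: set[str],
    component_stories: dict[str, set[str]],
    task_refs: dict[str, tuple[str, str]],
) -> list[str]:
    """Return broken links between PRD stories, components, and Dev Tasks.

    component_stories: component name -> PRD stories it addresses
    task_refs: task ID -> (PRD user story, Architecture section)
    """
    gaps = []
    covered_by_arch = set().union(*component_stories.values())
    covered_by_tasks = {story for story, _ in task_refs.values()}
    for story in sorted(stories):
        if story not in covered_by_arch:
            gaps.append(f"{story}: no Architecture component")
        if story not in covered_by_tasks:
            gaps.append(f"{story}: no Dev Task")
    for task_id, (story, section) in sorted(task_refs.items()):
        if story not in stories:
            gaps.append(f"{task_id}: no valid PRD user story reference")
        if not section:
            gaps.append(f"{task_id}: no Architecture section reference")
    return gaps


gaps = traceability_gaps(
    stories={"US-01", "US-02"},
    component_stories={"Auth Service": {"US-01"}},
    task_refs={"T-001": ("US-01", "4.1"), "T-002": ("US-09", "")},
)
for gap in gaps:
    print(gap)
```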

Contribute back

Found an effective compacting pattern? Document it in knowledge/patterns/. Built a useful architecture or Dev Tasks agent? Commit it to knowledge/agents/{role}/. A prompt that reliably generates good Architecture sections from PRD input goes in knowledge/prompts/solutions-architect/.

Self-Check

I can explain what the context window is and what fills it up.
I know which files to load (and which not to) for my Week 2 work.
I can identify the signs of context degradation and poisoning.
I analyzed the poisoned-context transcript and can explain where it went wrong.
I know when to continue, compact, or start fresh — and can apply the decision guide.
I understand how subagents and their cross-tool equivalents provide context isolation.
I can run the Architecture Document, ADR, and Dev Tasks checklists against a real artifact.
I know the Week 2 traceability check (PRD → Architecture → Dev Tasks).
I’ve identified my Week 2 deliverable and know what good looks like.