# Experience Memory (Mistake / Lesson Loop): Implementation Plan

> Goal: **Astra extracts reusable lessons from work results and QA feedback, and automatically
> turns them into a pre-task checklist so it stops falling into the same hole.**
>
> Not "make the model perfect", but "put a railing where the model keeps falling."

## Design principles (why this isn't just "more logging")

1. **Closed loop**: record → extract lesson → inject before next similar task → preflight checklist → (light) post-check.
   Plain recording does not change future behavior; an *injected* lesson does.
2. **Reuse existing infrastructure; do NOT build a parallel system.** Astra already has:
   - 5-layer memory (`src/memory/`): ProceduralMemory ("how to do X", i.e. recipes), EpisodicMemory ("conversation flow"),
     LongTermMemory ("user rules/decisions"), ProjectMemory, ShortTermMemory; `MemoryExtractor.onSessionEnd`.
   - ProjectChronicle (`src/features/projectChronicle`): auto-records planning, development, bugs, and retrospectives.
   - `RetrievalOrchestrator` + `selectWithinBudget` (RAG), SecondBrainTrace.
   - `src/retrieval/brainIndex.ts`: mtime-keyed token cache of **every `.md` in the active brain**.

   ⇒ Lessons are just markdown files **inside the active brain**, identified by a `lessons/`-style path or
   frontmatter `type: lesson|playbook|qa-finding`. They become retrievable *for free* via `searchBrainFiles`.
   Distinction to keep clear: **ProceduralMemory = recipe ("how"), Lesson = guardrail ("what went wrong & how not to repeat")**.
3. **The risky half is *production* of lessons, not consumption.** Low-signal auto-records pollute retrieval.
   So lesson *generation* is heavily gated; lesson *consumption* (retrieval + injection) is the cheap, safe half, so build it first.
4. **Everything goes through the token budget.** Lessons compete with brain knowledge inside the same
   ~3200-token RAG allocation (and the small-model context cap). They get a modest score boost + a small
   reserved sub-slot, **frequency-weighted, not recency-weighted** (a mistake repeated 5× is louder than a one-off).
5. **Inspectable & correctable.** A bad auto-lesson that keeps getting injected = a poison source. The user must
   see which lessons fed an answer (the per-answer scope footer) and be able to edit/delete/ignore one trivially.

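Principle 4 (frequency-weighted, budget-bounded) can be illustrated with a small ranking sketch. Everything in it is an assumption made for illustration: the `LessonCandidate` shape, the `lessonScore` and `rankLessonsForSlot` names, and the log-scaled `occurrences` weight are hypothetical and are not Astra's actual scoring.

```ts
// Hypothetical shape; the real retrieval chunk type in Astra may differ.
interface LessonCandidate {
  file: string;
  relevance: number;   // base retrieval score for the current request
  occurrences: number; // taken from the card's frontmatter
  tokens: number;      // estimated token cost of injecting this card
}

// Frequency-weighted score: repeated mistakes rank higher; recency is ignored.
function lessonScore(c: LessonCandidate): number {
  return c.relevance * (1 + Math.log2(Math.max(1, c.occurrences)));
}

// Greedily fill the reserved sub-slot, highest score first,
// skipping any card that would overflow the remaining budget.
function rankLessonsForSlot(
  candidates: LessonCandidate[],
  slotTokens: number,
  maxLessons: number,
): LessonCandidate[] {
  const sorted = [...candidates].sort((a, b) => lessonScore(b) - lessonScore(a));
  const picked: LessonCandidate[] = [];
  let used = 0;
  for (const c of sorted) {
    if (picked.length >= maxLessons) break;
    if (used + c.tokens > slotTokens) continue;
    picked.push(c);
    used += c.tokens;
  }
  return picked;
}
```

With this weighting, a card with `occurrences: 5` and moderate relevance outranks a one-off card with slightly higher relevance, which is the behavior the principle asks for. The exact curve (log vs. linear) is a tuning choice.
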
## Data model

Lesson card = a markdown file in the active brain (convention: under `lessons/`, `playbooks/`, or `qa-findings/`,
or any file with the frontmatter below). The brain index (`brainIndex.ts`, version ≥ 3) stores a `kind` per file
parsed from path + frontmatter, so retrieval can tell lessons apart without re-reading content.

```md
---
type: lesson # lesson | playbook | qa-finding
title: Telegram remote execution must require allowlist
applies-to: [telegram, remote-execution, security, approval-flow]
project: ConnectAI # optional; scopes the lesson to one project
severity: high # low | medium | high
source: curated # curated | auto (curated weighted higher in retrieval)
occurrences: 1 # bumped on dedup-merge instead of creating a duplicate
last-seen: 2026-05-12
---

## Situation
…

## Mistake / Risk
…

## Root Cause
…

## Fix
…

## Prevention Checklist
- …
- …

## Applies To
- telegram
- remote-execution
- …
```

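As a rough illustration of how the flat frontmatter above could be read into a typed object, here is a minimal sketch. The `LessonFrontmatter` interface and the `parseLessonFrontmatter` and `oneOf` names are hypothetical. The parser is deliberately line-based rather than a real YAML parser, so it only handles the single-line `key: value` form shown in the card template.

```ts
type LessonKind = "lesson" | "playbook" | "qa-finding";
type Severity = "low" | "medium" | "high";
type LessonSource = "curated" | "auto";

// Hypothetical typed view of the card frontmatter.
interface LessonFrontmatter {
  type?: LessonKind;
  title?: string;
  appliesTo: string[];
  project?: string;
  severity?: Severity;
  source?: LessonSource;
  occurrences: number;
  lastSeen?: string;
}

// Return `value` only if it is one of the allowed literals.
function oneOf<T extends string>(value: string | undefined, allowed: readonly T[]): T | undefined {
  return allowed.find((candidate) => candidate === value);
}

// Minimal parser for flat `key: value` frontmatter. Not YAML: no nesting,
// no quoting, no multi-line values. Anything after " #" is treated as a comment,
// so a title containing " #" would be cut short.
function parseLessonFrontmatter(markdown: string): LessonFrontmatter | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;

  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).replace(/\s+#.*$/, "").trim();
    fields.set(key, value);
  }

  const tags = (fields.get("applies-to") ?? "")
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  const count = Number.parseInt(fields.get("occurrences") ?? "", 10);

  return {
    type: oneOf(fields.get("type"), ["lesson", "playbook", "qa-finding"] as const),
    title: fields.get("title") || undefined,
    appliesTo: tags,
    project: fields.get("project") || undefined,
    severity: oneOf(fields.get("severity"), ["low", "medium", "high"] as const),
    source: oneOf(fields.get("source"), ["curated", "auto"] as const),
    // A missing or invalid count falls back to 1 (a first occurrence).
    occurrences: Number.isFinite(count) && count > 0 ? count : 1,
    lastSeen: fields.get("last-seen") || undefined,
  };
}
```

Unknown `type`, `severity`, or `source` values come back as `undefined` instead of throwing, so a hand-edited card with a typo degrades to an ordinary note rather than breaking indexing.
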
## Pipeline (target end state)

| Stage | What | Trigger | Risk control |
|---|---|---|---|
| **Retrieve** | On a new request, retrieve relevant lessons (same RAG pipeline; `applies-to` tags + content matched, modest boost) | every turn | budget-bounded; capped to top-K |
| **Preflight** | Inject `[ACTIVE LESSONS — verify before finalizing]` block (Prevention Checklists) into the *protected* part of the system prompt | every turn with ≥1 lesson | placed before `[CONTEXT]` so it survives truncation |
| **Collect** | Capture changed files, executed commands, failure logs, test red→green transitions, rollbacks, approval rejections, and user QA feedback as events | during the turn | already mostly in TransactionManager / approval queue |
| **Generate** | Build a *draft* lesson card from a strong-trigger turn | ① explicit QA feedback ② test red→green ③ `transactionManager.rollback()` ④ approval rejection ⑤ user says "기록해" ("record this"); **never on a plain success** | quality bar (must have concrete Root Cause + checklist) or discard; **dedup → merge & bump `occurrences`**; **human confirms before persist (MVP)**; `source: auto`, lower weight |
| **Post-check (non-blocking)** | After the answer, flag in the footer any Prevention-Checklist item not visibly addressed | every turn with lessons | **never re-prompt / never block**; just a footer warning. (Blocking gate only for "risky ops" via the existing dryRun approval flow, v2+) |

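The Preflight row's placement rule (lessons before `[CONTEXT]`, so truncation never removes them) might look roughly like the sketch below. The `assembleSystemPrompt` and `buildLessonsBlock` names, the character-based limit, and the "trim the tail of the context" strategy are all assumptions for illustration; only the two block headers come from the plan.

```ts
const LESSONS_HEADER = "[ACTIVE LESSONS — verify before finalizing]";
const CONTEXT_HEADER = "\n\n[CONTEXT]\n";

// Render checklist items as a bullet block; empty input yields no block at all.
function buildLessonsBlock(checklistItems: string[]): string {
  if (checklistItems.length === 0) return "";
  return [LESSONS_HEADER, ...checklistItems.map((item) => `- ${item}`)].join("\n");
}

// The protected zone (base prompt + lessons) is always kept whole, even if it
// alone exceeds maxChars. Only the [CONTEXT] body is trimmed to fit.
function assembleSystemPrompt(
  basePrompt: string,
  checklistItems: string[],
  ragContext: string,
  maxChars: number,
): string {
  const protectedZone = [basePrompt, buildLessonsBlock(checklistItems)]
    .filter((part) => part.length > 0)
    .join("\n\n");
  const room = maxChars - protectedZone.length - CONTEXT_HEADER.length;
  if (room <= 0 || ragContext.length === 0) return protectedZone;
  return protectedZone + CONTEXT_HEADER + ragContext.slice(0, room);
}
```

Because the lessons block is assembled before the context is sized, a long RAG context shrinks to make room for the checklist rather than the other way around.
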
Hard caps: ≤ N lessons injected per turn; ≤ M total `source: auto` lessons before requiring cleanup
(mirrors the `brainIndex` 12k cap). Cross-project bleed is prevented by the `project:` tag + ProjectMemory-style scoping.

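A minimal sketch of the caps and project scoping described above, assuming lessons arrive already ranked. The `LessonMeta` shape and the function names are hypothetical, and the rule that a lesson without `project:` applies everywhere is an assumption drawn from the field being optional.

```ts
interface LessonMeta {
  file: string;
  project?: string; // absent means the lesson is not scoped to one project
  source: "curated" | "auto";
}

// A lesson is in scope when it is unscoped or tagged with the active project.
function inProjectScope(lesson: LessonMeta, activeProject: string | undefined): boolean {
  return lesson.project === undefined || lesson.project === activeProject;
}

// Apply the scope filter, then the per-turn cap (N). Input order is preserved,
// so callers are expected to pass lessons in ranked order.
function capLessonsForTurn(
  ranked: LessonMeta[],
  activeProject: string | undefined,
  maxPerTurn: number,
): LessonMeta[] {
  return ranked.filter((l) => inProjectScope(l, activeProject)).slice(0, maxPerTurn);
}

// True once the number of auto-generated lessons exceeds the cleanup cap (M).
function autoLessonsNeedCleanup(all: LessonMeta[], maxAuto: number): boolean {
  return all.filter((l) => l.source === "auto").length > maxAuto;
}
```

Filtering before slicing matters: capping first could spend the per-turn quota on lessons from another project and leave in-scope lessons out.
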
## MVP: build the *consumption* side first

1. **Lesson detection in the brain index**: `brainIndex.ts` parses `kind` from path/frontmatter; stored, version-bumped.
2. **Lesson-aware retrieval**: `searchBrainFiles` marks lesson chunks (`metadata.isLesson`), uses a larger excerpt
   (whole card if short), and gives a modest score boost. `RetrievalOrchestrator.retrieve()` splits lesson chunks out
   into `result.lessonChunks`.
3. **Preflight injection**: `agent.buildMemoryContext` prepends an `[ACTIVE LESSONS — verify before finalizing]`
   block (built from lesson cards / their Prevention-Checklist sections) ahead of the normal RAG context, inside
   the truncation-protected zone. Non-lesson RAG context is unchanged.
4. **Visibility**: the per-answer "참조 범위" ("reference scope") footer (already shipped) shows `· 교훈 N개` ("N lessons") and lists the files on hover;
   clicking opens the agent↔knowledge mapping editor (or, later, the lesson files).
5. **Manual seeding**: the `g1nation.lesson.create` command asks for a title, writes `<brain>/lessons/<slug>.md`
   from the card template, and opens it. This is also the seed for the future auto-generator.

> With (1)–(5) you can hand-write 3–5 high-value lessons today and immediately verify "the next similar task behaves
> differently", before any auto-generation risk exists.

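Step 1 (deriving `kind` from path and frontmatter) could be sketched as follows. `LESSON_DIR_RE` and `detectLessonKind(relativePath, content)` are named in the plan, but their bodies here are assumptions: the exact regex, the `DIR_TO_KIND` table, and the rule that frontmatter `type:` wins over the directory convention are illustrative choices.

```ts
type LessonKind = "lesson" | "playbook" | "qa-finding";

// Assumed pattern: a path segment named lessons/, playbooks/, or qa-findings/.
const LESSON_DIR_RE = /(^|\/)(lessons|playbooks|qa-findings)\//i;

const DIR_TO_KIND: Record<string, LessonKind> = {
  lessons: "lesson",
  playbooks: "playbook",
  "qa-findings": "qa-finding",
};

// Frontmatter `type:` takes precedence over the directory convention (assumption).
// Returns undefined for ordinary brain notes.
function detectLessonKind(relativePath: string, content: string): LessonKind | undefined {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
  if (frontmatter) {
    const typeLine = /^type:[ \t]*(lesson|playbook|qa-finding)[ \t]*(?:#.*)?$/m.exec(
      frontmatter[1] ?? "",
    );
    if (typeLine) return typeLine[1] as LessonKind;
  }
  // Normalize Windows separators before matching the directory convention.
  const dir = LESSON_DIR_RE.exec(relativePath.replace(/\\/g, "/"));
  return dir ? DIR_TO_KIND[(dir[2] ?? "").toLowerCase()] : undefined;
}
```

Since the function is pure, the index can call it once per file at (re)index time and store the result, which is what lets retrieval distinguish lessons without re-reading content.
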
## v2: status

Done:
- **Triggers → "기록할까요?" ("Record this?") prompt** (human-confirm UI, never auto-saves): `transactionManager.rollback()` (action failure), `rejectTransaction()` (user rejected a dry-run change), and QA feedback (the user message matches `isQaRegressionFeedback`: "또 안 돼" ("not working again"), "비슷한 실수" ("similar mistake"), "왜 반복돼" ("why does this keep repeating"), "고쳤는데 깨졌" ("fixed it but it broke"), "regression", "why … keep failing", …). Each trigger posts a `lessonCandidate` webview message → the sidebar shows a dismissible box → "📝 교훈 기록" ("Record lesson") runs `g1nation.lesson.fromConversation` (pre-filled Situation).
- **`occurrences` dedup-merge**: `createLessonCard` checks existing lessons by normalized title; on a match it offers "갱신 (occurrences +1)" ("Update"), which runs `bumpLessonOccurrences` (increments `occurrences:`, sets `last-seen:`) instead of spawning a duplicate. A recurring mistake gets louder, not more numerous.
- **Non-blocking post-QA flag**: `findUnaddressedChecklistItems(answer, lessonCards)` lists, in the per-answer footer, the lesson Prevention-Checklist items whose significant terms don't appear in the answer (`⚠ 답변에서 안 보이는 교훈 체크리스트 항목: …`, "lesson checklist items not visible in the answer"). No re-prompt, no block.
- **Manage / delete / ignore UI**: `g1nation.lesson.manage` QuickPick (lists all lesson cards from the brain via the index; opens a card on select; trash button → confirm + delete = no longer injected). The footer's `⚠ 교훈 N` ("⚠ N lessons") is clickable and opens this picker.

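The non-blocking post-QA flag can be sketched as a pure function. The signature `findUnaddressedChecklistItems(answer, lessonCards)` comes from the plan; the `LessonCard` shape, the helper names, the "term of 4+ characters" definition of *significant*, and the rule that an item is flagged only when none of its terms appear are assumptions, not the shipped heuristic.

```ts
// Hypothetical card shape: the file path plus its raw markdown.
interface LessonCard {
  file: string;
  markdown: string;
}

// Collect bullet items under the "## Prevention Checklist" heading of one card.
function extractChecklist(markdown: string): string[] {
  const items: string[] = [];
  let inSection = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^##\s+/.test(line)) {
      inSection = /^##\s+Prevention Checklist\s*$/i.test(line);
      continue;
    }
    const bullet = /^\s*[-*]\s+(.*\S)\s*$/.exec(line);
    if (inSection && bullet) items.push(bullet[1] ?? "");
  }
  return items;
}

// Lowercased ASCII/Hangul words of at least 4 characters (assumed threshold).
function significantTerms(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9\uac00-\ud7a3]+/)
    .filter((word) => word.length >= 4);
}

// Flag checklist items that share no significant term with the answer.
function findUnaddressedChecklistItems(answer: string, lessonCards: LessonCard[]): string[] {
  const answerTerms = new Set(significantTerms(answer));
  const flagged: string[] = [];
  for (const card of lessonCards) {
    for (const item of extractChecklist(card.markdown)) {
      const terms = significantTerms(item);
      if (terms.length > 0 && !terms.some((t) => answerTerms.has(t))) flagged.push(item);
    }
  }
  return flagged;
}
```

A term-overlap check like this is cheap and will produce false positives when the answer paraphrases a checklist item, which is one reason the result is only shown as a footer warning and never blocks.
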
Not done (next):
- Auto-retrospective draft on a *successful* turn (heavily gated, `source: auto`, lower retrieval weight). Deliberately last; high noise risk.
- Test red→green trigger: **blocked**. `<run_command>` actions run in a VS Code terminal (`terminal.sendText`) with no output capture, so Astra can't observe test results. Needs a captured-output execution path first.
- `source: auto` vs `curated` retrieval-weight distinction; archive one-off lessons unused > 6 months.
- Reserved lesson sub-budget within the RAG allocation; frequency-weighted (`occurrences`) ordering of injected lessons.

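The archival item above is not implemented yet; one possible reading of "one-off lessons unused > 6 months" is sketched here. It assumes that "one-off" means `occurrences: 1` and that "unused" can be approximated by the `last-seen` date, since the plan does not define a separate last-used field. `LessonAge` and `isArchiveCandidate` are hypothetical names.

```ts
interface LessonAge {
  occurrences: number;
  lastSeen: string; // ISO date from `last-seen:`, e.g. "2026-05-12"
}

// A lesson is an archive candidate when it never recurred and its last-seen
// date is older than `months` calendar months before `now` (UTC).
function isArchiveCandidate(lesson: LessonAge, now: Date, months: number = 6): boolean {
  if (lesson.occurrences > 1) return false; // recurring mistakes are kept
  const seen = new Date(`${lesson.lastSeen}T00:00:00Z`);
  if (Number.isNaN(seen.getTime())) return false; // unparseable date: keep the lesson
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);
  return seen.getTime() < cutoff.getTime();
}
```

Both failure modes (recurring lesson, unreadable date) default to keeping the lesson, so a parsing problem can never silently remove a guardrail.
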
## Integration points (files)

- `src/retrieval/lessonHelpers.ts` *(new, pure)*: `LESSON_DIR_RE`, `detectLessonKind(relativePath, content)`, `buildLessonChecklistBlock(chunks)`, `lessonTemplate(title)`.
- `src/retrieval/brainIndex.ts`: `IndexEntry.kind`, version bump, populate via `detectLessonKind`; expose `kind` on `IndexedBrainDoc`.
- `src/retrieval/types.ts`: `RetrievalChunk.metadata.isLesson?`, `RetrievalResult.lessonChunks?`.
- `src/retrieval/index.ts`: `searchBrainFiles` lesson handling; `retrieve()` splits lesson chunks.
- `src/agent.ts`: `buildMemoryContext` prepends the lessons block; `_lastRetrievalInfo.lessonFiles`; `usedScope` message carries it.
- `media/sidebar.js`: scope footer shows `· 교훈 N개`.
- `package.json` + `src/extension.ts`: `g1nation.lesson.create` command.
- *(v2)* `src/features/projectChronicle`, `TransactionManager`, approval queue: trigger hooks; `MemoryManager`/`ProceduralMemory`: optional structured layer.
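
To make the `types.ts` and `index.ts` entries concrete, here is a minimal sketch of the two optional fields and the split step. Only `metadata.isLesson` and `lessonChunks` are taken from the plan; the other fields on `RetrievalChunk`, the `chunks` field name, and the `splitLessonChunks` helper are assumptions about shapes this document does not show.

```ts
// Assumed minimal shapes; the real types in src/retrieval/types.ts likely carry more fields.
interface RetrievalChunk {
  file: string;
  text: string;
  score: number;
  metadata: { isLesson?: boolean };
}

interface RetrievalResult {
  chunks: RetrievalChunk[];
  lessonChunks?: RetrievalChunk[];
}

// Move lesson chunks out of the normal context list so they can be injected
// separately. `lessonChunks` is omitted entirely when there are none.
function splitLessonChunks(all: RetrievalChunk[]): RetrievalResult {
  const lessonChunks = all.filter((c) => c.metadata.isLesson === true);
  const chunks = all.filter((c) => c.metadata.isLesson !== true);
  return lessonChunks.length > 0 ? { chunks, lessonChunks } : { chunks };
}
```

Keeping both fields optional means existing callers that only read the normal chunk list keep working unchanged, which matches the plan's requirement that non-lesson RAG context is unaffected.
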