chore: v2.2.73 — ASTRA-DEBUG log level + webview CSP font-src hardening

- Demote ASTRA-DEBUG normal-flow logs from console.error to logInfo/console.log (chatHandlers, extension, slashRouter): removes the false positives that showed up as ERR in DevTools
- Add an explicit CSP meta tag to the sidebar webview and allow data: in font-src (sidebar.html, sidebarProvider._getHtml): fixes the violation warning logged on every prompt, caused by the VS Code outer iframe injecting codicon.ttf as data:font/ttf and the default CSP blocking it
- Includes the accumulated LM Studio / agent / context manager / test updates

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 15:52:19 +09:00
parent 36db170844
commit 0712014fcb
43 changed files with 2417 additions and 977 deletions
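
The CSP change described in the commit message can be sketched as follows. This is a minimal illustration, not the actual `sidebarProvider._getHtml` code: the helper name `buildSidebarCsp`, the nonce handling, and every directive other than `font-src` are assumptions made for the example. `cspSource` stands in for the value VS Code exposes as `webview.cspSource`.

```typescript
// Hypothetical helper (not taken from the repository): builds the explicit
// CSP <meta> tag for a webview. Only the `font-src ... data:` part reflects
// the change described in the commit message; the rest is illustrative.
function buildSidebarCsp(cspSource: string, nonce: string): string {
  const directives = [
    "default-src 'none'",
    `img-src ${cspSource} https: data:`,
    `style-src ${cspSource} 'unsafe-inline'`,
    // Allowing data: lets a font injected as data:font/ttf load
    // without triggering a CSP violation.
    `font-src ${cspSource} data:`,
    `script-src 'nonce-${nonce}'`,
  ];
  return `<meta http-equiv="Content-Security-Policy" content="${directives.join('; ')};">`;
}

// Example: in an extension, cspSource would come from webview.cspSource.
const cspMeta = buildSidebarCsp('vscode-webview:', 'abc123');
console.log(cspMeta);
```

Keeping the directive list in one place makes it easier to keep the HTML template and the provider in sync, which matters here because the commit touches both `sidebar.html` and `sidebarProvider._getHtml`.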
@@ -3,20 +3,20 @@
 <!-- ASTRA:AUTO-START -->

 ## Snapshot
- **Workspace**: `connectai` `v2.2.63` _(absolute path varies by environment; resolved from the active VS Code workspace)_
+- **Workspace**: `ConnectAI` `v2.2.73` _(absolute path varies by environment; resolved from the active VS Code workspace)_
 - **Description**: The personal intelligence layer for Antigravity and VS Code. A private cognitive partner for deep project context, memory, and proactive strategic decision-making.
 - **Stack**: TypeScript, Node.js, VS Code Extension, LM Studio SDK, Test runner
- **Stats**: 276 source files, ~55,424 lines across 5 top-level modules.
+- **Stats**: 285 source files, ~56,679 lines across 5 top-level modules.

 ## Last Refresh
- **Time**: 2026-05-22T10:04:22.779Z
+- **Time**: 2026-05-23T06:46:38.895Z
 - **Files newly analysed**: 5
- **Files reused from cache**: 271
+- **Files reused from cache**: 280

 ## Directory Map
 ```mermaid
 mindmap
-  root((connectai))
+  root((ConnectAI))
    src/
      features/
      core/
@@ -31,7 +31,6 @@ mindmap
    docs/
      records/
      docs/
-      Meeting/
 ```

 ## Module Dependencies
@@ -42,7 +41,7 @@ flowchart LR
    media["media/<br/>6 files"]
    tests["tests/<br/>35 files"]
    core_py["core_py/<br/>6 files"]
-    docs["docs/<br/>90 files"]
+    docs["docs/<br/>99 files"]
    tests --> src
 ```

@@ -65,14 +64,14 @@ flowchart LR

 ## Modules

-### `src/` — 139 files, ~38,303 lines
+### `src/` — 139 files, ~39,180 lines

 **Sub-directories**
 - `src/features/` (66) — Astra Office — public API. The OfficeSnapshot presenter / schema to be added in the next session will also be exposed through the same entry. Currently exposed: full webview panel H
 - `src/core/` (15) — Astra Path Resolver. Centrally manages the paths of all Astra data files (the .astra directory). Based on the extension's install path (extensionUri), the .astra directo
 - `src/memory/` (8) — Episodic Memory. Stores the contextual flow of past conversations/meetings/decisions. Automatically summarizes and saves an episode when a session ends. Records "why this was decided" and "how things progressed". Storage 
 - `src/retrieval/` (8) — Brain Index — persistent, mtime-keyed tokenized cache of the Second Brain. RAG search read and tokenized every .md file in the brain on every query to compute TF-I
- `src/docs/` (6) — Bug: Edited agent.ts Edited agent.ts Edited agent.ts Edited agent.ts Edited agent.ts ...
+- `src/docs/` (6) — src Chronicle Records
 - `src/lib/` (6) — Context Manager (context limit management). "context length = 132k" does not mean "it is fine to generate up to 132k tokens of answer". System prompt + conversation history + input documents + generated answer
 - `src/integrations/` (4) — Per-chat conversation history for the Telegram bot. Why this exists: the previous bot was stateless — every inbound mess
 - `src/lmstudio/` (4) — 4 files (.ts)
@@ -82,22 +81,22 @@ flowchart LR
 - `src/scaffolder/` (2) — Scaffolder template catalog. Templates are pure data — (projectName) => { [relativePath]: contents }. New templates are 

 **Key files**
- `src/utils.ts` (408 lines)
- `src/config.ts` (301 lines)
+- `src/utils.ts` (448 lines)
+- `src/config.ts` (394 lines)
 - `src/features/company/types.ts` (446 lines) — Type definitions for the 1인 기업 (One-Person Company) mode. The mode turns the user into a virtual CEO that dispatches work to a roster of specialist agents. Each turn produces a session directory conta
 - `src/core/services.ts` (164 lines)
 - `src/lib/paths.ts` (151 lines)
 - `src/features/company/companyConfig.ts` (896 lines) — State + config plumbing for 1인 기업 (One-Person Company) mode. Two surfaces: - CompanyState (runtime data: enabled flag, company name, which agents are active, per-agent model overrides). Persisted in VS Code's globalState so
- `src/sidebarProvider.ts` (4226 lines)
+- `src/sidebarProvider.ts` (4327 lines)
 - `src/memory/types.ts` (126 lines) — Memory Type Definitions. Defines all types for Astra's 5-Layer Cognitive Memory System. ① Short-Term ② Long-Term ③ Project ④ Procedural ⑤ Episodic
 - `src/retrieval/scoring.ts` (536 lines) — Scoring Engine — TF-IDF + Bilingual Tokenizer. Goes beyond simple includes() keyword matching to provide document scoring based on TF-IDF weights. Includes a bilingual Korean/English tokenizer.
 - `src/skills/agentKnowledgeMap.ts` (374 lines)
 - `src/retrieval/lessonHelpers.ts` (325 lines) — Lesson / Experience Memory — pure helpers (no vscode dependency) "Lesson" = a markdown file in the active brain that captures a past mistake/risk and how to avoid repeating it. Identified by a lessons
- `src/agent.ts` (3823 lines)
+- `src/agent.ts` (4076 lines)
 - `src/features/providers/types.ts` (63 lines) — Cloud LLM provider routing — model id prefix → provider id mapping. Prefix rules: openrouter:anthropic/claude-3.5-sonnet → { provider: 'openrouter', model: 'anthropic/claude-3.5-sonnet' } anthropic:claude-3-5
- `src/lib/engine.ts` (906 lines)
+- `src/lib/engine.ts` (940 lines)
 - `src/retrieval/brainIndex.ts` (325 lines) — Brain Index — persistent, mtime-keyed tokenized cache of the Second Brain. RAG search read and tokenized every .md file in the brain on every query to compute TF-IDF scores, which becomes the bottleneck as the file count grows. This module, in <brainPath>/.astra/brain-index.json,
- `src/features/company/dispatcher.ts` (1437 lines) — Sequential dispatcher for 1인 기업 (One-Person Company) mode. Drives one company "turn": user prompt → CEO planner (JSON {brief, tasks}) → for each task in plan: dispatch one specialist (sequentially) - build specialist prompt
+- `src/features/company/dispatcher.ts` (1435 lines) — Sequential dispatcher for 1인 기업 (One-Person Company) mode. Drives one company "turn": user prompt → CEO planner (JSON {brief, tasks}) → for each task in plan: dispatch one specialist (sequentially) - build specialist prompt
 - `src/features/providers/providerConfig.ts` (78 lines) — Per-provider API key + enable toggle store. Design: - The API key itself lives in vscode.SecretStorage (secrets) — untouched by settings.json / Settings Sync. - The enabled toggle is a regular setting (g1nation.providers.<id>.enabled) — the user, from the panel, 
 - `src/features/approval/approvalQueue.ts` (129 lines)
 - `src/integrations/telegram/telegramClient.ts` (154 lines)
@@ -106,19 +105,19 @@ flowchart LR
 - `src/features/company/pixelOfficeState.ts` (286 lines) — Pixel Office — a UI-layer-only module that visualizes Agent Work Pipeline state. ─────────────────── Design principles ─────────────────── 1. Never change the agent's core decision logic. Pipeline progress, contract agreement, review cycle, approval gate — all of these, the existing dispatcher 
 - `src/features/company/sessionStore.ts` (231 lines) — Disk persistence for company-mode session artefacts. Each company turn produces a timestamped directory: <workspaceRoot>/.astra/company/sessions/2026-05-13T21-29/ ├─ brief.md ← CEO's task decompositio
 - `src/features/projectArchitecture/scanner.ts` (644 lines) — Deep static analyser for the Project Architecture Context generator. Walks the project tree (skipping the usual nodemodules / out / dist noise), pulls the role of each interesting file from its leadin
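- `src/lib/contextManager.ts` (275 lines) — Context Manager (context limit management). "context length = 132k" does not mean "it is fine to generate up to 132k tokens of answer". System prompt + conversation history + input documents + generated answer + headroom ≤ context length. Before sending a request, this module estimates the input tokens and - dynamically computes the output cap (maxTokens)
+- `src/lib/contextManager.ts` (278 lines) — Context Manager (context limit management). "context length = 132k" does not mean "it is fine to generate up to 132k tokens of answer". System prompt + conversation history + input documents + generated answer + headroom ≤ context length. Before sending a request, this module estimates the input tokens and - dynamically computes the output cap (maxTokens)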

-### `media/` — 6 files, ~7,342 lines
+### `media/` — 6 files, ~7,455 lines

 **Key files**
- `media/sidebar.css` (2078 lines) — Stylesheet
- `media/sidebar.js` (3677 lines)
- `media/sidebar.html` (545 lines) — Astra
+- `media/sidebar.css` (2068 lines) — Stylesheet
+- `media/sidebar.js` (3807 lines)
+- `media/sidebar.html` (538 lines) — Astra
 - `media/settings-panel.html` (381 lines) — Astra Settings
 - `media/settings-panel.css` (210 lines) — Stylesheet
 - `media/settings-panel.js` (451 lines)

-### `tests/` — 35 files, ~5,969 lines
+### `tests/` — 35 files, ~6,004 lines
 *Depends on*: `src/`

 **Sub-directories**
@@ -126,10 +125,10 @@ flowchart LR

 **Key files**
 - `tests/agentEngine.test.ts` (782 lines) — AgentEngine Integration Tests & Performance Benchmarks. Verification targets: 1. ErrorClassifier — automatic classification of error types (Transient/Permanent/Abort) 2. ErrorRecoveryMatrix — verifies that each rule maps to its intended response strategy 3. resilientExecute — exponential back
- `tests/lmStudioLifecycle.test.ts` (318 lines) — Unit tests for ModelLifecycleManager. Strategy: inject mock ILMStudioClient and a simple in-memory IActivityTracker. No real LM Studio or SDK is touched — the manager file does not import the SDK dire
+- `tests/lmStudioLifecycle.test.ts` (326 lines) — Unit tests for ModelLifecycleManager. Strategy: inject mock ILMStudioClient and a simple in-memory IActivityTracker. No real LM Studio or SDK is touched — the manager file does not import the SDK dire
 - `tests/telegramBot.test.ts` (363 lines) — Unit tests for TelegramBot + truncateForTelegram. Strategy: - TelegramBot is driven by an injected ITelegramClient stub. We script getUpdates to return queued batches and assert that: - the offset cur
- `tests/lmStudioStreamer.test.ts` (220 lines) — Unit tests for LMStudioStreamer. Strategy: inject a fake ILMStudioClient that returns a fake model handle whose respond() yields a controllable async iterable. No real SDK or WebSocket touched.
- `tests/localPathPreflight.test.ts` (490 lines)
+- `tests/lmStudioStreamer.test.ts` (222 lines) — Unit tests for LMStudioStreamer. Strategy: inject a fake ILMStudioClient that returns a fake model handle whose respond() yields a controllable async iterable. No real SDK or WebSocket touched.
+- `tests/localPathPreflight.test.ts` (492 lines)
 - `tests/secondBrainTrace.test.ts` (407 lines)
 - `tests/approvalQueue.test.ts` (164 lines) — Unit tests for ApprovalQueue. Strategy: drive enqueue → approve / reject / clear / pre-empt directly, confirm the onChange event fires at the right moments and callbacks fire exactly once.
 - `tests/projectScaffolder.test.ts` (135 lines) — Unit tests for FileSystemProjectScaffolder. Drives against a real temp directory so end-to-end file IO + path-traversal defenses are exercised.
@@ -144,7 +143,7 @@ flowchart LR
 - `tests/vulnerability.test.ts` (60 lines) — / <reference types="jest" />
 - `tests/brainIndex.test.ts` (107 lines)
 - `tests/calendarApi.test.ts` (131 lines)
- `tests/contextManager.test.ts` (129 lines)
+- `tests/contextManager.test.ts` (149 lines)
 - `tests/icsParser.test.ts` (134 lines)
 - `tests/lessonHelpers.test.ts` (191 lines)
 - `tests/projectChronicle.test.ts` (199 lines)
@@ -161,39 +160,38 @@ flowchart LR
 - `core_py/optimizer.py` (55 lines)
 - `core_py/queue_worker.py` (82 lines)

-### `docs/` — 90 files, ~3,401 lines
+### `docs/` — 99 files, ~3,631 lines

 **Sub-directories**
- `docs/records/` (77) — Bug: Can you review the code of the /Volumes/Data/project/Antigravity/ConnectAI project? Whether there is anything to improve, and then...
- `docs/docs/` (5) — Bug: Viewed integrationretrieval.test.ts:1-59 via integrationretrieval.test.ts ...
- `docs/Meeting/` (0)
+- `docs/records/` (86) — Astra Project Chronicle Records
+- `docs/docs/` (5) — docs Chronicle Records

 **Key files**
 - `docs/TELEGRAM_REMOTE_EXECUTION_PLAN.md` (452 lines) — Telegram Remote Execution Plan
 - `docs/AgentEngine_Architecture.md` (314 lines) — AgentEngine Architecture Document
+- `docs/records/ConnectAI/timeline.md` (209 lines) — Project Timeline
 - `docs/ASTRA_OFFICE_REFACTOR.md` (198 lines) — Astra Office Refactor — Design Doc
 - `docs/EXPERIENCE_MEMORY_PLAN.md` (122 lines) — Experience Memory (Mistake / Lesson Loop) — Implementation Plan
 - `docs/records/ConnectAI/development/2026-05-02_connectai_project_knowledge_overview.md` (121 lines) — Astra Project Knowledge Overview
 - `docs/records/ConnectAI/development/2026-05-03_connectai_project_knowledge_overview.md` (121 lines) — Astra Project Knowledge Overview
- `docs/records/ConnectAI/timeline.md` (182 lines) — Project Timeline
 - `docs/Advanced_Features_Implementation_Guide.md` (40 lines) — Advanced Features Implementation Guide
+- `docs/PROJECT_CHRONICLE_GUARD_ROADMAP.md` (43 lines) — Project Chronicle Guard: Search Engine Roadmap
+- `docs/UX_UI_Consistency_Guidelines.md` (44 lines) — UX/UI Consistency Guidelines
+- `docs/docs/records/docs/README.md` (18 lines) — docs Chronicle Records
 - `docs/docs/records/docs/bugs/BUG-0001-viewed-integration-retrieval-test-ts-1-59-integration-retrie.md` (16 lines) — Bug: Viewed integrationretrieval.test.ts:1-59 via integrationretrieval.test.ts ...
 - `docs/docs/records/docs/chronicle.config.json` (11 lines) — JSON configuration
 - `docs/docs/records/docs/project-profile.md` (31 lines) — Project Profile
- `docs/docs/records/docs/README.md` (18 lines) — docs Chronicle Records
 - `docs/docs/records/docs/timeline.md` (7 lines) — Project Timeline
- `docs/PROJECT_CHRONICLE_GUARD_ROADMAP.md` (43 lines) — Project Chronicle Guard: Search Engine Roadmap
- `docs/records/ConnectAI/bugs/BUG-0001-volumes-data-project-antigravity-connectai-프로젝트-코드-리뷰-해줄-수-있.md` (16 lines) — Bug: Can you review the code of the /Volumes/Data/project/Antigravity/ConnectAI project? Whether there is anything to improve, and then...
- `docs/records/ConnectAI/bugs/BUG-0002-지금-내가-분석-요청하고-너가-답을-줄때-아래-템플릿에-맞춰-답을-써주고-있는데-개선-포인트가-있는지-확인해.md` (16 lines) — Bug: Right now, when I ask for an analysis and you answer, you write the answer to fit the template below; please check whether there are points to improve. ## Risks I see The biggest...
- `docs/records/ConnectAI/bugs/BUG-0003-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
- `docs/records/ConnectAI/bugs/BUG-0004-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
- `docs/records/ConnectAI/bugs/BUG-0005-다시한번-답줘-volumes-data-project-antigravity-connectai-내-질문에-대한-.md` (16 lines) — Bug: Answer once more. /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but...
- `docs/records/ConnectAI/bugs/BUG-0006-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
- `docs/records/ConnectAI/bugs/BUG-0007-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
- `docs/records/ConnectAI/bugs/BUG-0008-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
- `docs/records/ConnectAI/bugs/BUG-0009-문제점을-읽고-어떻게-개선하는게-최선인지-분석해주면-좋겠어-알겠습니다-지금부터-connectai-프로젝트-에.md` (16 lines) — Bug: Please read the problems and analyze how best to improve them. Understood. From now on I will focus entirely on the ConnectAI project. ...
- `docs/records/ConnectAI/bugs/BUG-0010-문제점을-읽고-어떻게-개선하는게-최선인지-분석해주면-좋겠어-알겠습니다-지금부터-connectai-프로젝트-에.md` (16 lines) — Bug: Please read the problems and analyze how best to improve them. Understood. From now on I will focus entirely on the ConnectAI project. ...
- `docs/records/ConnectAI/bugs/BUG-0011-문제점을-읽고-어떻게-개선하는게-최선인지-분석해주면-좋겠어-알겠습니다-지금부터-connectai-프로젝트-에.md` (16 lines) — Bug: Please read the problems and analyze how best to improve them. Understood. From now on I will focus entirely on the ConnectAI project. ...
+- `docs/records/ConnectAI/README.md` (18 lines) — Astra Project Chronicle Records
+- `docs/records/ConnectAI/bugs/BUG-0001-volumes-data-project-antigravity-connectai-프로젝트-코드-리뷰-해줄-수-있.md` (16 lines) — Bug: Can you review the code of the /Volumes/Data/project/Antigravity/ConnectAI project? Whether there is anything to improve, and then...
+- `docs/records/ConnectAI/bugs/BUG-0002-지금-내가-분석-요청하고-너가-답을-줄때-아래-템플릿에-맞춰-답을-써주고-있는데-개선-포인트가-있는지-확인해.md` (16 lines) — Bug: Right now, when I ask for an analysis and you answer, you write the answer to fit the template below; please check whether there are points to improve. ## Risks I see The biggest...
+- `docs/records/ConnectAI/bugs/BUG-0003-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
+- `docs/records/ConnectAI/bugs/BUG-0004-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
+- `docs/records/ConnectAI/bugs/BUG-0005-다시한번-답줘-volumes-data-project-antigravity-connectai-내-질문에-대한-.md` (16 lines) — Bug: Answer once more. /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but...
+- `docs/records/ConnectAI/bugs/BUG-0006-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
+- `docs/records/ConnectAI/bugs/BUG-0007-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
+- `docs/records/ConnectAI/bugs/BUG-0008-volumes-data-project-antigravity-connectai-내-질문에-대한-답변이-잘-정리.md` (16 lines) — Bug: /Volumes/Data/project/Antigravity/ConnectAI The answers to my questions are well organized when you report them, but focused...
+- `docs/records/ConnectAI/bugs/BUG-0009-문제점을-읽고-어떻게-개선하는게-최선인지-분석해주면-좋겠어-알겠습니다-지금부터-connectai-프로젝트-에.md` (16 lines) — Bug: Please read the problems and analyze how best to improve them. Understood. From now on I will focus entirely on the ConnectAI project. ...

 ## VS Code Extension Surface
 - **Extension ID**: `g1nation.astra`
@@ -227,7 +225,7 @@ flowchart LR
  - `g1nation.calendar.refresh` — Astra: Refresh Google Calendar 📅
  - `g1nation.calendar.connectOAuth` — Astra: Connect Google Calendar OAuth (write) 🔐
  - `g1nation.devilAgent.toggle` — Astra: Toggle Devil Agent 🎭
- **Configuration** (69 settings):
+- **Configuration** (87 settings):
  - `g1nation.multiAgentEnabled` *(boolean)* _(default: `false`)_ — Enable Multi-Agent Workflow (Planner -> Researcher -> Writer) for complex tasks.
  - `g1nation.datacollectBridgeUrl` *(string)* _(default: `"http://127.0.0.1:3002"`)_ — Wiki/Datacollect MCP Bridge URL. /research, /benchmark, /youtube chat slash commands route here. The Bridge must be running (`npm run bridge` in the Datacollect project).
  - `g1nation.datacollectSavePath` *(string)* _(default: `""`)_
@@ -253,6 +251,18 @@ flowchart LR
  - `g1nation.finalOnlyRetryOnThoughtLeak` *(boolean)* _(default: `true`)_ — If the model emits only hidden reasoning (<think>, <|channel|>thought, "Thinking Process:" …) and no user-visible answer, Astra silently re-asks it for the final answer only. Hidden reasoning is never
  - `g1nation.lmStudio.idleTimeoutMs` *(number)* _(default: `300000`)_ — Auto-eject the loaded LM Studio model after this many milliseconds of inactivity. Set to 0 to disable. Default: 300000 (5 minutes).
  - `g1nation.lmStudio.autoLoadOnSelect` *(boolean)* _(default: `true`)_ — Automatically load LM Studio models into memory when selected from the Astra sidebar.
+  - `g1nation.lmStudio.sampling.topP` *(number)* _(default: `0.9`)_ — Nucleus sampling cutoff. Small / quantized models often spew wrong-neighbour tokens (garbled Hangul: 붕괴→붕점) when the tail is wide. Lower (0.8–0.9) tightens; 1.0 disables. Applied to both SDK and REST paths.
+  - `g1nation.lmStudio.sampling.topK` *(number)* _(default: `20`)_ — Top-K sampling cutoff. 0 disables. Default 20 — tighter for small models, raise to 40–80 for large models that already sample well.
+  - `g1nation.lmStudio.sampling.minP` *(number)* _(default: `0.05`)_ — Min-P floor — discards tokens with probability below this fraction of the top token. Good defence against rare-token glitches. 0 disables.
+  - `g1nation.lmStudio.sampling.repeatPenalty` *(number)* _(default: `1.1`)_ — Repeat / frequency penalty to curb stutter (것입니다서입니다…). 1.0 disables. Values 1.05–1.2 are typical.
+  - `g1nation.lmStudio.statsInBudget` *(boolean)* _(default: `true`)_ — Show token/s and time-to-first-token from LM Studio prediction stats in the context-budget badge after each turn (SDK path only).
+  - `g1nation.lmStudio.draftModel` *(string)* _(default: `""`)_ — [Speculative decoding] LM Studio model key of a small draft model (e.g. 'gemma-2b-it') used to accelerate the main model. Empty disables. 1.5–3x throughput on large models. The draft must be downloade
+  - `g1nation.lmStudio.load.flashAttention` *(boolean)* _(default: `true`)_ — [Load option] Enable Flash Attention when loading models. Faster generation + lower memory on compatible hardware, especially helpful for long contexts. Default: true.
+  - `g1nation.lmStudio.load.gpuOffloadRatio` *(string)* _(default: `"max"`)_ — [Load option] How much of the model to offload to GPU. 'max' = all (default), 'off' = CPU only, or a number 0–1 (e.g. '0.5' = half). Numeric strings are parsed.
+  - `g1nation.lmStudio.load.offloadKVCacheToGpu` *(boolean)* _(default: `true`)_ — [Load option] Keep KV cache on GPU memory. Faster but requires VRAM headroom. Default: true.
+  - `g1nation.lmStudio.load.keepModelInMemory` *(boolean)* _(default: `true`)_ — [Load option] Prevent the model from being swapped out of system memory. Improves interactive responsiveness; raises RAM use. Default: true.
+  - `g1nation.lmStudio.load.useFp16ForKVCache` *(boolean)* _(default: `false`)_ — [Load option] Store KV cache in FP16 (halves cache memory). Tiny quality impact for most models — try if you run out of VRAM at long contexts. Default: false.
+  - `g1nation.lmStudio.load.evalBatchSize` *(number)* _(default: `0`)_ — [Load option] Token batch size during evaluation. 0 = engine default. Higher (512–1024) improves prefill speed on GPU at the cost of memory.
  - `g1nation.localBrainPath` *(string)* _(default: `""`)_ — Folder path for your local Second Brain knowledge base. Leave empty to use the default folder.
  - `g1nation.brainProfiles` *(array)* _(default: `[]`)_ — Multiple brain profiles. Each item supports id, name, localBrainPath, secondBrainRepo, and description.
  - `g1nation.activeBrainId` *(string)* _(default: `""`)_ — Active brain profile id used for the current chat context.
@@ -274,21 +284,9 @@ flowchart LR
  - `g1nation.knowledgeMix.secondBrainWeight` *(number)* _(default: `50`)_ — Knowledge Mix (0–100): how heavily the assistant should lean on Second Brain evidence vs. its own general knowledge. 0 = Second Brain disabled (model knowledge only). 50 = balanced (legacy default). 1
  - `g1nation.enableReflection` *(boolean)* _(default: `true`)_ — Insert a Self-Reflection (Reflector) stage between Researcher and Writer in the multi-agent workflow. The Reflector critically reviews the plan and research output (gaps, contradictions, unsupported c
  - `g1nation.autoLessonFromReflection` *(boolean)* _(default: `true`)_ — Persist substantive Reflector critiques to the active brain as lesson cards under `lessons/auto-reflector/`. Future missions automatically retrieve these cards (via the existing Experience-Memory pipe
-  - `g1nation.company.intentClassifierModel` *(string)* _(default: `""`)_ — Model used to classify whether an incoming chat message in 1인 기업 (One-Person Company) mode is a (a) casual chat / question, (b) follow-up on the previous round, or (c) a brand-new task that should trigger the full work pipe
-  - `g1nation.company.disableIntentClassifier` *(boolean)* _(default: `false`)_ — Bypass the intent classifier and always run the full work pipeline on every chat message in 1인 기업 (One-Person Company) mode (legacy behaviour). Enable this only if you want every input — including 'thanks', 'show me X again
-  - `g1nation.company.autoSelectPipeline` *(boolean)* _(default: `true`)_ — Let the intent classifier *automatically switch* to the pipeline it recommends for this turn (e.g. short '기획서까지만' for a planning ask, full '풀 프로덕트' for an end-to-end product). Your explicitly-activate
-  - `g1nation.company.intentAlignmentMode` *(string)* _(default: `"smart"`)_ — Intent Alignment — turn user prompts into an explicit Requirement Contract (C-G-C-F-Q) before dispatching a pipeline. 'off' = legacy, pipeline runs immediately. 'smart' (default) = run when confidence
-  - `g1nation.company.intentAlignmentMaxRounds` *(number)* _(default: `3`)_ — Maximum back-and-forth rounds the Intent Alignment analyzer is allowed to ask before forcing a 'confirm or cancel' card (it stops asking new questions and shows the current contract for user approval)
-  - `g1nation.selfReflector.enabled` *(boolean)* _(default: `false`)_ — Self-Reflector Phase A — append a [Self-Reflector Check] block at the end of every substantive LLM answer (Consistency / Completeness / Accuracy, plus References / Paths for code answers). Zero extra 
-  - `g1nation.selfReflector.externalVerification` *(boolean)* _(default: `false`)_ — Self-Reflector Phase B — after every 1인 기업 (One-Person Company) specialist response, run a *separate* LLM call to verify the output from an outside-context perspective (catches the 'same model self-validates' blind spot).
-  - `g1nation.selfReflector.executionVerification` *(boolean)* _(default: `false`)_ — Self-Reflector Phase C — after a code file is created via <create_file>, automatically run the language's syntax check (Python: py_compile, JS: node --check, TS: project tsc --noEmit). Failures are su
-  - `g1nation.company.pixelOffice.enabled` *(boolean)* _(default: `true`)_ — Show the Pixel Office visualisation panel above the chat — a small pixel-office-style display that mirrors the agent's current pipeline status (analyzing, need_clarification, executing, reviewing, wai
-  - `g1nation.company.pixelOffice.bubbles` *(boolean)* _(default: `true`)_ — Show short comic-style speech bubbles above the Pixel Office character on status changes / key events (e.g. '코드 들어간다', '잠깐, 이건 다시 보자', '좋아, 끝났다!'). Bubbles are purely narrative — they never influence 
-  - `g1nation.google.clientId` *(string)* _(default: `""`)_
-  - `g1nation.google.clientSecret` *(string)* _(default: `""`)_
-  - `g1nation.google.calendarId` *(string)* _(default: `"primary"`)_
-  - `g1nation.google.defaultEventDurationMinutes` *(number)* _(default: `60`)_ — Default length (minutes) for events that have neither end nor duration. Applied when the agent extracts only a time from meeting notes and no end time is specified.
-  - _…and 9 more_
+  - `g1nation.workflow.synthesizerEnabled` *(boolean)* _(default: `true`)_
+  - `g1nation.workflow.multiAgentMode` *(string)* _(default: `"auto"`)_
+  - _…and 27 more_

 ## Dependencies
 - **Runtime** (2): `@lmstudio/sdk`, `pdf-parse`
@@ -336,7 +334,7 @@ Astra는 대표님의 명시적인 승인 하에 로컬 시스템의 강력한
 **Designed for High-Performance Decision Making.**
 Copyright (C) **g1nation**. All rights reserved.

-_Last auto-scan: 2026-05-22T10:04:22.779Z · signature `1c723399`_
+_Last auto-scan: 2026-05-23T06:46:38.895Z · signature `457ea57e`_
 <!-- ASTRA:AUTO-END -->

 ## Purpose
@@ -1,5 +1,5 @@
 {
  "result": "Final report with inconsistencies. This should be long enough to pass validation.",
-  "createdAt": 1779495116625,
+  "createdAt": 1779518828393,
  "modelVersion": "unknown"
 }
@@ -1,5 +1,5 @@
 {
  "result": "[CONFLICT WARNING] 성능이 200% 증가했습니다. vs 그러나 동시에 50% 감소했습니다. 최적화와 성능 저하가 동시에 발견됨.",
-  "createdAt": 1779495116625,
+  "createdAt": 1779518828393,
  "modelVersion": "unknown"
 }
@@ -1,5 +1,5 @@
 {
  "result": "Detailed Execution Plan: 1. Research 2. Analyze 3. Write report with high quality.",
-  "createdAt": 1779495116624,
+  "createdAt": 1779518828392,
  "modelVersion": "unknown"
 }
@@ -1,5 +1,5 @@
 {
-  "result": "---\nid: stress_conflict_1779495116612\ndate: 2026-05-23T00:11:56.625Z\ntype: knowledge_artifact\nstandard: P-Reinforce v3.0\ntags: [automated, connect_ai, brain_sync]\n---\n\n## 📌 Brief Summary\nFinal report with inconsistencies. This should be long enough to pass validation.\n\nFinal report with inconsistencies. This should be long enough to pass validation.\n\n---\n## 💡 Astra의 선제적 제안 (Proactive Next Actions)\nFinal report with inconsistencies. This should be long enough to pass validation.\n---\n## 🛡️ Reliability & Audit Summary\n> [!NOTE]\n> 이 문서는 ConnectAI의 **Intelligent Resilience** 엔진에 의해 검증 및 정제되었습니다.\n\n| Metric | Value | Status |\n| :--- | :--- | :--- |\n| **Conflict Risk** | `60/100` | ⚠️ Medium |\n| **Fallbacks Used** | `0` | ✅ None |\n| **Auto Retries** | `0` | ✅ Stable |\n| **Deduplication** | `0` | Standard |\n| **Processing Time** | `0.0s` | ✅ Fast |\n\n### 🔍 Decision Audit Trail\n- **[PLANNER]** 전략 수립 중... (11ms)\n- **[RESEARCHER]** 핵심 정보 수집 및 분석 중... (0ms)\n- **[WRITER]** 최종 리포트 작성 및 편집 중... (1ms)\n",
-  "createdAt": 1779495116625,
+  "result": "---\nid: stress_conflict_1779518828380\ndate: 2026-05-23T06:47:08.394Z\ntype: knowledge_artifact\nstandard: P-Reinforce v3.0\ntags: [automated, connect_ai, brain_sync]\n---\n\n## 📌 Brief Summary\nFinal report with inconsistencies. This should be long enough to pass validation.\n\nFinal report with inconsistencies. This should be long enough to pass validation.\n\n---\n## 💡 Astra의 선제적 제안 (Proactive Next Actions)\nFinal report with inconsistencies. This should be long enough to pass validation.\n---\n## 🛡️ Reliability & Audit Summary\n> [!NOTE]\n> 이 문서는 ConnectAI의 **Intelligent Resilience** 엔진에 의해 검증 및 정제되었습니다.\n\n| Metric | Value | Status |\n| :--- | :--- | :--- |\n| **Conflict Risk** | `60/100` | ⚠️ Medium |\n| **Fallbacks Used** | `0` | ✅ None |\n| **Auto Retries** | `0` | ✅ Stable |\n| **Deduplication** | `0` | Standard |\n| **Processing Time** | `0.0s` | ✅ Fast |\n\n### 🔍 Decision Audit Trail\n- **[PLANNER]** 전략 수립 중... (12ms)\n- **[RESEARCHER]** 핵심 정보 수집 및 분석 중... (1ms)\n- **[WRITER]** 최종 리포트 작성 및 편집 중... (0ms)\n",
+  "createdAt": 1779518828394,
  "modelVersion": "unknown"
 }
@@ -1,8 +1,8 @@
 {
-  "missionId": "stress_conflict_1779495116612",
+  "missionId": "stress_conflict_1779518828380",
  "status": "completed",
-  "startTime": "2026-05-23T00:11:56.613Z",
-  "totalElapsedMs": 13,
+  "startTime": "2026-05-23T06:47:08.380Z",
+  "totalElapsedMs": 14,
  "results": {
    "planner": "Detailed Execution Plan: 1. Research 2. Analyze 3. Write report with high quality.",
    "researcher": "[CONFLICT WARNING] 성능이 200% 증가했습니다. vs 그러나 동시에 50% 감소했습니다. 최적화와 성능 저하가 동시에 발견됨.",
@@ -16,30 +16,30 @@
    {
      "from": "idle",
      "to": "planner",
-      "durationMs": 11,
+      "durationMs": 12,
      "message": "전략 수립 중...",
-      "ts": "2026-05-23T00:11:56.624Z"
+      "ts": "2026-05-23T06:47:08.392Z"
    },
    {
      "from": "planner",
      "to": "researcher",
-      "durationMs": 0,
+      "durationMs": 1,
      "message": "핵심 정보 수집 및 분석 중...",
-      "ts": "2026-05-23T00:11:56.624Z"
+      "ts": "2026-05-23T06:47:08.393Z"
    },
    {
      "from": "researcher",
      "to": "writer",
-      "durationMs": 1,
+      "durationMs": 0,
      "message": "최종 리포트 작성 및 편집 중...",
-      "ts": "2026-05-23T00:11:56.625Z"
+      "ts": "2026-05-23T06:47:08.393Z"
    },
    {
      "from": "writer",
      "to": "completed",
      "durationMs": 1,
      "message": "미션 완료",
-      "ts": "2026-05-23T00:11:56.626Z"
+      "ts": "2026-05-23T06:47:08.394Z"
    }
  ],
  "resilienceMetrics": {
@@ -1,5 +1,145 @@
 # Astra Patch Notes

+## v2.2.72 (2026-05-23)
+### ⚡ LM Studio communication hardening + speed boost pack
+Two rounds of work were bundled into one turn: (A) 9 safety nets on the communication path + (B) 4 speed improvements.
+
+**(A) 통신 hardening**
+- **Sampling parity SDK ↔ REST.** 기존엔 `topP/topK/minP/repeatPenalty` 가 SDK 경로에만 적용 → 핸들이 죽어 REST 로 fallback 되면 한글 토큰 깨짐(`붕괴→붕점`) 재발. 공유 `LmStudioSampling` + `samplingToRestBody` 로 두 경로가 동일 값을 보내도록 통일. Ollama 도 `options.{top_p,top_k,min_p,repeat_penalty}` 로 같이 받음.
+- **신규 설정:** `g1nation.lmStudio.sampling.{topP=0.9, topK=20, minP=0.05, repeatPenalty=1.1}` + `g1nation.lmStudio.statsInBudget=true`.
+- **prediction.stats UI 노출.** 매 turn 끝나면 ctx-badge 에 `… · 32.1 tok/s · TTFT 0.40s` 표시 (SDK 경로만). 툴팁에 출력 토큰 수 / 총 시간 / stopReason 도 추가.
+- **listDownloaded TTL 캐시 (60s).** 사이드바 드롭다운 열 때마다 LM Studio 디스크 walk 하던 호출을 캐싱. 빈 결과는 캐싱 안 함 (방금 켠 LM Studio 가리는 회귀 방지). `setBaseUrl` / `invalidateCaches()` 가 캐시 초기화.
+- **Empty-response 복구 일원화.** `LMStudioStreamer.stream()` 의 attempt-2 retry 가 dead-handle 에러뿐 아니라 "에러 없이 0 token" 케이스도 다룸. agent.ts 의 중복된 handle-reset retry 블록 (~30 LOC) 삭제. REST fallback 은 유지.
+- **handle-dead 패턴 확장.** `channel closed`, `WebSocket (is not open|closed|disconnected)`, `Connection (lost|reset|closed)`, `ECONNRESET`, `socket hang up` 추가.
+- **`httpToWebSocketUrl` path 정리.** `/api/v0`, `/api/v1`, `/v1`, `/api` 를 loop 으로 unwind — `http://host/api/v0` → ws root 까지 한 번에.
+- **service-down 조기 break.** `createStreamingRequest` 가 `error.cause.code === 'ECONNREFUSED' | 'ENOTFOUND' | 'EAI_AGAIN'` 감지 시 attempt/variant/candidate 루프 즉시 종료. 12회 fetch → 1회 → 사용자 에러 ~1s.
+- **callAgent cutoff warn.** sub-agent SDK 호출도 `stopReason` 검사 → `/maxPredicted|context|truncat/` 매칭 시 logError. 잘린 specialist 출력이 silently pipeline 을 오염시키는 거 방지.
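
The `httpToWebSocketUrl` path cleanup described in the list above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `src/lmstudio`: the suffix list comes from the note, while the constant name `API_SUFFIXES`, the trailing-slash handling, and the `https` → `wss` mapping are hypothetical.

```typescript
// Hypothetical sketch; the real helper may differ in naming and edge cases.
const API_SUFFIXES = ["/api/v0", "/api/v1", "/v1", "/api"];

function httpToWebSocketUrl(baseUrl: string): string {
  const url = new URL(baseUrl);
  // Assumption: https maps to wss, everything else to ws.
  const scheme = url.protocol === "https:" ? "wss:" : "ws:";
  // Drop trailing slashes so "/v1/" is treated like "/v1".
  let path = url.pathname.replace(/\/+$/, "");

  // Keep stripping known API suffixes until none of them matches.
  let stripped = true;
  while (stripped) {
    stripped = false;
    for (const suffix of API_SUFFIXES) {
      if (path.endsWith(suffix)) {
        path = path.slice(0, -suffix.length).replace(/\/+$/, "");
        stripped = true;
        break;
      }
    }
  }
  return `${scheme}//${url.host}${path}`;
}
```

Looping rather than stripping once means a base URL that was configured with a nested API path still ends at the WebSocket root.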
+
+**(B) Speed boost**
+- **Speculative decoding.** New setting `g1nation.lmStudio.draftModel` (empty = OFF). Specifying a small draft model (e.g. `gemma-2b-it`) gives the large model 1.5~3× throughput. `ChatStreamRequest.draftModel` → SDK `respond({draftModel})`. Right after the main model loads, the lifecycle calls `unstable_preloadDraftModel` to remove the cold-load cost. The ctx-badge shows the accept ratio as `spec 68%` (60%+ is healthy; below 30% means the draft is predicting poorly and may actually slow things down).
+- **Load-time options (8).** `client.load()` takes an `LMStudioLoadConfig` and passes it to the SDK as an `LLMLoadModelConfig` (including the GPUSetting wrapper).
+  - `g1nation.lmStudio.load.flashAttention` (true): 10~20% gain on long contexts
+  - `g1nation.lmStudio.load.gpuOffloadRatio` ("max" | "off" | 0-1)
+  - `g1nation.lmStudio.load.offloadKVCacheToGpu` (true)
+  - `g1nation.lmStudio.load.keepModelInMemory` (true): prevents swap-out
+  - `g1nation.lmStudio.load.useFp16ForKVCache` (false): halves KV-cache memory (for when VRAM is tight)
+  - `g1nation.lmStudio.load.evalBatchSize` (0 = engine default): prefill speed
+- **`liveStreamTokens` default → true (was false).** Improves perceived TTFT. Because sanitize + `streamReplace` swap in the final answer when generation ends, control tokens can only be visible briefly. (Based on the memory note "sanitize-before-post handles the leak".)
+
+**System prompt KV-cache (items 1+5)**: Investigation showed that the current ordering (stable head → `[CONTEXT]` body → stable tail) is already prefix-cache friendly, so there is no code change. The agent persona-first placement is kept to prioritize small-model anchoring: answer quality outweighs the potential KV-cache gain.
+
+**Touched:** `src/{config,agent,extension}.ts`, `src/lmstudio/{client,streamer,lifecycleManager}.ts`, `media/sidebar.js`, `package.json` (12 new settings + `liveStreamTokens` default flip), `listDownloadedCached` stub added to 2 test FakeClients. 401/401 jest passing · tsc clean · esbuild 2.9MB.
+
+**New package:** `astra-2.2.72.vsix`.
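
The service-down early break from the v2.2.72 hardening list can be illustrated with a small predicate. This is a sketch, not the body of `createStreamingRequest`: the three error codes come from the note, while the names `SERVICE_DOWN_CODES` and `isServiceDown` are hypothetical.

```typescript
// Hypothetical helper: decides whether a failed fetch means the service is unreachable.
const SERVICE_DOWN_CODES = new Set(["ECONNREFUSED", "ENOTFOUND", "EAI_AGAIN"]);

function isServiceDown(error: unknown): boolean {
  // Node's fetch typically surfaces network failures as an error whose `cause` carries `code`.
  const cause = (error as { cause?: { code?: unknown } } | null)?.cause;
  const code = cause?.code;
  return typeof code === "string" && SERVICE_DOWN_CODES.has(code);
}
```

A retry loop would check this predicate in its `catch` block and stop iterating instead of trying the remaining attempts, variants, and candidates.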
+
+---
+
+## v2.2.71 (2026-05-23)
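+### 📦 Move the entire auto-record line into the Tools (`도구 ▾`) menu
+- **Request clarified:** v2.2.70 only added the toggle to the Tools menu, but what the user wanted was to move the **entire** records-line at the bottom of the sidebar ("● 자동 기록 · filename · 기록 ▾") into the Tools menu.
+- **Fix:** Deleted the whole `<div class="records-line">` from `media/sidebar.html`. Every element it contained (auto-record status, recordsLatest, chronicleRecordSel, openChronicleRecordBtn, refreshChronicleRecordsBtn, openDesignerBtn) was absorbed into the Tools dropdown. The Tools menu now uses `hdr-menu-wide`.
+- **New menu layout:** clicking Tools (`도구 ▾`) shows
+  - "자동 기록" (auto record) section: toggle / latest saved record status / record selector / open selected record / refresh / open folder
+  - "도구" (tools) section: view evidence-trace JSON / save raw answer to the brain / sync brain
+- **JS safety:** All element IDs are kept, so the existing sidebar.js handlers work unchanged. Only `renderChronicleAutoToggle` was adjusted slightly: chronicleAutoStatus is now a container with children (status-dot · recordsLatest), so to prevent a regression where assigning `textContent` directly wipes the children, it now updates only opacity / title.
+- **Sidebar space:** One records-line row removed → less noise between the context bar and the chat body.
+- **New package:** `astra-2.2.71.vsix`.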
+
+---
+
+## v2.2.70 (2026-05-23)
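+### 🎚 Auto-record On/Off toggle (Tools menu)
+- **Added:** A toggle that turns the sidebar records-line "자동 기록" (auto record) indicator on and off. It is the first item of the **Tools (`도구 ▾`)** dropdown (`자동 기록: 켜짐 / 꺼짐`).
+- **New setting:** `g1nation.chronicleAutoRecord` (default `true`). Clicking the toggle persists the value to VS Code Global settings immediately, so it survives into the next session.
+- **Gating:** On entry, `_autoWriteChronicleAfterPrompt` returns early if `getConfig().chronicleAutoRecord === false`. Only automatic saving stops; manual recording (the other record items in the Tools menu, `/wiki`, etc.) keeps working.
+- **UI feedback:** When OFF, the records-line "자동 기록" label becomes "자동 기록 (꺼짐)" with dim opacity, and the latest-record label is dimmed too so the state is visible at a glance. For click responsiveness the UI updates optimistically and is then corrected by the server response.
+- **Messages:** New webview ↔ extension messages `setChronicleAutoRecord` / `getChronicleAutoRecord` / `chronicleAutoRecordStatus`, routed in `chronicleHandlers.ts`.
+- **New package:** `astra-2.2.70.vsix`.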
+
+---
+
+## v2.2.69 (2026-05-23)
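+### 💾 Conversation history: sliding window + mode-transition bridge
+- **Status check:** The "single global history" requirement is already met: `AgentExecutor.chatHistory` is a single instance, and switching between agent / company / multi-agent modes does not clear the history. It resets only on an explicit `/newChat` or session deletion.
+- **Fix 1 (sliding-window summary).** The old `trimHistoryToBudget` simply replaced older messages with a `[이전 대화 N개 ... 생략됨]` count marker, so the model had no idea what had been discussed earlier. Now the dropped-message array is passed to the marker factory as well, and `buildDroppedHistorySummary()` in agent.ts uses a heuristic, with no extra LLM call, to extract only (a) the first sentence of each user prompt and (b) the first sentence of each assistant answer (assuming R1 conclusion-first) and compress them into a single system message of the form `U1: ... / A1: ... / U2: ...`. With 8 or more turns, the oldest half is condensed to one line.
+- **Fix 2 (mode-transition bridge).** `AgentExecutor._lastModeSignature` tracks a hash of (agent skill, multiAgent, company mode, brain). If it differs from the previous value on entering handlePrompt, a `[MODE TRANSITION BRIDGE] 이전 모드 / 현재 모드 / 직전 대화 주제` block (previous mode / current mode / last conversation topic) is inserted into the system prompt. chatHistory is untouched, so the user sees a continuous conversation while the model follows the new persona/format without forgetting the preceding context. `clearHistory` / `resetConversation` also reset the signature so that no spurious bridge is inserted into the first message of a new session.
+- **Signature change:** `makeMarker` in `trimHistoryToBudget` now takes two arguments, `(droppedCount, droppedMessages)`. The call site (`agent.ts`) and the unit test (`contextManager.test.ts`) were updated.
+- **New package:** `astra-2.2.69.vsix`.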
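
The sliding-window summary from the v2.2.69 notes could be sketched as below. This is a hedged illustration, not the `buildDroppedHistorySummary()` in agent.ts: the `ChatMessage` shape, the `firstSentence` helper, the 80-character cap, and the English header text are assumptions, and the "oldest half condensed to one line" step for 8+ turns is left out.

```typescript
// Hypothetical message shape; the real chatHistory type may differ.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Take the first sentence of a message, capped at maxLen characters.
function firstSentence(text: string, maxLen = 80): string {
  const flat = text.replace(/\s+/g, " ").trim();
  const match = flat.match(/^.*?[.!?](?=\s|$)/);
  const sentence = match ? match[0] : flat;
  return sentence.length > maxLen ? sentence.slice(0, maxLen - 1) + "…" : sentence;
}

// Compress dropped messages into one line: "U1: ... / A1: ... / U2: ...".
function buildDroppedHistorySummary(dropped: ChatMessage[]): string {
  let userIndex = 0;
  let assistantIndex = 0;
  const parts: string[] = [];
  for (const message of dropped) {
    if (message.role === "user") {
      parts.push(`U${++userIndex}: ${firstSentence(message.content)}`);
    } else if (message.role === "assistant") {
      parts.push(`A${++assistantIndex}: ${firstSentence(message.content)}`);
    }
  }
  return `[Earlier conversation: ${dropped.length} messages summarized] ${parts.join(" / ")}`;
}
```

Because the heuristic only reads the first sentence, it depends on answers leading with their conclusion, which is what rule R1 in v2.2.68 asks for.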
+
+---
+
+## v2.2.68 (2026-05-23)
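+### 📐 Answer format: 7 hard rules
+- **Change:** The answer format rules were fully replaced by 7 user-specified hard rules. The previous 3-section template for long answers ("key summary block + detailed explanation + suggestions") is retired.
+  - R1. Conclusion in the first sentence (no greeting, no "분석해보겠습니다" ("let me analyze"), no "핵심 요약" ("key summary") label)
+  - R2. At most 3 sections
+  - R3. Never say the same thing twice
+  - R4. At most 3 bold spans in the whole answer
+  - R5. If a decision can be made without more information, act immediately
+  - R6. Ask only 1 question, and only when (a) the direction forks and the user's intent cannot be determined, or (b) right before an irreversible action
+  - R7. If a guess is possible, guess and act, but state the assumption in one line ("가정: ...", i.e. "Assumption: ...")
+- **Scope:** The same rules are injected into both the single-agent path (`BASE_SYSTEM_PROMPT [OUTPUT FORMAT]`) and the final multi-agent stage (`SynthesizerAgent` persona), so both paths answer in the same format.
+- **Cleanup:** The `[FOLLOW-UP QUESTION RULES]` section was absorbed into R6 and removed. "Give the verdict first, then explain tradeoffs" in `[ENGINEERING STANCE]` duplicated R1 and was cleaned up as well.
+- **New package:** `astra-2.2.68.vsix`.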
+
+---
+
+## v2.2.67 (2026-05-23)
+### 🧠 Brain add/edit/delete restored
+- **Problem:** Two regressions reported after v2.2.66.
+  1. After "adding" a brain, the new brain does enter the dropdown, but the displayed selection sticks on the last option, `+ Add New Brain...`, so to the user it looks like "the name changed to add new brain".
+  2. In that state the "edit"/"delete" buttons are silent: clicking does nothing.
+- **Cause 1 (order of applying selected):** The brainProfiles handler set `option.selected = true` right before `appendChild`, which some Chromium webviews ignore. As a result the dropdown selectedIndex stayed on the last option (`+ Add New Brain...`).
+- **Cause 2 (`brainSel.value === 'new'` lock):** If the user clicks the `+ Add New Brain...` option directly, or cancels the folder-picker modal, `brainSel.value` gets stuck at `'new'`. The first line of the edit/delete button onclick, `if (brainSel.value === 'new') return;`, then silently returns early, so nothing is edited.
+- **Fix 1 (changed order of applying selected).** Selection is applied in one step with `brainSel.value = activeBrainId` *after* all options are inserted. The pre-appendChild `o.selected = true` pattern is removed. A value previously stuck at `'new'` is reliably overwritten too.
+- **Fix 2 (immediate restore on 'new' click).** When `brainSel.onchange` detects `'new'`, it sends the `addBrain` message and *at the same time* reverts the dropdown to the last valid selection (`brainSel.dataset.lastSelected`). Even if the user cancels the folder picker, the dropdown no longer sticks at `'new'`.
+- **Fix 3 (Edit/Delete fallback).** Both buttons fall back to the last valid selection or the first real option, so they work even while the dropdown is at `'new'`. No more silent early return.
+- **New package:** `astra-2.2.67.vsix`.
+
+---
+
+## v2.2.66 (2026-05-23)
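+### 🧠 Brain (knowledge folder) dropdown regression fix
+- **Problem:** The brain selector in the sidebar context bar suddenly showed only `+ Add New Brain...`. The readyBar recognized the brain itself normally (e.g. `Brain 5407`), but the dropdown was empty.
+- **Suspected cause:** The webview `ready` handshake chain (`_sendBrainStatus → _sendBrainProfiles → _sendSessionList → _sendModels → _sendChronicleProjects → _restoreActiveSessionIntoView → _sendReadyStatus`) is structured so that if any step throws, everything after it is skipped. Alternatively, the handler empties the existing dropdown options as soon as a single message with an empty profiles array arrives.
+- **Fix 1 (sidebar.js defense):** If `profiles` in the `brainProfiles` message is an empty array or undefined, the existing dropdown options are preserved and only a warn log is written. This blocks the regression where a bad state reduces the options to zero and leaves only `+ Add New Brain...`. The `case` block is also wrapped in a `{...}` scope to prevent future const name collisions.
+- **Fix 2 (double guarantee on initial setup):** At view creation, sidebarProvider.ts now pushes `_sendBrainProfiles` / `_sendAgentsList` / `_sendModels` directly once, in addition to `_restoreActiveSessionIntoView` + `_sendReadyStatus`. The dropdown survives even if the 'ready' chain breaks.
+- **Fix 3 (diagnostic log):** Every call to `_sendBrainProfiles` logs `profiles=N activeBrainId=X` via logInfo. If the issue recurs, the Output → Astra channel alone is enough to identify the cause.
+- **New package:** `astra-2.2.66.vsix`.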
+
+---
+
+## v2.2.65 (2026-05-23)
+### 🧼 Second-pass Markdown marker sanitize: blocks enforcer re-injection too
+- **Problem:** v2.2.64 applied `stripMarkdownFormatting` only to `cleanedVisible` (right after the raw model output), but a later stage, `enforceLocalPathReviewAnswer`, prepends a hard-coded `## 경로 확인 결과` header to the sanitized answer, so the markers still appeared on screen. The same pattern occurs in ~20 places, including `## 간단 요약`, `## 강점`, `## 근거`, and `## 다음 액션`.
+- **Fix:** Added a second sanitizer pass at the `finalAssistantContent` stage in `agent.ts` (the true final string that goes into the webview / chatHistory). With the double defense of pass 1 (model output) + pass 2 (enforcer output), any `##`/`**` prepended on any code path is stripped right before display.
+- **New package:** `astra-2.2.65.vsix`.
+
+---
+
+## v2.2.64 (2026-05-23)
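+### 🪶 Plain-text output + forced summary for long answers
+- **Problem:** Small local models, out of trained habit, expose `## 다음 한 수` and `**` emphasis markers as-is. Some answers have only a label and jump straight into the body, making it hard for the user to grasp the situation quickly.
+- **Fix 1 (post-processing sanitizer).** Added `stripMarkdownFormatting(text)` to `responseRecovery.ts`. It preserves code blocks and inline code and removes only line-start `#{1,6}\s+` heading markers, `**bold**` / `__bold__`, single `*emphasis*`, blockquote `> `, and asterisk bullets `* `. Label text (`핵심 요약`, `다음 한 수`) is kept as-is.
+- **Fix 2 (Synthesizer forced rules).** Added a `[FORMAT — PLAIN TEXT ONLY, NO MARKDOWN]` block and a `[STRUCTURE]` block to the Synthesizer persona. **If the answer exceeds ~4 sentences / ~400 characters, a `핵심 요약` (key summary) block (2~4 bullets) must be placed at the very top of the answer.** Short answers are left as they are.
+- **Fix 3 (BASE_SYSTEM_PROMPT cleanup).** The Markdown header examples in the existing `[OUTPUT FORMAT]`, such as `## 핵심 요약` and `## 상세 설명`, were replaced with bare labels (e.g. `핵심 요약`). `[NO MARKDOWN MARKERS]` is now stated explicitly in `[STRICT GLOBAL RULES]`. The single-agent path also outputs plain text.
+- **Fix 4 (review-evaluation guide cleanup).** Markdown-prefixed labels such as `1. ## 한 줄 판단` were replaced with `1) 한 줄 판단`, so that the model does not pick up Markdown markers at the prompt stage.
+- **Fix 5 (Drafter persona cleanup).** Section labels are plain text too. The input the Synthesizer receives must be clean for the final output to be clean.
+- **Applied to both paths.** Both the single-agent path (`agent.ts` line ~1189) and the multi-agent path (right before `finalReport`) run the sanitizer when `outputFormat === 'plain'`. Only the sanitized text is stored in `chatHistory`, so markers are not re-learned from the next turn's context.
+- **New setting:** `g1nation.outputFormat` (`plain` default / `markdown` opt-out).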
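
The post-processing sanitizer from Fix 1 above can be approximated as follows. This is a minimal sketch, not the `stripMarkdownFormatting` in `responseRecovery.ts`: the mask-and-restore approach and the NUL-character placeholder are assumptions, and it presumes the input text contains no NUL characters.

```typescript
// Hypothetical sketch: strip Markdown markers but leave code untouched.
function stripMarkdownFormatting(text: string): string {
  // 1) Mask fenced blocks and inline code so the rules below cannot touch them.
  const stash: string[] = [];
  const masked = text.replace(/`{3}[\s\S]*?`{3}|`[^`\n]+`/g, (code: string) => {
    stash.push(code);
    return `\u0000${stash.length - 1}\u0000`;
  });

  // 2) Remove markers only; the label text itself is kept.
  const cleaned = masked
    .replace(/^[ \t]*#{1,6}[ \t]+/gm, "") // line-start heading markers
    .replace(/^[ \t]*>[ \t]?/gm, "") // blockquote prefix
    .replace(/^([ \t]*)\*[ \t]+/gm, "$1") // asterisk bullets
    .replace(/\*\*([^*\n]+)\*\*/g, "$1") // **bold**
    .replace(/__([^_\n]+)__/g, "$1") // __bold__
    .replace(/\*([^*\n]+)\*/g, "$1"); // *emphasis*

  // 3) Put the masked code segments back.
  return cleaned.replace(/\u0000(\d+)\u0000/g, (_m: string, i: string) => stash[Number(i)] ?? "");
}
```

Masking first keeps the line-start rules from misfiring on text that merely follows an inline code span, which a split-and-map approach would get wrong.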
+
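+### 🧩 5-stage pipeline (Planner → Researcher → Reflector → Drafter → Synthesizer) + clean stream
+- **Problem:** Small local models (e.g. gemma 4 e2b/e4b) hit EOS/truncation when trying to finish the answer in one shot because of context limits. In multi-agent mode, stage messages such as `> **[Planner]** ...` were also mixed into the chat body, which frustrated users. Some responses briefly exposed control tokens such as `<|channel|>thought ...`.
+- **Fix 1 (Synthesizer stage added as the 5th).** The Drafter (formerly Writer) quickly produces only a first draft, and the **Synthesizer** receives just that small draft and tidies the one-line intro, section flow, and conclusion. The input is light enough for a small model to handle in one pass. New class `SynthesizerAgent` (`src/agents/factory.ts`), a 4th `AgentEngine` constructor parameter, and `synthesizer` added to `PipelineStage`.
+- **Fix 2 (expanded auto-trigger).** The old trigger fired only when the prompt exceeded 180 characters and contained a keyword, so small models stayed single-agent and blew up. With the new `g1nation.workflow.multiAgentMode` (`auto`/`always`/`off`), **the default `auto`** starts the 5 stages when any one of these holds: (a) model ≤4B, (b) the prompt is 30% or more of the context, (c) keywords such as "코드 리뷰/심층 분석/보고서" (code review / deep analysis / report), (d) the user explicitly turned `multiAgentEnabled` ON. Greetings and prompts under 12 characters are excluded.
+- **Fix 3 (stage messages separated from the chat body).** Removed the code that streamed progress (`> **[Researcher]** ...`) into the chat bubble. Instead, a new webview message `workflowStage` is shown only on the single `statusLabel + thinkingBar` line at the top of the sidebar, removing the frustration of "thinking stages keep showing in the body". Labels are unified as numbered Korean: `① 계획 → ② 자료 수집 → ③ 자기 검증 → ④ 초안 작성 → ⑤ 최종 정리`.
+- **Fix 4 (live token streaming OFF by default).** New `g1nation.liveStreamTokens` (default `false`): tokens accumulate internally only, and just the final answer, after `extractVisibleFinal` sanitize completes, is shown at once. This eliminates at the source any leak where Harmony `<|channel|>thought`/`<think>` markers flash on screen. Setting it to `true` restores legacy live streaming.
+- **4 new settings:** `g1nation.workflow.synthesizerEnabled` (default true), `g1nation.workflow.multiAgentMode` (auto/always/off, default auto), `g1nation.workflow.autoCtxFractionThreshold` (default 0.30), `g1nation.liveStreamTokens` (default false).
+- **Soft-fail guarantee:** If the Synthesizer produces empty output or throws, the mission is not blocked and the Drafter draft is used as the final answer as-is. Same pattern as the Reflector.
+- **New package:** `astra-2.2.64.vsix`.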
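
The `auto` trigger conditions from Fix 2 above can be read as a single predicate. The sketch below is an assumption-laden illustration, not the extension's code: `TriggerInput`, `shouldRunPipeline`, the English keyword alternatives, and the greeting pattern are hypothetical, and it assumes the greeting / 12-character exclusion takes precedence over all four conditions in `auto` mode.

```typescript
type MultiAgentMode = "auto" | "always" | "off";

// Hypothetical input shape gathering what the decision needs.
interface TriggerInput {
  mode: MultiAgentMode;
  prompt: string;
  promptTokens: number;
  contextLength: number;
  modelParamsB: number | null; // model size in billions of parameters, if known
  multiAgentEnabled: boolean;
  ctxFractionThreshold?: number; // cf. g1nation.workflow.autoCtxFractionThreshold
}

const KEYWORDS = /코드 리뷰|심층 분석|보고서|code review|deep analysis|report/i;
const GREETING = /^(hi|hello|hey|안녕|안녕하세요)[\s!.?]*$/i;

function shouldRunPipeline(input: TriggerInput): boolean {
  if (input.mode === "off") return false;
  if (input.mode === "always") return true;

  const prompt = input.prompt.trim();
  // Greetings and very short prompts never start the pipeline.
  if (prompt.length < 12 || GREETING.test(prompt)) return false;

  const threshold = input.ctxFractionThreshold ?? 0.3;
  return (
    (input.modelParamsB !== null && input.modelParamsB <= 4) || // (a) small model
    input.promptTokens >= input.contextLength * threshold || // (b) prompt fills the context
    KEYWORDS.test(prompt) || // (c) heavy-task keywords
    input.multiAgentEnabled // (d) explicit opt-in
  );
}
```

Keeping the decision in one pure function makes each of the four conditions easy to unit-test in isolation.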
+
+---
+
 ## v2.2.63 (2026-05-22)
 ### 🎚️ 한국어 오타 최소화 — 채팅 Temperature 설정 + anti-glitch 샘플링
 - **문제:** 채팅 분석 답변에 한국어 오타(`붕괴`→`붕점`, `핵심`→`핵점`, `텍스트`→`텍록`)가 잦음. 토큰 단위 샘플링 glitch.
@@ -1,11 +1,11 @@
 {
  "projectId": "connectai",
-  "projectName": "connectai",
-  "projectRoot": "E:\\Wiki\\connectai",
-  "recordRoot": "E:\\Wiki\\connectai\\docs\\records\\connectai",
+  "projectName": "ConnectAI",
+  "projectRoot": "/Volumes/Data/project/Antigravity/ConnectAI",
+  "recordRoot": "/Volumes/Data/project/Antigravity/ConnectAI/docs/records/ConnectAI",
  "description": "Auto-created by Project Architecture activation.",
  "corePurpose": "",
  "detailLevel": "standard",
-  "createdAt": "2026-05-20T09:42:40.003Z",
-  "updatedAt": "2026-05-22T10:07:21.651Z"
+  "createdAt": "2026-05-23T03:51:11.620Z",
+  "updatedAt": "2026-05-23T06:48:11.444Z"
 }
@@ -0,0 +1,19 @@
+# ADR: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Status
+accepted
+
+## Context
+/Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안을 줄 것이 잇는지 평가해줘.
+
+## Decision
+현재 프로젝트의 개선 방향은 여전히 '사용자 경험(UX)을 통한 복잡성 은폐'에 초점을 맞추고 있으며, 이를 위해 `src/extension.ts` 파일의 핵심 로직 분석이 필수적입니다. 파일 내용이 제공되는 대로 즉시 분석하여 다음 단계로 넘어가겠습니다. 이 파일을 통해 ConnectAI가 어떤 순서와 방식으로 작동하는지 파악해야만, 사용자에게 '마법'처럼 느껴지는 인터페이스를 설계할 수 있습니다. (시스템에서 `src/extension.ts` 파일 내용을 읽어오는 중입니다.)
+
+## Reason
+Captured automatically because the conversation contained decision-oriented language.
+
+## Alternatives
+Not captured yet.
+
+## Consequences
+- Future prompts should treat this as project context unless the user changes direction.
@@ -0,0 +1,25 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰를 하고 사용자 입장에서 개선이 필요한 부분이 있는지 확...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+## 경로 확인 결과 제공된 로컬 프로젝트 경로에는 접근할 수 있고, 코드 파일도 일부 확인되었습니다. 만약 추가적인 코드 확인이 필요하다면 <read_file> 이나 <list_files> 액션 태그를 즉시 사용하여 스스로 파일을 읽어보고 분석을 진행하겠습니다. 이전 응답에서 "파일을 제공해주세요" 라거나 "먼저 분석할까요?" 라고 묻는 것은 잘못된 안내입니다. 액션 태그를 통해 스스로 필요한 코드를 열어보겠습니다. ## 핵심 요약 - **현 상태 평가**: 현재 ConnectAI는 강력한 기능적 기반을 갖추고 있지만, 사용자에게 '현재 무엇이 일어나고 있는지'를 직관적으로 보여주는 시각적 피드백과 명시적인 작업 흐름 가시성이 가장 큰 개선점입니다. - **가장 중요한 UX 개선**: 복잡한 내부 로직(예: 검색, 계획 수립)을 수행할 때 사용자에게 '시스템이 지금 무엇을 하고 있는지'를 단계별로 보여주는 UI/UX 메커니즘을 추가해야 합니다. - **기술적 위험 요소**: 현재 ...
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- src/extension.ts
+- src/utils.ts
+- src/features/company/types.ts
+- media/sidebar.html
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,22 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰를 하고 사용자 입장에서 개선이 필요한 부분이 있는지 확...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+/Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰를 하고 사용자 입장에서 개선이 필요한 부분이 있는지 확인해줘.
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- No explicit changed file list was captured automatically.
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,22 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰를 해줘 그리고 어떻게 하면 더 유저 친화적으로 너를 개...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+/Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰를 해줘 그리고 어떻게 하면 더 유저 친화적으로 너를 개선할 수 있는지 의견 주면 좋겠어.
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- No explicit changed file list was captured automatically.
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,26 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+핵심 요약 - 현재 프로젝트 구조를 기반으로 코드를 직접 분석하기보다는, 아키텍처 문서와 주요 파일 목록을 통해 사용자 경험(UX) 및 기능적 개선점을 제안하는 것이 효과적입니다. - 가장 먼저 검토해야 할 부분은 `src/extension.ts`의 핵심 로직 흐름 파악과, 사용자가 체감할 수 있는 인터페이스(UI/UX)의 간소화입니다. - 장기적으로는 '지식 관리 정책'을 강화하여 지식 검색의 신뢰도를 높이고, 사용자 피드백 루프를 공식적인 기능으로 통합하는 것이 중요합니다. 상세 설명 요청하신 대로 `/Volumes/Data/project/Antigravity/ConnectAI` 프로젝트 전체에 대한 코드 리뷰 및 개선 제안을 진행했습니다. 다만, 이 요청은 동일한 내용이 세 번 반복되었으므로, 한 번의 포괄적인 분석으로 답변드리겠습니다. 현재 제공된 파일 목록과 아키텍처 컨텍스트를 바탕으로 볼 때, 코드를 직접 읽어보지 않고도 사용자 관점에서 개선할 수 있는 영역을 중심으로 평...
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- src/extension.ts
+- src/utils.ts
+- architecture.md
+- media/sidebar.html
+- src/retrieval/lessonHelpers.ts
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,22 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+/Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안을 줄 것이 잇는지 평가해줘.
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- No explicit changed file list was captured automatically.
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,23 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+현재 프로젝트의 가장 큰 개선점은 복잡한 내부 아키텍처를 사용자에게는 '마법'처럼 느껴지도록 추상화하고, 사용자의 의도를 파악하는 과정을 더욱 직관적인 경험으로 다듬는 것입니다. 프로젝트 구조와 목표(ConnectAI)를 고려했을 때, 기술적 안정성보다 사용자 여정(User Journey)과 신뢰도 확보에 초점을 맞춰 개선 방향을 제시합니다. 개선할 핵심 영역은 다음과 같습니다. 1. 의도 파악 과정의 시각화 및 간소화: 사용자가 무엇을 원하는지 AI가 추론하는 과정을 사용자에게 투명하게 보여주되, 그 복잡성을 느끼게 해서는 안 됩니다. 2. 정보 출처의 명확한 제시: 모든 답변에 대해 정보의 생성 시점과 맥락(Second Brain)을 함께 제시하여 신뢰도를 극대화해야 합니다. 3. 피드백 루프의 공식 기능 통합: 사용자가 AI의 결과물이나 아키텍처를 보고 '수정 제안' 버튼을 누르면, 해당 로직이 즉시 `src/utils.ts`와 같은 핵심 파일에 반영되는 명시적인 메커니즘을 구...
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- src/utils.ts
+- src/extension.ts
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,22 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+현재 프로젝트의 가장 시급한 개선점은 내부 아키텍처를 사용자에게는 '마법'처럼 보이도록 추상화하고, 사용자의 의도를 파악하는 과정을 더욱 직관적인 경험으로 다듬는 것입니다. 이전 논의에서 제시된 UX/기술적 개선 방향(의도 파악 시각화, 출처 명시 강화 등)은 모두 유효하며, 이들을 실제로 구현하기 위해서는 핵심 실행 로직을 정확히 이해하는 것이 선행되어야 합니다. 따라서 다음 단계로 `src/extension.ts` 파일을 분석하여 현재 시스템이 어떤 순서와 방식으로 작동하는지 파악하겠습니다. <read_file path="src/extension.ts"/>
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- src/extension.ts
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -0,0 +1,22 @@
+# Development Log: /Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안...
+
+## Purpose
+Record the implementation or verification outcome from the current conversation.
+
+## Implementation Summary
+/Volumes/Data/project/Antigravity/ConnectAI 코드 리뷰하고 사용자 입장에서 개선이 필요한 부분이 있는지, 제안을 줄 것이 잇는지 평가해줘.
+
+## Architecture
+Captured automatically from the assistant response and active project context.
+
+## Changed Files
+- No explicit changed file list was captured automatically.
+
+## Dependency Notes
+No new dependency note was captured automatically.
+
+## Bugs
+No bugs recorded.
+
+## Lessons
+- Automatic project records should be generated in the background when the turn contains durable project knowledge.
@@ -180,3 +180,30 @@

 ## 2026-05-22
 - Auto decision record created: decisions\ADR-0023-메일-다듬어줘-안녕하세요-문의-주신-poc-진행-관련하여-아래와-같이-회신드립니다-현재-내부-검토-결과-락인.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰를-해줘-그리고-어떻게_implementation.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰를-하고-사용자-입장에_implementation.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰를-하고-사용자-입장에_implementation-2.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-_implementation.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-_implementation-2.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-_implementation-3.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-_implementation-4.md
+
+## 2026-05-23
+- Auto development record created: development/2026-05-23_volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-_implementation-5.md
+
+## 2026-05-23
+- Auto decision record created: decisions/ADR-0024-volumes-data-project-antigravity-connectai-코드-리뷰하고-사용자-입장에서-.md
@@ -2,6 +2,7 @@
 <html lang="ko">
 <head>
    <meta charset="UTF-8">
+    <meta http-equiv="Content-Security-Policy" content="__CSP__">
    <meta name="viewport" content="width=device-width,initial-scale=1.0">
    <title>Astra</title>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
@@ -40,7 +41,15 @@
                <button class="icon-btn" id="companyManageBtn" data-tooltip="기업 모드 관리 (에이전트 · 모델 · 프롬프트 · 지식 비중)">▾</button>
                <div class="hdr-dropdown" data-dd>
                    <button class="icon-btn" id="toolsMenuBtn" data-dd-trigger data-tooltip="개발자 도구 모음">도구 ▾</button>
-                    <div class="hdr-menu" id="toolsMenu" data-dd-menu>
+                    <div class="hdr-menu hdr-menu-wide" id="toolsMenu" data-dd-menu>
+                        <div class="hdr-menu-label">자동 기록</div>
+                        <button class="hdr-menu-item toggle-item" id="chronicleAutoRecordBtn" data-tooltip="의미 있는 대화 turn 을 활성 프로젝트의 Chronicle 폴더에 자동 저장">자동 기록: 켜짐</button>
+                        <div class="hdr-menu-hint" id="chronicleAutoStatus" title="가장 최근에 자동 저장된 기록"><span class="status-dot ready"></span> <span id="recordsLatest"></span></div>
+                        <div class="select-wrap"><select id="chronicleRecordSel" title="열어볼 작업 기록 선택"></select></div>
+                        <button class="hdr-menu-item" id="openChronicleRecordBtn" data-tooltip="선택한 기록 열기">선택한 기록 열기</button>
+                        <button class="hdr-menu-item" id="refreshChronicleRecordsBtn" data-tooltip="기록 목록 다시 불러오기">기록 새로고침</button>
+                        <button class="hdr-menu-item" id="openDesignerBtn" data-tooltip="기록이 저장된 폴더 열기">기록 폴더 열기</button>
+
                        <div class="hdr-menu-label">도구</div>
                        <button class="hdr-menu-item toggle-item" id="brainTraceDebugBtn" data-tooltip="근거 추적의 원본 JSON 표시 (개발자용)">근거 추적 JSON 보기</button>
                        <button class="hdr-menu-item" id="saveWikiRawBtn" data-tooltip="현재 답변의 원본 마크다운을 두뇌(지식)에 저장">원본 답변을 두뇌에 저장</button>
@@ -123,25 +132,9 @@
        </div>
    </div>

-    <div class="records-line">
-        <div class="rl-summary">
-            <span class="status-dot ready"></span>
-            <span id="chronicleAutoStatus" title="의미 있는 대화 후 프로젝트 기록이 자동으로 저장됩니다.">자동 기록</span>
-            <span class="rl-latest" id="recordsLatest"></span>
-        </div>
-        <!-- (Removed) Corp chip moved to the header toolbar above —
-             see #companyChip / #companyManageBtn alongside New/Trace/Web. -->
-        <div class="hdr-dropdown" data-dd>
-            <button class="icon-btn" id="recordsMenuBtn" data-dd-trigger data-tooltip="저장된 작업 기록 열기">기록 ▾</button>
-            <div class="hdr-menu hdr-menu-wide" id="recordsMenu" data-dd-menu>
-                <div class="hdr-menu-label">작업 기록</div>
-                <div class="select-wrap"><select id="chronicleRecordSel" title="열어볼 작업 기록 선택"></select></div>
-                <button class="hdr-menu-item" id="openChronicleRecordBtn" data-tooltip="선택한 기록 열기">선택한 기록 열기</button>
-                <button class="hdr-menu-item" id="refreshChronicleRecordsBtn" data-tooltip="기록 목록 다시 불러오기">기록 새로고침</button>
-                <button class="hdr-menu-item" id="openDesignerBtn" data-tooltip="기록이 저장된 폴더 열기">기록 폴더 열기</button>
-            </div>
-        </div>
-    </div>
+    <!-- v2.2.71: the entire records-line moved into the Tools (도구 ▾) dropdown. The sidebar body no longer shows the
+         auto-record label/selector/기록 ▾. All auto-record UI is reached from the first section (자동 기록) of the Tools (도구 ▾) menu. -->
+

    <!--
        Company manage overlay. Uses the same overlay framework as the agent
@@ -359,10 +359,31 @@
                try { _renderWelcome(); } catch {}
            }
        });
+        // v2.2.70: auto-record on/off toggle state. Both the toggle item in the Tools (도구 ▾) menu and the records-line label
+        // (자동 기록 / 자동 기록 (꺼짐)) are synced from this variable.
+        let chronicleAutoEnabled = true;
+        function renderChronicleAutoToggle() {
+            const btn = document.getElementById('chronicleAutoRecordBtn');
+            if (btn) {
+                btn.textContent = '자동 기록: ' + (chronicleAutoEnabled ? '켜짐' : '꺼짐');
+                btn.classList.toggle('active', chronicleAutoEnabled);
+            }
+            // v2.2.71: chronicleAutoStatus is now the "latest saved record" container inside the Tools menu.
+            // Writing textContent directly would wipe its children (status-dot · recordsLatest), so only opacity / title are updated.
+            const statusEl = document.getElementById('chronicleAutoStatus');
+            if (statusEl) {
+                statusEl.style.opacity = chronicleAutoEnabled ? '' : '0.55';
+                statusEl.title = chronicleAutoEnabled
+                    ? '자동 기록 켜짐 — 의미 있는 대화가 자동 저장됨'
+                    : '자동 기록 꺼짐 — 자동 저장 안 됨 (수동 기록은 가능)';
+            }
+        }
        function syncRecordsLine() {
            if (!recordsLatest) return;
            const opt = chronicleRecordSel && chronicleRecordSel.value ? selText(chronicleRecordSel) : '';
            recordsLatest.textContent = opt ? '· ' + truncMid(opt, 38) : '';
+            // When OFF, dim the latest-record label too so "nothing is being saved right now" is visible at a glance.
+            recordsLatest.style.opacity = chronicleAutoEnabled ? '' : '0.55';
        }

        // ── Ready-status bar (Engine / Model / Brain count / Context / Memory) ──
@@ -403,6 +424,8 @@
        }

        // ── Context-budget badge (직전 요청 기준) ────────────────────────────
+        // Last LM Studio prediction stats — merged into the badge after the turn finishes.
+        let lastLmStats = null;
        function renderCtxBadge(b) {
            if (!ctxBadge) return;
            if (!b || typeof b.inputTokens !== 'number') { ctxBadge.textContent = ''; ctxBadge.className = 'ctx-badge'; ctxBadge.title = ''; return; }
@@ -420,8 +443,35 @@
            const warn = b.tight || b.systemTruncated;
            ctxBadge.textContent = parts.join(' · ');
            ctxBadge.className = 'ctx-badge' + (warn ? ' warn' : ' ok');
+            // New turn starts → drop stale stats from the previous answer.
+            lastLmStats = null;
            ctxBadge.title = `model: ${b.model || ''}${b.paramB != null ? ' (~' + b.paramB + 'B)' : ''}\n입력 ≈ ${b.inputTokens} tokens (시스템 ${b.systemTokens}, 기록 ${b.historyKept}개)\n출력 상한 ${b.maxOutputTokens} tokens / 유효 context window ${b.contextLength} tokens${b.cappedForSmallModel ? ' (작은 모델용 축소; 설정값 ' + b.nominalContextLength + ')' : ''}`;
        }
+        function renderLmStudioStats(s) {
+            if (!ctxBadge || !s) return;
+            lastLmStats = s;
+            const extra = [];
+            if (typeof s.tokensPerSecond === 'number') extra.push(`${s.tokensPerSecond.toFixed(1)} tok/s`);
+            if (typeof s.timeToFirstTokenSec === 'number') extra.push(`TTFT ${s.timeToFirstTokenSec.toFixed(2)}s`);
+            // Speculative-decoding hit rate — shows whether the draft model is paying for itself.
+            // A healthy ratio is ~60%+; below ~30% means the draft is mis-predicting and slowing things down.
+            if (typeof s.draftTokensCount === 'number' && s.draftTokensCount > 0 && typeof s.acceptedDraftTokensCount === 'number') {
+                const pct = Math.round((s.acceptedDraftTokensCount / s.draftTokensCount) * 100);
+                extra.push(`spec ${pct}%`);
+            }
+            if (extra.length === 0) return;
+            // Append to badge without clobbering — only if current badge text doesn't already include it.
+            const current = ctxBadge.textContent || '';
+            const tail = ' · ' + extra.join(' · ');
+            if (!current.endsWith(tail)) ctxBadge.textContent = current + tail;
+            if (ctxBadge.title) {
+                ctxBadge.title += `\n\n[LM Studio]\n${extra.join(' · ')}`;
+                if (typeof s.predictedTokensCount === 'number') ctxBadge.title += `\n출력 ${s.predictedTokensCount} tokens`;
+                if (typeof s.totalTimeSec === 'number') ctxBadge.title += ` / 총 ${s.totalTimeSec.toFixed(2)}s`;
+                if (s.draftModelKey) ctxBadge.title += `\ndraft: ${s.draftModelKey} (${s.acceptedDraftTokensCount || 0}/${s.draftTokensCount || 0} accepted)`;
+                if (s.stopReason) ctxBadge.title += `\nstop: ${s.stopReason}`;
+            }
+        }
        if (readyBar) {
            readyBar.addEventListener('click', e => {
                const t = e.target;
@@ -865,18 +915,39 @@
                    try { _renderWelcome(); } catch {}
                    break;
                }
-                case 'brainProfiles':
+                case 'brainProfiles': {
+                    // Defensive: if profiles arrives empty/null, keep the existing options. This prevents the regression where
+                    // a bad state empties the dropdown and leaves only "+ Add New Brain..." (v2.2.66 fix).
+                    const profilesArr = (msg.value && Array.isArray(msg.value.profiles)) ? msg.value.profiles : [];
+                    const activeId = msg.value && msg.value.activeBrainId;
+                    if (profilesArr.length === 0) {
+                        console.warn('[Astra] brainProfiles message had empty profiles list — preserving existing dropdown.');
+                        break;
+                    }
                    brainSel.innerHTML = '';
-                    msg.value.profiles.forEach(p => {
-                        const o = document.createElement('option'); o.value = p.id; o.innerText = p.name;
-                        if (p.id === msg.value.activeBrainId) o.selected = true;
+                    profilesArr.forEach(p => {
+                        const o = document.createElement('option');
+                        o.value = p.id;
+                        o.innerText = p.name;
                        brainSel.appendChild(o);
                    });
                    const addOpt = document.createElement('option');
-                    addOpt.value = 'new'; addOpt.innerText = '+ Add New Brain...';
+                    addOpt.value = 'new';
+                    addOpt.innerText = '+ Add New Brain...';
                    brainSel.appendChild(addOpt);
+                    // v2.2.67: apply the selection only *after* all options have been appended. Setting `o.selected = true`
+                    // before appendChild was ignored in some Chromium webviews, which caused a regression where the dropdown stayed on the
+                    // last option ('+ Add New Brain...'). Assigning brainSel.value once also reliably overwrites a value previously stuck on 'new'.
+                    if (activeId && profilesArr.some(p => p.id === activeId)) {
+                        brainSel.value = activeId;
+                    } else {
+                        brainSel.value = profilesArr[0].id;
+                    }
+                    // Remember the "last valid selection" to restore if the user clicks '+ Add New Brain...' and then cancels.
+                    brainSel.dataset.lastSelected = brainSel.value;
                    syncContextBar();
                    break;
+                }
                case 'sessionList':
                    historyList.innerHTML = '';
                    msg.value.forEach(s => {
@@ -912,6 +983,9 @@
                case 'contextBudget':
                    renderCtxBadge(msg.value);
                    break;
+                case 'lmStudioStats':
+                    renderLmStudioStats(msg.value);
+                    break;
                case 'usedScope': {
                    let target = streamBody && streamBody._parent;
                    if (!target) {
@@ -924,6 +998,13 @@
                case 'lessonCandidate':
                    renderLessonCandidate(msg.value || {});
                    break;
+                case 'chronicleAutoRecordStatus': {
+                    // v2.2.70: auto-record on/off status push. Refreshes the Tools menu toggle and the records-line label.
+                    chronicleAutoEnabled = !!(msg.value && msg.value.enabled);
+                    renderChronicleAutoToggle();
+                    syncRecordsLine();
+                    break;
+                }
                case 'autoContinue':
                    statusLabel.innerText = msg.value; thinkingBar.classList.add('active');
                    if (msg.value.includes('Analyzing')) setStep('analyze');
@@ -931,6 +1012,23 @@
                    if (msg.value.includes('Executing')) setStep('execute');
                    setTimeout(() => { thinkingBar.classList.remove('active'); }, 3000);
                    break;
+                case 'workflowStage': {
+                    // [5-stage pipeline] Instead of streaming stage messages into the chat body, show them as a single line
+                    // in the thin status strip at the top of the sidebar only. When done=true, close the strip.
+                    const v = msg.value || {};
+                    const step = String(v.step || '');
+                    const text = String(v.message || '');
+                    const done = !!v.done;
+                    if (done || (!step && !text)) {
+                        statusLabel.innerText = '';
+                        thinkingBar.classList.remove('active');
+                    } else {
+                        const compact = step && text ? step + ' · ' + text : (step || text);
+                        statusLabel.innerText = compact;
+                        thinkingBar.classList.add('active');
+                    }
+                    break;
+                }
                case 'agentsList':
                    agentSel.innerHTML = '<option value="none">No Agent</option>';
                    msg.value.forEach(a => {
@@ -1655,18 +1753,44 @@
            btn.setAttribute('data-tooltip', secondBrainTraceDebug ? 'Second Brain Debug JSON: On' : 'Second Brain Debug JSON: Off');
            saveUiState();
        };
+        // v2.2.70: auto-record toggle. A click immediately sends the new state to the server. After saving the config, the server
+        // pushes it back as a chronicleAutoRecordStatus message, and renderChronicleAutoToggle syncs the UI.
+        const _chronicleAutoBtn = document.getElementById('chronicleAutoRecordBtn');
+        if (_chronicleAutoBtn) {
+            _chronicleAutoBtn.onclick = () => {
+                const next = !chronicleAutoEnabled;
+                // Optimistic update: the label changes immediately, before the server responds, so the click feels responsive. The response corrects it if needed.
+                chronicleAutoEnabled = next;
+                renderChronicleAutoToggle();
+                syncRecordsLine();
+                vscode.postMessage({ type: 'setChronicleAutoRecord', enabled: next });
+            };
+        }

        const syncBrain = () => { Sound.play(550, 'sine', 0.1); vscode.postMessage({ type: 'syncBrain' }); };
        document.getElementById('brainBtn').onclick = syncBrain;
        saveWikiRawBtn.onclick = () => vscode.postMessage({ type: 'saveWikiRaw' });
        addBrainBtn.onclick = () => vscode.postMessage({ type: 'addBrain' });
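+        // v2.2.67: even if the dropdown is stuck on 'new', edit/delete fall back to the last valid selection (or the first real option).
+        // Previously the handlers simply early-returned whenever the value was 'new', which caused the "edit does nothing" bug.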
+        function _resolveActiveBrainId() {
+            if (brainSel.value && brainSel.value !== 'new') return brainSel.value;
+            const last = brainSel.dataset.lastSelected;
+            if (last && last !== 'new') return last;
+            for (const opt of brainSel.options) {
+                if (opt.value && opt.value !== 'new') return opt.value;
+            }
+            return '';
+        }
        editBrainBtn.onclick = () => {
-            if (!brainSel.value || brainSel.value === 'new') return;
-            vscode.postMessage({ type: 'editBrain', id: brainSel.value });
+            const id = _resolveActiveBrainId();
+            if (!id) return;
+            vscode.postMessage({ type: 'editBrain', id });
        };
        deleteBrainBtn.onclick = () => {
-            if (!brainSel.value || brainSel.value === 'new') return;
-            vscode.postMessage({ type: 'deleteBrain', id: brainSel.value });
+            const id = _resolveActiveBrainId();
+            if (!id) return;
+            vscode.postMessage({ type: 'deleteBrain', id });
        };
        // (inputSyncBtn removed — Sync Knowledge is reachable via the top brainBtn / Tools menu.)
        document.getElementById('historyBtn').onclick = () => vscode.postMessage({ type: 'getSessions' });
@@ -1702,8 +1826,18 @@
        }
        brainSel.onchange = () => {
            if (brainSel.value === 'new') {
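+                // v2.2.67: clicking '+ Add New Brain...' sends the addBrain message, then immediately reverts to the last valid selection
+                // so the dropdown does not get stuck on 'new' even if the user cancels the folder picker. If the add actually
+                // succeeds, _postBrainProfiles sends a brainProfiles message that moves the selection to the new brain.
+                // Without this restore, brainSel.value === 'new' would persist and the edit/delete buttons would die on their early return.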
+                const prev = brainSel.dataset.lastSelected;
                vscode.postMessage({ type: 'addBrain' });
+                if (prev && prev !== 'new') {
+                    brainSel.value = prev;
+                    syncContextBar();
+                }
            } else {
+                brainSel.dataset.lastSelected = brainSel.value;
                vscode.postMessage({ type: 'setBrainProfile', id: brainSel.value });
            }
        };
@@ -1868,6 +2002,7 @@
        vscode.postMessage({ type: 'getAgents' });
        vscode.postMessage({ type: 'getChronicleProjects' });
        vscode.postMessage({ type: 'getChronicleRecords' });
+        vscode.postMessage({ type: 'getChronicleAutoRecord' });
        vscode.postMessage({ type: 'getKnowledgeMix' });
        vscode.postMessage({ type: 'getArchitectureStatus' });
        vscode.postMessage({ type: 'getCompanyStatus' });
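
The fallback order used by the edit/delete handlers in the sidebar hunks above can be sketched without the DOM. This is only an illustrative model, not the extension's code: `BrainSelectState`, `optionValues`, and `ADD_NEW_SENTINEL` are hypothetical names standing in for `brainSel.value`, `brainSel.dataset.lastSelected`, and `brainSel.options`.

```typescript
// Hypothetical, DOM-free model of the <select> state (names are assumptions).
interface BrainSelectState {
    value: string;            // stands in for brainSel.value
    lastSelected?: string;    // stands in for brainSel.dataset.lastSelected
    optionValues: string[];   // stands in for the values of brainSel.options
}

// The placeholder option that opens the "add brain" flow.
const ADD_NEW_SENTINEL = 'new';

// Resolve which brain id edit/delete should act on:
// 1. the current value, unless it is the placeholder,
// 2. otherwise the last valid selection,
// 3. otherwise the first real option,
// 4. otherwise an empty string (the caller should do nothing).
function resolveActiveBrainId(state: BrainSelectState): string {
    if (state.value && state.value !== ADD_NEW_SENTINEL) return state.value;
    const last = state.lastSelected;
    if (last && last !== ADD_NEW_SENTINEL) return last;
    for (const v of state.optionValues) {
        if (v && v !== ADD_NEW_SENTINEL) return v;
    }
    return '';
}
```

Keeping the resolution in one pure function means both button handlers share the same fallback, and the empty-string result remains the single "nothing to act on" signal.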
@@ -2,7 +2,7 @@
  "name": "astra",
  "displayName": "Astra",
  "description": "The personal intelligence layer for Antigravity and VS Code. A private cognitive partner for deep project context, memory, and proactive strategic decision-making.",
-  "version": "2.2.64",
+  "version": "2.2.73",
  "publisher": "g1nation",
  "license": "MIT",
  "icon": "assets/icon.png",
@@ -342,6 +342,74 @@
          "default": true,
          "description": "Automatically load LM Studio models into memory when selected from the Astra sidebar."
        },
+        "g1nation.lmStudio.sampling.topP": {
+          "type": "number",
+          "default": 0.9,
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Nucleus sampling cutoff. Small / quantized models often spew wrong-neighbour tokens (garbled Hangul, e.g. 붕괴→붕점) when the tail is wide. Lower (0.8–0.9) tightens; 1.0 disables. Applied to both SDK and REST paths."
+        },
+        "g1nation.lmStudio.sampling.topK": {
+          "type": "number",
+          "default": 20,
+          "minimum": 0,
+          "description": "Top-K sampling cutoff. 0 disables. Default 20 — tighter for small models, raise to 40–80 for large models that already sample well."
+        },
+        "g1nation.lmStudio.sampling.minP": {
+          "type": "number",
+          "default": 0.05,
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Min-P floor — discards tokens with probability below this fraction of the top token. Good defence against rare-token glitches. 0 disables."
+        },
+        "g1nation.lmStudio.sampling.repeatPenalty": {
+          "type": "number",
+          "default": 1.1,
+          "minimum": 1,
+          "maximum": 2,
+          "description": "Repeat / frequency penalty to curb stutter (것입니다서입니다…). 1.0 disables. Values 1.05–1.2 are typical."
+        },
+        "g1nation.lmStudio.statsInBudget": {
+          "type": "boolean",
+          "default": true,
+          "description": "Show token/s and time-to-first-token from LM Studio prediction stats in the context-budget badge after each turn (SDK path only)."
+        },
+        "g1nation.lmStudio.draftModel": {
+          "type": "string",
+          "default": "",
+          "description": "[Speculative decoding] LM Studio model key of a small draft model (e.g. 'gemma-2b-it') used to accelerate the main model. Empty disables. 1.5–3x throughput on large models. The draft must be downloaded in LM Studio (load is automatic on first use)."
+        },
+        "g1nation.lmStudio.load.flashAttention": {
+          "type": "boolean",
+          "default": true,
+          "description": "[Load option] Enable Flash Attention when loading models. Faster generation + lower memory on compatible hardware, especially helpful for long contexts. Default: true."
+        },
+        "g1nation.lmStudio.load.gpuOffloadRatio": {
+          "type": "string",
+          "default": "max",
+          "description": "[Load option] How much of the model to offload to GPU. 'max' = all (default), 'off' = CPU only, or a number 0–1 (e.g. '0.5' = half). Numeric strings are parsed."
+        },
+        "g1nation.lmStudio.load.offloadKVCacheToGpu": {
+          "type": "boolean",
+          "default": true,
+          "description": "[Load option] Keep KV cache on GPU memory. Faster but requires VRAM headroom. Default: true."
+        },
+        "g1nation.lmStudio.load.keepModelInMemory": {
+          "type": "boolean",
+          "default": true,
+          "description": "[Load option] Prevent the model from being swapped out of system memory. Improves interactive responsiveness; raises RAM use. Default: true."
+        },
+        "g1nation.lmStudio.load.useFp16ForKVCache": {
+          "type": "boolean",
+          "default": false,
+          "description": "[Load option] Store KV cache in FP16 (halves cache memory). Tiny quality impact for most models — try if you run out of VRAM at long contexts. Default: false."
+        },
+        "g1nation.lmStudio.load.evalBatchSize": {
+          "type": "number",
+          "default": 0,
+          "minimum": 0,
+          "description": "[Load option] Token batch size during evaluation. 0 = engine default. Higher (512–1024) improves prefill speed on GPU at the cost of memory."
+        },
        "g1nation.localBrainPath": {
          "type": "string",
          "default": "",
@@ -484,6 +552,40 @@
          "default": true,
          "description": "Persist substantive Reflector critiques to the active brain as lesson cards under `lessons/auto-reflector/`. Future missions automatically retrieve these cards (via the existing Experience-Memory pipeline) and inject them as ‘[⚠ ACTIVE LESSONS — verify these BEFORE finalizing]’ guardrails into Planner/Researcher/Writer context. A repeated critique (similar title) bumps `occurrences` and escalates `severity` (low→medium→high) instead of duplicating the card, so recurring patterns get louder over time. Disable to keep critiques single-mission only."
        },
+        "g1nation.workflow.synthesizerEnabled": {
+          "type": "boolean",
+          "default": true,
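+          "markdownDescription": "Whether to run one more **Synthesizer** (final polish) pass as the last stage of the 5-stage pipeline. true (default): the Synthesizer takes the first draft produced by the Drafter and tidies the one-line introduction, section flow, and conclusion into the final user-facing answer. Because the only input is the small draft, the context stays light and even small local models (≤4B) handle it comfortably. false: the Drafter output becomes the final answer as-is (the previous 4-stage behavior)."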
+        },
+        "g1nation.workflow.multiAgentMode": {
+          "type": "string",
+          "enum": ["auto", "always", "off"],
+          "default": "auto",
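+          "markdownDescription": "Trigger mode for the Multi-Agent (5-stage) pipeline.\n\n- `auto` (default): triggers automatically when a small model (≤4B) is detected, the prompt is large (30%+ of the context), an explicit keyword appears (보고서 'report' / 리뷰 'review' / 심층 분석 'in-depth analysis' …), or the user has turned on `multiAgentEnabled`. Short greetings and small talk are excluded.\n- `always`: uses the 5-stage pipeline for every request except greetings and small talk. This mode is the more stable choice if a small model cannot finish its answer in one pass.\n- `off`: uses only the existing keyword/length heuristics plus the manual `multiAgentEnabled` toggle (legacy behavior)."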
+        },
+        "g1nation.workflow.autoCtxFractionThreshold": {
+          "type": "number",
+          "default": 0.30,
+          "minimum": 0.05,
+          "maximum": 0.95,
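+          "markdownDescription": "When `workflow.multiAgentMode = auto`, force the 5-stage pipeline once prompt tokens exceed this fraction (a 0~1 ratio; allowed range 0.05–0.95) of the effective context window. Default 0.30: once a small model's input starts consuming 30% or more of the context, trying to answer in one pass tends to end in EOS or truncation."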
+        },
+        "g1nation.liveStreamTokens": {
+          "type": "boolean",
+          "default": true,
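+          "markdownDescription": "Whether to stream model tokens into the chat bubble as soon as they arrive.\n\n- `true` (default): shows tokens as they arrive, improving perceived TTFT. When generation finishes, `streamReplace` swaps in the sanitized final answer in one step, so control tokens can be exposed only briefly.\n- `false`: accumulates tokens internally and shows **only the final answer, all at once**, after sanitizing (removing `<|channel|>thought` / `<think>` / `Thinking Process:` and the like). This fully prevents the leak where the model's control tokens appear on screen even momentarily."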
+        },
+        "g1nation.outputFormat": {
+          "type": "string",
+          "enum": ["plain", "markdown"],
+          "default": "plain",
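+          "markdownDescription": "How the final answer is displayed.\n\n- `plain` (default): post-processing removes all markdown markers the model emits inadvertently (`##`, `**`, `__`, `> `, `* `, etc.). Section label text (e.g. `핵심 요약`) is kept, but the header markers disappear, so the output reads as clean plain text. This blocks the problem where small local models leak markers such as `## 다음 한 수` out of trained habit.\n- `markdown`: legacy behavior. The model output is passed to the renderer unchanged."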
+        },
+        "g1nation.chronicleAutoRecord": {
+          "type": "boolean",
+          "default": true,
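+          "markdownDescription": "Auto-record (Project Chronicle Auto-Record).\n\n- `true` (default): after every chat turn, automatically saves meaningful conversations (type auto-detected: planning / decision / bug / development / discussion) to the active project's Chronicle folder.\n- `false`: auto-save OFF. Manual recording (the record items under 도구 ▾, `/wiki`, etc.) remains available.\n\nYou can switch instantly with the `자동 기록` (auto-record) toggle in the sidebar **도구 ▾** (Tools) menu, with no need to open the settings panel."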
+        },
        "g1nation.company.intentClassifierModel": {
          "type": "string",
          "default": "",
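
The three `g1nation.workflow.multiAgentMode` values and `autoCtxFractionThreshold` described in the settings above can be read as a small decision function. The following is a rough sketch based only on those descriptions, not the extension's actual trigger code: `PipelineTriggerInput`, `shouldRunFiveStagePipeline`, and all field names are hypothetical, small talk is assumed to be excluded in every mode, and the legacy length heuristic of `off` mode is omitted.

```typescript
type MultiAgentMode = 'auto' | 'always' | 'off';

// Hypothetical input shape; every field name here is an assumption.
interface PipelineTriggerInput {
    mode: MultiAgentMode;
    isSmallTalk: boolean;              // short greeting / chit-chat
    manualToggle: boolean;             // user turned on multiAgentEnabled
    hasExplicitKeyword: boolean;       // e.g. report / review / in-depth analysis
    modelParamsB: number | undefined;  // estimated parameter count in billions
    promptTokens: number;
    effectiveContextWindow: number;
    ctxFractionThreshold: number;      // e.g. 0.30
}

function shouldRunFiveStagePipeline(input: PipelineTriggerInput): boolean {
    // Assumption: greetings and small talk never trigger the pipeline.
    if (input.isSmallTalk) return false;
    if (input.mode === 'always') return true;
    if (input.mode === 'off') {
        // Simplified legacy path: keyword heuristic plus the manual toggle.
        return input.manualToggle || input.hasExplicitKeyword;
    }
    // 'auto': any one of the documented signals is enough.
    const smallModel = input.modelParamsB !== undefined && input.modelParamsB <= 4;
    const fraction = input.effectiveContextWindow > 0
        ? input.promptTokens / input.effectiveContextWindow
        : 0;
    return smallModel
        || fraction > input.ctxFractionThreshold
        || input.hasExplicitKeyword
        || input.manualToggle;
}
```

With the default threshold of 0.30, a 4,000-token prompt against an 8,192-token window (about 49%) would trigger the pipeline in `auto` mode even for a larger model, while a 1,000-token prompt (about 12%) would not unless another signal fires.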
@@ -53,6 +53,7 @@ import {
 } from './retrieval/knowledgeMix';
 import {
    extractVisibleFinal,
+    stripMarkdownFormatting,
    shouldFinalOnlyRetry,
    shouldAutoContinue,
    looksCutOff,
@@ -73,6 +74,7 @@ import {
    estimateModelParamsB,
    type ContextLimits,
 } from './lib/contextManager';
+import { samplingToRestBody, type LmStudioSampling, type ChatStreamStats } from './lmstudio/streamer';

 export interface ChatMessage {
    role: 'user' | 'assistant' | 'system';
@@ -208,6 +210,10 @@ export class AgentExecutor {
    private historyChangeListener: HistoryChangeListener | undefined;
    private runSerial = 0;
    private activeRunId = 0;
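+    // v2.2.69: used to detect mode switches. On entering handlePrompt, the current mode signature is computed, and if it
+    // differs from the previous value, a one-line "mode switched since the earlier conversation" note is inserted into the system prompt.
+    // The mode signature is a hash of (agent skill, multiAgent, company mode, active brain).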
+    private _lastModeSignature: string | null = null;
    private transactionManager: TransactionManager;
    private sessionManager: SessionManager;
    private statusBarManager: StatusBarManager;
@@ -369,6 +375,9 @@ export class AgentExecutor {
            this.onSessionEnd();
        }
        this.chatHistory = [];
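+        // v2.2.69: a new session has no "previous mode". Without resetting the mode signature, the first message would be
+        // compared against the previous session's mode and a wrong bridge would be inserted (a regression).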
+        this._lastModeSignature = null;
        this.emitHistoryChanged();
    }

@@ -387,6 +396,7 @@ export class AgentExecutor {
            this.onSessionEnd();
        }
        this.chatHistory = [];
+        this._lastModeSignature = null;
        this.emitHistoryChanged();
    }

@@ -633,6 +643,39 @@ export class AgentExecutor {
             // removed, and the agent prompt is placed at the very end so that it takes absolute priority.
            // ──────────────────────────────────────────────────────────────────
            const isAgentMode = !!options.agentSkillContext;
+
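+            // v2.2.69: mode-transition bridge. Compare the current mode signature with the previous value; if it changed,
+            // insert a short "the earlier conversation ran in mode X on topic Y / from now on mode Z" note into the system prompt.
+            // chatHistory itself is untouched, so the conversation still looks continuous to the user while
+            // the model knows it is "right after a mode change".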
+            let modeBridgeCtx = '';
+            try {
+                const agentSkillName = options.agentSkillContext
+                    ? (options.agentSkillContext.split('\n')[0] || '').slice(0, 60).replace(/^#\s*/, '').trim()
+                    : '';
+                const currentSig = this.computeModeSignature({
+                    agentSkillName: agentSkillName || undefined,
+                    companyMode: !!(options as any).companyMode,
+                    multiAgent: !!(options as any).multiAgent,
+                    brainName: getActiveBrainProfile()?.name,
+                });
+                if (this._lastModeSignature !== null && this._lastModeSignature !== currentSig) {
+                    const topic = this.buildLastTopicLine();
+                    const bridgeLines = [
+                        '',
+                        '[MODE TRANSITION BRIDGE]',
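+                        `Previous mode: ${this._lastModeSignature}`,
+                        `Current mode: ${currentSig}`,
+                    ];
+                    if (topic) bridgeLines.push(`Previous conversation topic (one line): ${topic}`);
+                    bridgeLines.push('The conversation history continues unchanged. Follow the persona and format of the new mode, but do not forget the context the user was working on until now.');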
+                    modeBridgeCtx = bridgeLines.join('\n');
+                }
+                this._lastModeSignature = currentSig;
+            } catch (e: any) {
+                logError('Mode-bridge computation failed (non-fatal).', { error: e?.message || String(e) });
+            }
+
            let fullSystemPrompt: string;

            if (isAgentMode) {
@@ -665,7 +708,7 @@ export class AgentExecutor {

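                 // Only the span between [CONTEXT] … [/CONTEXT] is trimmed when the context overflows; agentBlock (before), reminder (after), and negative are protected.
                 // memoryCtx (RAG/memory/lessons) is also placed inside [CONTEXT] so that, when tokens are tight, it is cut before the conversation history.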
-                fullSystemPrompt = `${agentBlock}\n\n${strippedSystemPrompt}${designerCtx}${secondBrainTraceCtx}\n\n[CONTEXT]\n${memoryCtx}\n${knowledgeContextForPrompt}\n${contextBlock}\n[/CONTEXT]\n${negativeCtx}${agentTailReminder}`;
+                fullSystemPrompt = `${agentBlock}${modeBridgeCtx ? '\n\n' + modeBridgeCtx : ''}\n\n${strippedSystemPrompt}${designerCtx}${secondBrainTraceCtx}\n\n[CONTEXT]\n${memoryCtx}\n${knowledgeContextForPrompt}\n${contextBlock}\n[/CONTEXT]\n${negativeCtx}${agentTailReminder}`;
            } else {
                 // Existing Astra mode (no agent selected)
                const localProjectKnowledgeCtx = prompt && localPathContext && this.isProjectKnowledgeCreationRequest(prompt)
@@ -700,7 +743,7 @@ export class AgentExecutor {
                    })()
                    : '';
                 // memoryCtx (RAG/memory/lessons) goes inside [CONTEXT]: when tokens are tight, it is cut before the conversation history.
-                fullSystemPrompt = `${systemPrompt}${designerCtx}${projectArchitectureCtx}${localProjectKnowledgeCtx}${thinkingPartnerCtx}${astraStanceCtx}${secondBrainTraceCtx}${v4PolicyCtx}${knowledgeMixCtx}${casualCtx}\n\n[CONTEXT]\n${memoryCtx}\n${knowledgeContextForPrompt}\n${contextBlock}\n[/CONTEXT]\n${negativeCtx}`;
+                fullSystemPrompt = `${systemPrompt}${modeBridgeCtx ? '\n\n' + modeBridgeCtx : ''}${designerCtx}${projectArchitectureCtx}${localProjectKnowledgeCtx}${thinkingPartnerCtx}${astraStanceCtx}${secondBrainTraceCtx}${v4PolicyCtx}${knowledgeMixCtx}${casualCtx}\n\n[CONTEXT]\n${memoryCtx}\n${knowledgeContextForPrompt}\n${contextBlock}\n[/CONTEXT]\n${negativeCtx}`;
            }
            // ──────────────────────────────────────────────────────────────────
            // [Context Limit Manager] context length 는 "답변을 그만큼 길게 써도 된다"
@@ -768,14 +811,17 @@ export class AgentExecutor {
            );
            let budgetedHistory: ChatMessage[] = reqMessages;
            if (config.autoCompactHistory) {
-                const trim = trimHistoryToBudget<ChatMessage>(reqMessages, historyBudget, (n) => ({
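+                // v2.2.69: receive the dropped messages, build a heuristic summary, and prepend it as a single system message.
+                // A bare count marker said nothing about "what was discussed earlier", which caused a regression where the model
+                // lost context in later turns. Now the gist of U1/A1/U2/A2 is kept, so the sliding window works.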
+                const trim = trimHistoryToBudget<ChatMessage>(reqMessages, historyBudget, (_n, dropped) => ({
                    role: 'system',
-                    content: `[${n} earlier messages were omitted from this request because of the context limit. Check with the user again if needed.]`,
+                    content: this.buildDroppedHistorySummary(dropped),
                    internal: true,
                }));
                budgetedHistory = trim.messages;
                if (trim.droppedCount > 0) {
-                    logInfo('Conversation history compacted to fit the context window.', {
+                    logInfo('Conversation history compacted to fit the context window (with summary).', {
                        model: actualModel, droppedCount: trim.droppedCount, historyBudget,
                    });
                }
@@ -864,8 +910,12 @@ export class AgentExecutor {
            // policy enforcement) emits a final `streamReplace` so the bubble
            // ends up matching the cleaned answer regardless of what slipped
            // through live.
-            const postLiveDeltas = loopDepth === 0;
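+            // [Clean Stream] When g1nation.liveStreamTokens=false, tokens are only accumulated internally
+            // and only the sanitized final answer is shown, all at once, which prevents Harmony/think markers from
+            // leaking onto the screen even briefly. When true (the default declared in package.json), tokens stream live (legacy behavior).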
+            const postLiveDeltas = loopDepth === 0 && getConfig().liveStreamTokens === true;

+            let lmStudioStats: ChatStreamStats | undefined;
            if (useLmStudioSdk) {
                apiUrl = `${ollamaUrl} (sdk)`;
                logInfo('Streaming chat via LM Studio SDK.', { model: actualModel });
@@ -876,15 +926,35 @@ export class AgentExecutor {
                        temperature,
                        maxTokens: maxOutputTokens,
                        contextOverflowPolicy: config.contextOverflowPolicy,
+                        ...this.lmStudioSamplingFromConfig(),
+                        ...this.lmStudioRespondExtrasFromConfig(),
                        signal: this.abortController.signal,
                    });
-                    for await (const { token, stopReason } of stream) {
+                    for await (const { token, stopReason, stats } of stream) {
                        if (this.isStaleRun(runId)) return;
                        if (token) {
                            aiResponseText += token;
                            if (postLiveDeltas) this.webview.postMessage({ type: 'streamChunk', value: token });
                        }
                        if (stopReason) finishStopReason = stopReason;
+                        if (stats) lmStudioStats = stats;
+                    }
+                    if (lmStudioStats && getConfig().lmStudioShowStatsInBudget && loopDepth === 0) {
+                        this.webview.postMessage({
+                            type: 'lmStudioStats',
+                            value: {
+                                model: actualModel,
+                                tokensPerSecond: lmStudioStats.tokensPerSecond,
+                                timeToFirstTokenSec: lmStudioStats.timeToFirstTokenSec,
+                                predictedTokensCount: lmStudioStats.predictedTokensCount,
+                                promptTokensCount: lmStudioStats.promptTokensCount,
+                                totalTimeSec: lmStudioStats.totalTimeSec,
+                                draftModelKey: lmStudioStats.draftModelKey,
+                                draftTokensCount: lmStudioStats.draftTokensCount,
+                                acceptedDraftTokensCount: lmStudioStats.acceptedDraftTokensCount,
+                                stopReason: finishStopReason,
+                            },
+                        });
                    }
                } catch (err: any) {
                    if (err?.name === 'AbortError' || this.abortController.signal.aborted) {
@@ -1007,60 +1077,34 @@ export class AgentExecutor {
            //
            // Only attempts recovery on loopDepth === 0 — we don't want to
            // ping-pong inside the autonomous action loop.
+            //
+            // Note: the previous SDK handle-reset retry that lived here is now done
+            // inside `LMStudioStreamer.stream()` itself (it auto-recreates the SDK
+            // on attempt 2 for both dead-handle errors *and* clean-but-empty streams),
+            // so by the time we get here with `useLmStudioSdk` and no text, the SDK
+            // path has already tried twice. Go straight to the REST fallback.
            if (!aiResponseText.trim() && !this.abortController?.signal.aborted && loopDepth === 0) {
-                if (useLmStudioSdk && this.options.lmStudioStreamer?.resetHandle) {
-                    try {
-                        logInfo('Empty SDK stream — resetting LM Studio handle and retrying streaming once.', { model: actualModel });
-                        await this.options.lmStudioStreamer.resetHandle(actualModel);
-                        const retryStream = this.options.lmStudioStreamer.stream({
-                            modelName: actualModel,
-                            messages: messagesForRequest.map((m) => ({ role: m.role, content: m.content })),
-                            temperature,
-                            maxTokens: maxOutputTokens,
-                            contextOverflowPolicy: config.contextOverflowPolicy,
-                            signal: this.abortController.signal,
-                        });
-                        let retryText = '';
-                        for await (const { token, stopReason } of retryStream) {
-                            if (this.isStaleRun(runId)) return;
-                            if (token) {
-                                retryText += token;
-                                if (postLiveDeltas) this.webview.postMessage({ type: 'streamChunk', value: token });
-                            }
-                            if (stopReason) finishStopReason = stopReason;
-                        }
-                        if (retryText.trim()) {
-                            aiResponseText = retryText;
-                            logInfo('Handle-reset retry recovered the answer.', { model: actualModel, length: retryText.length });
-                        }
-                    } catch (retryErr: any) {
-                        logError('Handle-reset retry failed.', { model: actualModel, error: retryErr?.message ?? String(retryErr) });
-                    }
-                }
-
-                if (!aiResponseText.trim() && !this.abortController?.signal.aborted) {
-                    try {
-                        logInfo('Empty stream — trying non-streaming fallback.', { engine, model: actualModel, apiUrl });
-                        const fallback = await this.callNonStreaming({
-                            baseUrl: ollamaUrl,
-                            modelName: actualModel,
-                            engine,
-                            messages: messagesForRequest,
-                            temperature,
-                            maxTokens: maxOutputTokens,
-                            contextLength: ctxLimits.contextLength,
-                            signal: this.abortController?.signal,
-                        });
-                        if (fallback.stopReason) finishStopReason = fallback.stopReason;
-                        if (fallback.text && fallback.text.trim()) {
-                            aiResponseText = fallback.text;
-                            logInfo('Non-streaming fallback recovered the answer.', { engine, model: actualModel, length: fallback.text.length });
-                        }
-                    } catch (recoverErr: any) {
-                        logError('Non-streaming fallback also failed.', {
-                            engine, model: actualModel, error: recoverErr?.message ?? String(recoverErr),
-                        });
+                try {
+                    logInfo('Empty stream — trying non-streaming fallback.', { engine, model: actualModel, apiUrl });
+                    const fallback = await this.callNonStreaming({
+                        baseUrl: ollamaUrl,
+                        modelName: actualModel,
+                        engine,
+                        messages: messagesForRequest,
+                        temperature,
+                        maxTokens: maxOutputTokens,
+                        contextLength: ctxLimits.contextLength,
+                        signal: this.abortController?.signal,
+                    });
+                    if (fallback.stopReason) finishStopReason = fallback.stopReason;
+                    if (fallback.text && fallback.text.trim()) {
+                        aiResponseText = fallback.text;
+                        logInfo('Non-streaming fallback recovered the answer.', { engine, model: actualModel, length: fallback.text.length });
                    }
+                } catch (recoverErr: any) {
+                    logError('Non-streaming fallback also failed.', {
+                        engine, model: actualModel, error: recoverErr?.message ?? String(recoverErr),
+                    });
                }
            }

@@ -1183,7 +1227,12 @@ export class AgentExecutor {
                }
                if (this.isStaleRun(runId)) return;
            }
-            const cleanedVisible = cleaned.visible;
+            // [Plain Text Output] When outputFormat='plain' (default), post-processing removes all markdown markers
+            // the model emitted inadvertently (`##`, `**`, `> `, `* ` …). Label text is kept.
+            // In markdown mode the output passes through unchanged (legacy).
+            const cleanedVisible = getConfig().outputFormat === 'plain'
+                ? stripMarkdownFormatting(cleaned.visible)
+                : cleaned.visible;

            // 5. Execute Actions
            const rationale = this.parseRationale(cleanedVisible);
@@ -1235,7 +1284,13 @@ export class AgentExecutor {
            if (notice && assistantContent.trim()) {
                assistantContent = assistantContent.trimEnd() + notice;
            }
-            const finalAssistantContent = assistantContent;
+            // [Plain Text Output, FINAL pass] Enforcers may prepend hard-coded headers such as
+            // `## 경로 확인 결과` again, so sanitize the final string that goes to the webview / chatHistory
+            // once more. The first pass at the cleanedVisible stage cleans the model output itself;
+            // this second pass also cleans whatever the enforcers added.
+            const finalAssistantContent = getConfig().outputFormat === 'plain'
+                ? stripMarkdownFormatting(assistantContent)
+                : assistantContent;

            const assistantMessage: ChatMessage = { role: 'assistant', content: finalAssistantContent, internal: false, rationale };
            this.chatHistory.push(assistantMessage);
@@ -1470,21 +1525,33 @@ export class AgentExecutor {
                : '';

            // Delegate config-driven execution to the workflow manager
-            const finalReport = await AgentWorkflowManager.runStrictWorkflow(
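+            // [Clean Stream] Stage progress messages are shown only in the workflowStage indicator at the
+            // top of the sidebar, not in the chat body (streamChunk), so intermediate "thinking" stages
+            // no longer clutter the reply. The chat bubble receives only the final answer, in one piece.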
+            const rawFinalReport = await AgentWorkflowManager.runStrictWorkflow(
                prompt,
                modelName,
                `${brainContext}${selectedAgentContext}${designerContext}`,
                signal,
                (step, msg) => {
-                    this.webview?.postMessage({ type: 'autoContinue', value: `${step}: ${msg}` });
-                    // Announce the start of each stage
-                    this.webview?.postMessage({ type: 'streamChunk', value: `\n\n> **[${step}]** ${msg}\n\n` });
+                    this.webview?.postMessage({
+                        type: 'workflowStage',
+                        value: { step, message: msg, done: step === '완료' || step === '오류' }
+                    });
                }
            );

            if (signal.aborted || !this.webview) return;

-            this.webview.postMessage({ type: 'streamChunk', value: `\n\n--- \n\n${finalReport}` });
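+            // [Plain Text Output] Even when the Synthesizer follows the rules, small models sometimes leak
+            // `##` or `**`, so strip markers once more in a final post-processing pass. Only the cleaned result
+            // is stored in chat history, so the markers are not reinforced through the next turn's context.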
+            const finalReport = getConfig().outputFormat === 'plain'
+                ? stripMarkdownFormatting(rawFinalReport)
+                : rawFinalReport;
+
+            this.webview.postMessage({ type: 'streamChunk', value: finalReport });
+            this.webview.postMessage({ type: 'workflowStage', value: { step: '완료', message: '', done: true } });
            this.webview.postMessage({ type: 'streamEnd' });

            this.chatHistory.push({ role: 'assistant', content: finalReport });
@@ -1494,6 +1561,8 @@ export class AgentExecutor {
            this.webview.postMessage({ type: 'autoContinue', value: '✅ 모든 분석이 성공적으로 완료되었습니다.' });

        } catch (error: any) {
+            // Always close the stage indicator on every exit path; otherwise the sidebar shows "③ 자기 검증..." forever.
+            this.webview?.postMessage({ type: 'workflowStage', value: { step: '완료', message: '', done: true } });
            if (error.name === 'AbortError' || error.message?.includes('cancelled')) {
                this.statusBarManager.updateStatus(AgentStatus.Idle, 'Workflow Cancelled');
                return;
@@ -1537,10 +1606,23 @@ export class AgentExecutor {
                    temperature: 0.3,
                    maxTokens: subMaxTokens,
                    contextOverflowPolicy,
+                    ...this.lmStudioSamplingFromConfig(),
+                    ...this.lmStudioRespondExtrasFromConfig(),
                    signal: this.abortController?.signal,
                });
-                for await (const { token } of stream) {
+                let subStopReason: string | undefined;
+                for await (const { token, stopReason } of stream) {
                    if (token) responseText += token;
+                    if (stopReason) subStopReason = stopReason;
+                }
+                // Sub-agent answers that got cut mid-sentence corrupt the pipeline silently
+                // (Planner produces a half-step, Writer can't recover). Log the stop reason so
+                // the operator can raise subMaxTokens or pick a less aggressive output budget.
+                if (subStopReason && /maxPredicted|context|truncat/i.test(subStopReason)) {
+                    logError('Sub-agent answer hit a generation limit.', {
+                        role, model: modelName, stopReason: subStopReason,
+                        chars: responseText.length, maxTokens: subMaxTokens,
+                    });
                }
                return responseText;
            } catch (err: any) {
@@ -1726,12 +1808,13 @@ export class AgentExecutor {
                return [
                    'Intent operating contract — Code Review:',
                    'The user wants a real review, not a meta-plan of how to review.',
-                    'Required sections in this exact order, in Korean:',
-                    '  1. ## 한 줄 판단 — one sentence: would you rely on this today, and under what constraint?',
-                    '  2. ## 잘된 점 — 2~4 concrete strengths. Each MUST cite a specific file path (and a function or section if you can name one) and explain WHY it works, not just that it exists.',
-                    '  3. ## 부족한 점 — 2~4 concrete weaknesses or risks. Same rule: cite a specific file/area, name the actual problem (race condition, missing retry, coupling, etc.), and say what breaks because of it.',
-                    '  4. ## 사용자 관점 개선 — 2~4 changes phrased from the END USER\'s perspective ("when X happens, the user currently sees Y; they should see Z"). Tie each to a code location that needs to change.',
-                    '  5. ## 다음 한 수 — exactly one next action, small enough to do this week.',
+                    'OUTPUT FORMAT: PLAIN TEXT only. Section labels are bare words on their own line (no "#", "##", "**", "__", "> "). Bullets use "- ". Long answers MUST start with a "핵심 요약" block (2~4 bullets) before any detail.',
+                    'Required sections in this exact order, in Korean (each label appears as a plain line, NOT a markdown heading):',
+                    '  1) 한 줄 판단 — one sentence: would you rely on this today, and under what constraint?',
+                    '  2) 잘된 점 — 2~4 concrete strengths. Each MUST cite a specific file path (and a function or section if you can name one) and explain WHY it works, not just that it exists.',
+                    '  3) 부족한 점 — 2~4 concrete weaknesses or risks. Same rule: cite a specific file/area, name the actual problem (race condition, missing retry, coupling, etc.), and say what breaks because of it.',
+                    '  4) 사용자 관점 개선 — 2~4 changes phrased from the END USER\'s perspective ("when X happens, the user currently sees Y; they should see Z"). Tie each to a code location that needs to change.',
+                    '  5) 다음 한 수 — exactly one next action, small enough to do this week.',
                    '',
                    'Hard rules — these are the things that made past reviews feel like a template:',
                    '- Do NOT write meta-sentences like "확인해야 합니다", "다음 리뷰에서는 ~를 보면 됩니다", "~로 보입니다", "~인지 확인하는 것이 핵심입니다". Either you observed it or you read the file with <read_file> right now.',
@@ -1998,12 +2081,53 @@ export class AgentExecutor {
            return false;
        }

-        const complexByShape = prompt.length > 180 || /(보고서|심층|종합\s*분석|리서치|조사|전략\s*수립|기획안|제안서|roadmap|research|report|deep\s*analysis|strategy|proposal)/i.test(prompt);
-        if (!complexByShape) {
+        const cfg = getConfig();
+        const mode = cfg.workflowMultiAgentMode || 'auto';
+
+        // 'off' → use only the legacy keyword/length heuristic (respects the legacy multiAgentEnabled toggle).
+        if (mode === 'off') {
+            const legacyComplex = prompt.length > 180 || /(보고서|심층|종합\s*분석|리서치|조사|전략\s*수립|기획안|제안서|roadmap|research|report|deep\s*analysis|strategy|proposal)/i.test(prompt);
+            if (!legacyComplex) return false;
+            return configEnabled || /(보고서|심층|종합\s*분석|리서치|조사|전략\s*수립|기획안|제안서|research|report|deep\s*analysis|strategy|proposal)/i.test(prompt);
+        }
+
+        // Greetings and small talk would waste the 5-stage pipeline. Exclude short casual prompts.
+        if (this.isCasualConversationPrompt(prompt)) {
+            return false;
+        }
+        if (prompt.trim().length < 12) {
            return false;
        }

-        return configEnabled || /(보고서|심층|종합\s*분석|리서치|조사|전략\s*수립|기획안|제안서|research|report|deep\s*analysis|strategy|proposal)/i.test(prompt);
+        // 'always' → fire unconditionally once the guards above pass.
+        if (mode === 'always') return true;
+
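+        // 'auto' → fire when any one of the following holds:
+        //   (1) the user explicitly enabled multiAgentEnabled,
+        //   (2) the model is small (≤4B params), so a single pass is risky,
+        //   (3) the prompt tokens take up at least the threshold fraction of the effective context window,
+        //   (4) the prompt matches an obviously compound-task keyword such as "report/review/deep analysis",
+        //   (5) the prompt itself is long (>240 chars).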
+        if (configEnabled) return true;
+
+        const paramB = estimateModelParamsB(cfg.defaultModel);
+        if (paramB !== null && paramB <= 4) return true;
+
+        try {
+            const effectiveCtx = cfg.smallModelContextCap > 0 && paramB !== null && paramB <= 4
+                ? cfg.smallModelContextCap
+                : cfg.contextLength;
+            const promptTokens = estimateTokens(prompt);
+            const threshold = Math.floor(effectiveCtx * cfg.workflowAutoCtxFractionThreshold);
+            if (promptTokens >= threshold) return true;
+        } catch { /* safe fallback: continue with the keyword/length checks */ }
+
+        if (/(보고서|심층|종합\s*분석|리서치|조사|전략\s*수립|기획안|제안서|코드\s*리뷰|리뷰|아키텍처|architecture|research|report|deep\s*analysis|strategy|proposal|review)/i.test(prompt)) {
+            return true;
+        }
+        if (prompt.length > 240) return true;
+
+        return false;
    }

    private buildAstraModeArchitectureContext(prompt: string): string {
@@ -2129,6 +2253,78 @@ export class AgentExecutor {
        }
    }

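+    /**
+     * v2.2.69: compress the messages dropped by the sliding window into one-line summaries.
+     * Uses a heuristic, with no extra LLM call, that extracts only:
+     *   - the first sentence of each user prompt
+     *   - the first sentence of each assistant answer (assumes conclusion-first, see R1)
+     * and joins them in chronological order. The model only needs the gist of what was discussed earlier.
+     * Markdown markers such as `## ` are stripped to leave clean plain text.
+     */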
+    private buildDroppedHistorySummary(dropped: ChatMessage[]): string {
+        if (dropped.length === 0) return '';
+        const lines: string[] = [];
+        const firstSentence = (s: string): string => {
+            const cleaned = String(s || '')
+                .replace(/^\s{0,3}#{1,6}\s+/gm, '')
+                .replace(/\*\*/g, '')
+                .replace(/`{3}[\s\S]*?`{3}/g, '[code]')
+                .replace(/\s+/g, ' ')
+                .trim();
+            // First sentence (split on period / question mark / newline); capped at 140 chars if too long.
+            const m = cleaned.match(/^[^.!?。\n]{1,140}[.!?。]?/);
+            const out = (m ? m[0] : cleaned.slice(0, 140)).trim();
+            return out;
+        };
+        let userTurnIdx = 0;
+        for (const msg of dropped) {
+            if (msg.internal) continue;
+            const content = typeof msg.content === 'string' ? msg.content : '';
+            if (!content.trim()) continue;
+            if (msg.role === 'user') {
+                userTurnIdx++;
+                lines.push(`U${userTurnIdx}: ${firstSentence(content)}`);
+            } else if (msg.role === 'assistant') {
+                lines.push(`A${userTurnIdx}: ${firstSentence(content)}`);
+            }
+        }
+        // If there are too many lines, replace the older ones with a single note and keep only the most recent MAX_LINES.
+        const MAX_LINES = 8;
+        if (lines.length > MAX_LINES) {
+            const tail = lines.slice(-MAX_LINES);
+            const head = lines.slice(0, lines.length - MAX_LINES);
+            return `[이전 대화 요약 — 총 ${dropped.length}개 메시지가 컨텍스트 한계로 생략됨]\n(더 오래된 ${head.length}개 턴 생략됨)\n${tail.join('\n')}`;
+        }
+        return `[이전 대화 요약 — 총 ${dropped.length}개 메시지가 컨텍스트 한계로 생략됨]\n${lines.join('\n')}`;
+    }
+
+    /**
+     * v2.2.69: compute the mode signature of the current request.
+     * If the mode differs from the previous one, a one-line bridge ("previous mode: X / current mode: Y") can be added to the system prompt.
+     */
+    private computeModeSignature(opts: { agentSkillName?: string; companyMode?: boolean; multiAgent?: boolean; brainName?: string }): string {
+        const parts = [
+            `agent=${opts.agentSkillName || 'none'}`,
+            `company=${opts.companyMode ? 'on' : 'off'}`,
+            `multi=${opts.multiAgent ? 'on' : 'off'}`,
+            `brain=${opts.brainName || '?'}`,
+        ];
+        return parts.join('|');
+    }
+
+    /**
+     * v2.2.69: derive a one-line "previous context" sentence for the mode-switch bridge from the
+     * most recent user message in chatHistory. Returns an empty string when there is none.
+     */
+    private buildLastTopicLine(): string {
+        const recent = this.chatHistory.filter(m => !m.internal && (m.role === 'user' || m.role === 'assistant'));
+        if (recent.length === 0) return '';
+        const lastUser = [...recent].reverse().find(m => m.role === 'user');
+        if (!lastUser || typeof lastUser.content !== 'string') return '';
+        const topic = lastUser.content.replace(/\s+/g, ' ').trim().slice(0, 120);
+        return topic;
+    }
+
    private buildRequestHistory(history: ChatMessage[]): ChatMessage[] {
        return history.map((message) => {
            if (message.role !== 'assistant' || typeof message.content !== 'string') {
@@ -2957,17 +3153,23 @@ export class AgentExecutor {
        // Retry model candidates / message variants only within the same engine
        for (const candidateModel of modelCandidates) {
            for (const variant of messageVariants) {
+                const sampling = samplingToRestBody(this.lmStudioSamplingFromConfig());
                const streamBody = {
                    model: candidateModel,
                    messages: variant.messages,
                    stream: true,
                    ...(engine === 'lmstudio'
-                        ? { max_tokens: maxTokens, temperature }
-                        : { options: { num_ctx: numCtx, num_predict: maxTokens, temperature } }),
+                        // LM Studio's OpenAI-compatible REST extends the schema with top_k/min_p/
+                        // repeat_penalty (same names as Ollama). Spread the shared sampling block so
+                        // the REST fallback matches the SDK path — without it a fallback after a
+                        // dead handle quietly loses the glitch-suppression preset.
+                        ? { max_tokens: maxTokens, temperature, ...sampling }
+                        : { options: { num_ctx: numCtx, num_predict: maxTokens, temperature, ...sampling } }),
                };

                // Retry for transient network errors (up to 2 retries, exponential backoff)
                const MAX_RETRIES = 2;
+                let serviceDown = false;
                for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
                    try {
                        if (attempt > 0) {
@@ -3013,13 +3215,33 @@ export class AgentExecutor {
                        if (lastError.name === 'AbortError') {
                            throw lastError;
                        }
+                        // ECONNREFUSED / DNS-level failures mean the engine process isn't even
+                        // listening — no amount of retries or message-variant juggling will help.
+                        // Abandon the candidate/variant loops now and surface the "is X running?"
+                        // error fast instead of burning 12 fetch attempts before giving up.
+                        const errCode = (error?.cause?.code ?? error?.code ?? '').toString();
+                        const errMsg = lastError.message;
+                        if (
+                            errCode === 'ECONNREFUSED' || errCode === 'ENOTFOUND' || errCode === 'EAI_AGAIN'
+                            || /ECONNREFUSED|ENOTFOUND|getaddrinfo|fetch failed/i.test(errMsg)
+                        ) {
+                            serviceDown = true;
+                            logError('AI streaming request: engine appears to be down.', {
+                                engine, apiUrl, code: errCode, error: errMsg,
+                            });
+                            break; // exit retry loop
+                        }
                        logError('AI streaming request failed.', {
                            engine, variant: variant.name, apiUrl, model: candidateModel,
                            attempt, error: lastError.message
                        });
                    }
                }
+                if (serviceDown) break; // skip remaining variants
            }
+            // serviceDown also short-circuits the model-candidate loop — there is no
+            // candidate / variant the engine can answer if it isn't listening at all.
+            if (lastError && /ECONNREFUSED|ENOTFOUND|EAI_AGAIN|getaddrinfo|fetch failed/i.test(lastError.message)) break;
        }

        // Clear error message: tell the user which engine failed
@@ -3151,13 +3373,14 @@ export class AgentExecutor {
        const numCtx = Math.max(2048, params.contextLength ?? 32768);
        const apiUrl = buildApiUrl(baseUrl, engine, 'chat');
        const variants = this.buildEngineMessageVariants(messages, engine);
+        const sampling = samplingToRestBody(this.lmStudioSamplingFromConfig());
        const body = {
            model: modelName,
            messages: variants[0].messages,
            stream: false,
            ...(engine === 'lmstudio'
-                ? { max_tokens: maxTokens, temperature }
-                : { options: { num_ctx: numCtx, num_predict: maxTokens, temperature } }),
+                ? { max_tokens: maxTokens, temperature, ...sampling }
+                : { options: { num_ctx: numCtx, num_predict: maxTokens, temperature, ...sampling } }),
        };
        const response = await fetch(apiUrl, {
            method: 'POST',
@@ -3231,6 +3454,8 @@ export class AgentExecutor {
                    temperature: params.temperature,
                    maxTokens: params.maxTokens,
                    contextOverflowPolicy: params.contextOverflowPolicy,
+                    ...this.lmStudioSamplingFromConfig(),
+                    ...this.lmStudioRespondExtrasFromConfig(),
                    signal: params.signal,
                });
                for await (const { token, stopReason } of stream) {
@@ -3356,6 +3581,29 @@ export class AgentExecutor {
        ];
    }

+    /**
+     * Build the shared LM Studio sampling block from current config. Used by both the
+     * SDK streamer (topPSampling/topKSampling/…) and the REST body (top_p/top_k/…)
+     * so the two paths produce equivalent answers — otherwise a REST fallback after a
+     * dead SDK handle would silently drop the glitch-suppression that the SDK applies
+     * (e.g. garbled Hangul tokens). Ollama also accepts these field names inside `options`.
+     */
+    private lmStudioSamplingFromConfig(): LmStudioSampling {
+        const c = getConfig();
+        return {
+            topP: c.lmStudioTopP,
+            topK: c.lmStudioTopK,
+            minP: c.lmStudioMinP,
+            repeatPenalty: c.lmStudioRepeatPenalty,
+        };
+    }
+
+    /** SDK-only extras for `respond()` — currently the draft model for speculative decoding. */
+    private lmStudioRespondExtrasFromConfig(): { draftModel?: string } {
+        const c = getConfig();
+        return c.lmStudioDraftModel ? { draftModel: c.lmStudioDraftModel } : {};
+    }
+
    private buildModelCandidates(modelName: string, engine: 'lmstudio' | 'ollama'): string[] {
        const candidates = [modelName];
        if (engine === 'lmstudio') {
@@ -1,4 +1,4 @@
-import { PlannerAgent, ResearcherAgent, ReflectorAgent, WriterAgent } from './factory';
+import { PlannerAgent, ResearcherAgent, ReflectorAgent, WriterAgent, SynthesizerAgent } from './factory';
 import { AgentEngine, PipelineStage, AgentExecuteOptions } from '../lib/engine';
 import { getConfig } from '../config';

@@ -17,9 +17,13 @@ export class AgentWorkflowManager {
        const researcher = new ResearcherAgent(modelName);
        const writer = new WriterAgent(modelName);
        // [Self-Reflection] Inject the Reflector only when it has not been disabled in settings.
-        const enableReflection = getConfig().enableReflection !== false;
+        const cfg = getConfig();
+        const enableReflection = cfg.enableReflection !== false;
        const reflector = enableReflection ? new ReflectorAgent(modelName) : undefined;
-        const engine = new AgentEngine(planner, researcher, writer, reflector);
+        // [5-stage pipeline] Final synthesis stage. Always injected unless disabled in settings.
+        const enableSynth = cfg.workflowSynthesizerEnabled !== false;
+        const synthesizer = enableSynth ? new SynthesizerAgent(modelName) : undefined;
+        const engine = new AgentEngine(planner, researcher, writer, reflector, synthesizer);
        const missionId = `mission_${Date.now()}`;

        const runOptions: AgentExecuteOptions = {
@@ -46,12 +50,14 @@ export class AgentWorkflowManager {
    }

    private static mapStageToUI(stage: PipelineStage): string {
+        // User-facing labels are unified as Korean plus a stage number so the 5-stage pipeline is clearly visible.
        const maps: Record<PipelineStage, string> = {
            idle: '대기',
-            planner: 'Planner',
-            researcher: 'Researcher',
-            reflector: 'Reflector',
-            writer: 'Writer',
+            planner: '① 계획',
+            researcher: '② 자료 수집',
+            reflector: '③ 자기 검증',
+            writer: '④ 초안 작성',
+            synthesizer: '⑤ 최종 정리',
            completed: '완료',
            error: '오류'
        };
@@ -134,13 +134,17 @@ Your mission is to extract, filter, and synthesize critical data based on a stra
 }

 export class WriterAgent extends BaseAgent {
-    private readonly persona = `You are the [Lead Synthesis Writer & Editor].
-Your goal is to produce a state-of-the-art final report that wows the user.
- TONE: Authoritative yet accessible. Professional developer/consultant style.
- STRUCTURE: Use an executive summary, detailed analysis sections, and a "Final Recommendation" block.
- LANGUAGE: Always respond in the user's language (KOREAN).
- POLISHING: Ensure logical flow between sections. Make it look like a premium report.
- SELF-CORRECTION: When a [REFLECTION CRITIQUE] block is provided, you MUST address each listed gap, contradiction, or missing-evidence item explicitly before producing the final report. Do not silently ignore the critique.`;
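+    // [5-stage pipeline] The Writer now acts as a "Drafter": it quickly produces only a first draft.
+    // Final polishing, summarizing, and applying the critique are handled by the downstream SynthesizerAgent,
+    // so a small model no longer blows up its context trying to finish everything in one pass.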
+    private readonly persona = `You are the [Section Drafter].
+Your goal is to produce a STRUCTURED FIRST DRAFT that the downstream Synthesizer will polish.
+- SCOPE: Cover each major topic from the research as its own section. Each section starts with a short plain-text label on its own line (e.g. "잘된 점", "부족한 점") — NO "#", "##", "**", "__", ">" markers. Use "- " for bullets, never "* ".
+- DENSITY: Pack facts; skip flowery prose, executive summaries, and closing remarks (the Synthesizer adds those).
+- TONE: Plain, factual, developer-readable Korean.
+- BREVITY: Keep each section tight — better to leave the Synthesizer something to merge than to run out of tokens mid-section.
+- SELF-CORRECTION: When a [REFLECTION CRITIQUE] block is provided, address each listed gap inline in the relevant section. Do not silently ignore the critique.
+- LANGUAGE: KOREAN.`;

    async execute(input: string, originalRequest?: string, signal?: AbortSignal, options?: AgentExecuteOptions): Promise<string> {
        // [Astra v4.0] Handle Advisor mode
@@ -163,11 +167,61 @@ Analyze the provided report and suggest 3 high-impact next actions for the user.
            ? `\n5. [REFLECTION CRITIQUE — must be addressed]:\n${reflection.length > 4000 ? reflection.substring(0, 4000) + '... [Critique Trimmed]' : reflection}`
            : '';

-        const wrappedInput = `### SYSTEM INSTRUCTION: FINAL SYNTHESIS
+        const wrappedInput = `### SYSTEM INSTRUCTION: SECTIONED DRAFT
 1. Gathered Research Data: ${trimmedData}
 2. User's Original Objective: ${originalRequest}
 3. Applied Knowledge & Filtering Policy: ${policy}
-4. Mission: Write the definitive final report in KOREAN.${reflectionBlock}`;
+4. Mission: Produce a STRUCTURED FIRST DRAFT in KOREAN — section per topic, factual bullets allowed.
+   Do NOT add a final executive summary or closing remarks; the Synthesizer will handle those.${reflectionBlock}`;
+        return this.callLLM(this.persona, wrappedInput, signal);
+    }
+}
+
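+/**
+ * [5-stage pipeline] SynthesizerAgent
+ * Takes the first draft written by the Drafter and polishes it into the final user-facing answer.
+ *  - The input is an already-organized draft, so the context is small → even a small local model can handle it in one pass.
+ *  - Its role is (a) a one-line opening, (b) tidying the section flow, (c) a one-paragraph conclusion/recommendation. It does not create new facts.
+ *  - When a Reflector critique is passed along, it checks once more that those items are really reflected in the answer.
+ */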
+export class SynthesizerAgent extends BaseAgent {
+    private readonly persona = `You are the [Final Editor & Synthesizer].
+You receive a structured FIRST DRAFT (already broken into sections) plus the user's original request and (optionally) a reflection critique.
+Your only job is to produce the FINAL user-facing answer.
+
+[OUTPUT FORMAT — 7 hard rules — these override every other formatting habit]
+R1. CONCLUSION FIRST. The very first sentence is the conclusion / verdict / recommendation. No greeting, no "분석해보겠습니다", no scene-setting paragraph, no "핵심 요약" label line on top. Just the conclusion as sentence 1. A reader who stops after sentence 1 must know what you decided.
+R2. AT MOST 3 SECTIONS. Total. A section = a label line + body, or a clearly separated numbered group. If the answer fits without sections, use none. Three is the ceiling, not a target.
+R3. NO REPETITION. Each sentence carries new information. If you said it in the conclusion, do NOT restate it in a later section.
+R4. BOLD ≤ 3 INSTANCES. Across the entire answer, use bold at most 3 times — reserve it for truly load-bearing words (file name, verdict word, hard number). Most answers should have zero.
+R5. JUDGE WITHOUT ASKING. If a defensible decision is reachable from the draft + original request, deliver it and act. Do NOT ask permission, do NOT bounce the question back.
+R6. ASK ONE QUESTION ONLY WHEN: (a) the path forks into two materially different directions and user intent is unknown, OR (b) the next step is irreversible (delete, force-push, drop table, overwrite uncommitted work, send external message). One plain sentence on its own line at the end. No "핵심 확인 질문" label, no "질문 의도", no follow-ups.
+R7. GUESS-AND-ACT WITH STATED ASSUMPTION. If a detail is missing but a reasonable guess exists, guess and act, declaring the assumption in one line prefixed "가정:".
+
+[PLAIN TEXT]
+- NEVER emit "#", "##", "###", "__", "> " markers. Section labels are plain text on their own line.
+- Bullets: "- " only. No "* " / "• ".
+- No tables. No HTML.
+- Inline code with backticks is OK (e.g. \`src/agent.ts\`). Triple-backtick code blocks only for actual code.
+
+[CONTENT]
+- Preserve every factual claim from the draft. Do NOT invent new facts, do NOT add hidden reasoning, do NOT write meta-commentary.
+- DO NOT EMIT: <think>, <analysis>, <|channel|> markers, "Thinking Process:", planning notes, or any hidden reasoning.
+- If a [REFLECTION CRITIQUE] is provided, verify each item is addressed. If something is missing, say so explicitly rather than fabricating coverage.
+- LANGUAGE: KOREAN. Tone: direct, technical, developer-friendly.`;
+
+    async execute(input: string, originalRequest?: string, signal?: AbortSignal, options?: AgentExecuteOptions): Promise<string> {
+        const draft = input.length > 12000 ? input.substring(0, 12000) + '... [Draft Trimmed]' : input;
+        const reflection = options?.priorResults?.reflection;
+        const reflectionBlock = reflection && reflection.trim().length > 0
+            ? `\n4. [REFLECTION CRITIQUE — verify the draft addresses each item]:\n${reflection.length > 3000 ? reflection.substring(0, 3000) + '... [Critique Trimmed]' : reflection}`
+            : '';
+
+        const wrappedInput = `### SYSTEM INSTRUCTION: FINAL SYNTHESIS
+1. User's Original Request: ${originalRequest || '(unavailable)'}
+2. Structured Draft (from Drafter — your input to polish):
+${draft}
+3. Mission: Produce the FINAL user-facing answer in KOREAN. Do not restart from scratch — polish, smooth, and conclude.${reflectionBlock}`;
        return this.callLLM(this.persona, wrappedInput, signal);
    }
 }
@@ -143,6 +143,65 @@ export interface IAgentConfig {
     * accumulated. When false, the critique is used only for that mission and then discarded.
     */
    autoLessonFromReflection: boolean;
+    // ─── 5-stage workflow (Drafter + Synthesizer) ───
+    /** Whether to run one more final polishing pass with SynthesizerAgent after the Drafter (= Writer) output. Default true. */
+    workflowSynthesizerEnabled: boolean;
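+    /**
+     * Multi-agent trigger mode:
+     *  - 'auto' (default): fires automatically when a small model (≤4B) is detected, the prompt takes a large share of the context, or the prompt looks like a compound task (keyword or length match).
+     *  - 'always': use the 5-stage pipeline for every request except greetings and short small talk.
+     *  - 'off': legacy single-agent behavior (manual toggle / keyword matching only).
+     */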
+    workflowMultiAgentMode: 'auto' | 'always' | 'off';
+    /**
+     * In 'auto' mode, force the 5-stage pipeline once the prompt tokens reach this fraction (0~1) of contextLength.
+     * Default 0.30: once a small model spends 30% or more of its window on input, trying to finish in one pass gets risky.
+     */
+    workflowAutoCtxFractionThreshold: number;
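+    // ─── Stream display ───
+    /**
+     * Whether to stream model tokens into the chat bubble as soon as they arrive.
+     *  - false (default): tokens accumulate internally and only the sanitized final answer is shown, in one piece → blocks Harmony/think marker leaks at the source.
+     *  - true: legacy live streaming. If control tokens are mixed into the model output, they may flash on screen briefly.
+     */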
+    liveStreamTokens: boolean;
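+    /**
+     * Final answer format.
+     *  - 'plain' (default): post-processing strips all markdown markers (`##`, `**`, `__`, `> `, `* `, etc.) that the model emits by habit.
+     *    Section label text (e.g. "핵심 요약") is kept, but the heading markers are removed so the answer renders as clean plain text.
+     *  - 'markdown': legacy behavior. The model output is passed to the renderer as-is.
+     */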
+    outputFormat: 'plain' | 'markdown';
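+    /**
+     * Auto-record (project chronicle auto-record). When true, meaningful turns are saved to the
+     * Wiki/Chronicle folder automatically after every prompt. When false, auto-save is OFF (manual recording still works).
+     * Can be changed on the fly via the toggle item in the sidebar tools dropdown.
+     */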
+    chronicleAutoRecord: boolean;
+    // ─── LM Studio sampling (applied to both SDK and REST paths) ───
+    /** LM Studio nucleus sampling cutoff (0~1). Lower tightens; 1 disables. */
+    lmStudioTopP: number;
+    /** LM Studio top-K cutoff (0 disables). */
+    lmStudioTopK: number;
+    /** LM Studio min-P floor (0~1, 0 disables). */
+    lmStudioMinP: number;
+    /** LM Studio repeat penalty (1 disables, 1.05–1.2 typical). */
+    lmStudioRepeatPenalty: number;
+    /** Render tok/s + TTFT from prediction stats into context-budget badge. */
+    lmStudioShowStatsInBudget: boolean;
+    /** LM Studio model key of a small draft model for speculative decoding ('' = disabled). */
+    lmStudioDraftModel: string;
+    /** Load-time options. Read once per load(); changing these after load needs a reload. */
+    lmStudioLoad: {
+        flashAttention: boolean;
+        /** "max" | "off" | number 0-1 */
+        gpuOffloadRatio: 'max' | 'off' | number;
+        offloadKVCacheToGpu: boolean;
+        keepModelInMemory: boolean;
+        useFp16ForKVCache: boolean;
+        /** 0 = engine default */
+        evalBatchSize: number;
+    };
 }

 // ─── Path normalization utilities ───
@@ -245,6 +304,40 @@ export function getConfig(): IAgentConfig {
        companyPixelOfficeBubbles: cfg.get<boolean>('company.pixelOffice.bubbles', true),
        enableReflection: cfg.get<boolean>('enableReflection', true),
        autoLessonFromReflection: cfg.get<boolean>('autoLessonFromReflection', true),
+        workflowSynthesizerEnabled: cfg.get<boolean>('workflow.synthesizerEnabled', true),
+        workflowMultiAgentMode: ((): 'auto' | 'always' | 'off' => {
+            const v = (cfg.get<string>('workflow.multiAgentMode', 'auto') || 'auto').trim().toLowerCase();
+            return v === 'always' || v === 'off' ? v : 'auto';
+        })(),
+        workflowAutoCtxFractionThreshold: Math.max(0.05, Math.min(0.95,
+            cfg.get<number>('workflow.autoCtxFractionThreshold', 0.30)
+        )),
+        liveStreamTokens: cfg.get<boolean>('liveStreamTokens', false),
+        outputFormat: ((): 'plain' | 'markdown' => {
+            const v = (cfg.get<string>('outputFormat', 'plain') || 'plain').trim().toLowerCase();
+            return v === 'markdown' ? 'markdown' : 'plain';
+        })(),
+        chronicleAutoRecord: cfg.get<boolean>('chronicleAutoRecord', true),
+        lmStudioTopP: Math.max(0, Math.min(1, cfg.get<number>('lmStudio.sampling.topP', 0.9))),
+        lmStudioTopK: Math.max(0, cfg.get<number>('lmStudio.sampling.topK', 20)),
+        lmStudioMinP: Math.max(0, Math.min(1, cfg.get<number>('lmStudio.sampling.minP', 0.05))),
+        lmStudioRepeatPenalty: Math.max(1, Math.min(2, cfg.get<number>('lmStudio.sampling.repeatPenalty', 1.1))),
+        lmStudioShowStatsInBudget: cfg.get<boolean>('lmStudio.statsInBudget', true),
+        lmStudioDraftModel: (cfg.get<string>('lmStudio.draftModel', '') || '').trim(),
+        lmStudioLoad: {
+            flashAttention: cfg.get<boolean>('lmStudio.load.flashAttention', true),
+            gpuOffloadRatio: ((): 'max' | 'off' | number => {
+                const raw = (cfg.get<string>('lmStudio.load.gpuOffloadRatio', 'max') || 'max').trim().toLowerCase();
+                if (raw === 'max' || raw === 'off') return raw;
+                const n = Number(raw);
+                if (Number.isFinite(n)) return Math.max(0, Math.min(1, n));
+                return 'max';
+            })(),
+            offloadKVCacheToGpu: cfg.get<boolean>('lmStudio.load.offloadKVCacheToGpu', true),
+            keepModelInMemory: cfg.get<boolean>('lmStudio.load.keepModelInMemory', true),
+            useFp16ForKVCache: cfg.get<boolean>('lmStudio.load.useFp16ForKVCache', false),
+            evalBatchSize: Math.max(0, cfg.get<number>('lmStudio.load.evalBatchSize', 0)),
+        },
    };
 }
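
The `gpuOffloadRatio` coercion in the hunk above can be pulled out into a standalone function for testing. This is a minimal sketch, not the extension's code: the name `coerceGpuOffloadRatio` is hypothetical, and accepting a raw `number` (rather than only a string) is an assumption that goes beyond what the diff shows.

```typescript
type GpuOffloadRatio = 'max' | 'off' | number;

// Hypothetical helper mirroring the inline IIFE in getConfig().
// Assumption: the raw setting may arrive as a string or a number.
function coerceGpuOffloadRatio(raw: unknown): GpuOffloadRatio {
    const s = String(raw ?? '').trim().toLowerCase();
    if (s === 'max' || s === 'off') return s;
    // An empty value falls back to 'max' (Number('') would otherwise read as 0).
    if (s === '') return 'max';
    const n = Number(s);
    // Numeric values are clamped into [0, 1]; anything unparsable falls back to 'max'.
    return Number.isFinite(n) ? Math.max(0, Math.min(1, n)) : 'max';
}

console.log(coerceGpuOffloadRatio(' MAX '), coerceGpuOffloadRatio('0.5'), coerceGpuOffloadRatio('7'));
```

Keeping the fallback at `'max'` matches the default in the diff, so a typo in the setting never silently disables GPU offload.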

@@ -223,3 +223,71 @@ export function mergeContinuationParts(prev: string, next: string): string {

 /** Rough token count of a string — re-exported helper so callers don't need contextManager directly. */
 export const countTokens = estimateTokens;
+
+/**
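+ * ── Plain-text output hygiene ─────────────────────────────────────────
+ * The user wants clean Korean plain-text answers without markdown rendering.
+ * When the model/persona mixes in `##`, `**`, `> `, `* ` and the like out of trained habit, they show up verbatim on screen,
+ * so the markers are stripped once more right before the final answer.
+ *
+ * Preserved:
+ *   - code blocks (the body between ``` fences is left untouched)
+ *   - inline code `code` (backticks kept)
+ *   - natural numbered lists such as `1. ` and `1) `
+ *   - line-leading dash `- ` (a natural plain-text bullet)
+ *
+ * Removed / converted:
+ *   - line-leading `#`,`##`,`###`,... `[space]` → header marker removed (label text kept)
+ *   - `**bold**` / `__bold__` → bold (only the emphasis markers are removed)
+ *   - single `*text*` emphasis → text (`* ` bullets and multiplication/wildcard patterns are preserved)
+ *   - line-leading `> ` blockquote marker → removed
+ *   - line-leading `* ` or `+ ` bullet → normalized to `- ` (so the asterisk is less likely to be mistaken for emphasis)
+ *   - header lines are otherwise left as-is: no trailing colon is added or removed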
+export function stripMarkdownFormatting(text: string): string {
+    if (!text) return '';
+    // 1. Protect code blocks wholesale. Only fenced (```...```) blocks are protected; no substitution touches their body.
+    //    Placeholders are wrapped in NUL sentinels so they cannot collide with ordinary text or an adjacent digit.
+    const fenceParts: string[] = [];
+    let src = String(text).replace(/```[\s\S]*?```/g, (m) => {
+        fenceParts.push(m);
+        return `\u0000FENCE${fenceParts.length - 1}\u0000`;
+    });
+    // 2. Protect inline code as well (anything inside backticks is left untouched).
+    const inlineParts: string[] = [];
+    src = src.replace(/`[^`\n]+`/g, (m) => {
+        inlineParts.push(m);
+        return `\u0000INL${inlineParts.length - 1}\u0000`;
+    });
+
+    // 3. Per-line cleanup.
+    src = src.split('\n').map((rawLine) => {
+        let line = rawLine;
+        // Remove a line-leading header marker ("## Key summary" → "Key summary")
+        line = line.replace(/^\s{0,3}#{1,6}\s+/, '');
+        // Remove a line-leading blockquote marker
+        line = line.replace(/^\s{0,3}>\s?/, '');
+        // Normalize a line-leading `* ` or `+ ` bullet to `- `
+        line = line.replace(/^(\s*)[*+]\s+/, '$1- ');
+        return line;
+    }).join('\n');
+
+    // 4. Remove emphasis markers.
+    src = src.replace(/\*\*(.+?)\*\*/g, '$1');         // **bold**
+    src = src.replace(/__([^_\n]+?)__/g, '$1');         // __bold__
+    // Single-asterisk emphasis: only when bounded by whitespace or a text boundary on both sides (code/math such as `a*b*c` is left alone).
+    src = src.replace(/(^|[\s(\[])\*([^\s*][^*\n]*?[^\s*])\*(?=[\s).,!?;:]|$)/g, '$1$2');
+    src = src.replace(/(^|[\s(\[])\*([^\s*])\*(?=[\s).,!?;:]|$)/g, '$1$2');
+
+    // 5. A header left as a standalone label line (e.g. "Key summary") is kept as-is. No colon is appended:
+    //    the model usually already puts the body on the following line.
+
+    // 6. Collapse runs of 3+ newlines to 2.
+    src = src.replace(/\n{3,}/g, '\n\n');
+
+    // 7. Restore the protected inline code and code blocks.
+    src = src.replace(/\u0000INL(\d+)\u0000/g, (_, i) => inlineParts[Number(i)] || '');
+    src = src.replace(/\u0000FENCE(\d+)\u0000/g, (_, i) => fenceParts[Number(i)] || '');
+
+    return src.trim();
+}
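
The protect-then-restore pattern used by `stripMarkdownFormatting` can be isolated as follows. This is a minimal sketch under stated assumptions: `protectSpans` and `stripBoldOutsideCode` are hypothetical names, the NUL-delimited placeholder is one possible sentinel choice, and only the `**bold**` rule is applied to keep the example short.

```typescript
// Hypothetical helper: swap every match of `pattern` for a sentinel-wrapped placeholder
// and return a function that puts the original spans back.
// Assumption: `tag` is plain alphanumeric text.
function protectSpans(text: string, pattern: RegExp, tag: string): { masked: string; restore: (s: string) => string } {
    const saved: string[] = [];
    const masked = text.replace(pattern, (m) => {
        saved.push(m);
        return `\u0000${tag}${saved.length - 1}\u0000`;
    });
    // The closing sentinel stops "<placeholder 1>" followed by a literal "0" from reading as index 10.
    const re = new RegExp(`\\u0000${tag}(\\d+)\\u0000`, 'g');
    const restore = (s: string): string => s.replace(re, (whole, i) => saved[Number(i)] ?? whole);
    return { masked, restore };
}

// Strip **bold** markers everywhere except inside fenced blocks and inline code.
function stripBoldOutsideCode(text: string): string {
    // `{3} is used instead of three literal backticks to keep this sketch fence-safe.
    const fences = protectSpans(text, /`{3}[\s\S]*?`{3}/g, 'FENCE');
    const inline = protectSpans(fences.masked, /`[^`\n]+`/g, 'INL');
    const stripped = inline.masked.replace(/\*\*(.+?)\*\*/g, '$1');
    // Restore in reverse order of protection: inline code first, then fences.
    return fences.restore(inline.restore(stripped));
}

console.log(stripBoldOutsideCode('**a** `**b**` x'));
```

Masking fences before inline code matters: a fence body may itself contain single backticks, and those should never be treated as inline spans.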
@@ -55,7 +55,7 @@ export async function activate(context: vscode.ExtensionContext) {
    // a channel separate from those: the last-resort safety net when neither the popup nor the OutputChannel is visible).
    const ext = vscode.extensions.getExtension('g1nation.astra');
    const version = ext?.packageJSON?.version || '(unknown)';
-    console.error(`[ASTRA-DEBUG] activate v${version} pid=${process.pid}`);
+    console.log(`[ASTRA-DEBUG] activate v${version} pid=${process.pid}`);
    void vscode.window.showInformationMessage(`📡 Astra v${version} activated (PID=${process.pid})`);
    logInfo(`Astra activating... version=${version} pid=${process.pid}`);

@@ -88,10 +88,22 @@ export async function activate(context: vscode.ExtensionContext) {
        client: lmStudioClient,
        activity: activityTracker,
        getConfig: () => {
+            // Read from getConfig() so we share the same setting parsers (incl. gpuOffloadRatio coercion)
+            // with the rest of the codebase instead of duplicating the logic here.
+            const ag = getConfig();
            const cfg = vscode.workspace.getConfiguration('g1nation');
            return {
                idleTimeoutMs: cfg.get<number>('lmStudio.idleTimeoutMs', 300000),
                autoLoadOnSelect: cfg.get<boolean>('lmStudio.autoLoadOnSelect', true),
+                loadConfig: {
+                    flashAttention: ag.lmStudioLoad.flashAttention,
+                    gpuOffloadRatio: ag.lmStudioLoad.gpuOffloadRatio,
+                    offloadKVCacheToGpu: ag.lmStudioLoad.offloadKVCacheToGpu,
+                    keepModelInMemory: ag.lmStudioLoad.keepModelInMemory,
+                    useFp16ForKVCache: ag.lmStudioLoad.useFp16ForKVCache,
+                    evalBatchSize: ag.lmStudioLoad.evalBatchSize,
+                },
+                draftModel: ag.lmStudioDraftModel || undefined,
            };
        },
        notifyError: (msg) => provider?.postLmStudioError(msg),
@@ -157,7 +169,7 @@ export async function activate(context: vscode.ExtensionContext) {
        lifecycle,
        activity: activityTracker,
        loadedModels: () => lmStudioClient.listLoadedCached(),
-        downloadedModels: () => lmStudioClient.listDownloaded(),
+        downloadedModels: () => lmStudioClient.listDownloadedCached(),
    });
    // One-time repair: rewrite any chronicle projects that were saved with the
    // workspace parent as their `projectRoot` (a side-effect of the old
@@ -57,7 +57,7 @@ export async function handleSlashCommand(
    const head = (spaceIdx === -1 ? trimmed : trimmed.slice(0, spaceIdx)).toLowerCase() as SlashCommand;
    const arg = spaceIdx === -1 ? '' : trimmed.slice(spaceIdx + 1).trim();

-    console.error(`[ASTRA-DEBUG] slashRouter handleSlashCommand head=${head} arg=${arg.slice(0, 40)}`);
+    logInfo(`[ASTRA-DEBUG] slashRouter handleSlashCommand head=${head} arg=${arg.slice(0, 40)}`);
    logInfo(`[SLASH] handleSlashCommand start head=${head} arg="${arg.slice(0, 60)}" bridge=${getBridgeBaseUrl()}`);
    void vscode.window.showInformationMessage(`📻 Datacollect Radio: ${head} 진입`);
    void vscode.window.setStatusBarMessage(`📻 Datacollect Radio: ${head} 처리 중…`, 5000);
@@ -118,19 +118,21 @@ export interface TrimResult<M extends BudgetMessage> {
 }

 /**
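- * Fits the conversation history into the token budget.
+ * Fits the conversation history into the token budget (sliding window).
  *
  * Strategy:
  *  1. Always keep the last message (usually the current user question).
  *  2. Fill from the most recent message backwards, as far as the budget allows.
- *  3. If anything was trimmed, prepend a `[이전 대화 N개 생략]` ("N earlier messages omitted") marker so the model knows context is missing.
+ *  3. If anything was trimmed, prepend a marker so the model knows context is missing.
+ *     v2.2.69+: the marker callback receives not only droppedCount but also the *array of dropped messages*,
+ *     so it can write a real summary/context instead of a bare count.
  *
  * Note: only the message array *sent with the request* is trimmed here; the full history shown in the UI is left untouched.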
 */
 export function trimHistoryToBudget<M extends BudgetMessage>(
    messages: M[],
    budgetTokens: number,
-    makeMarker: (droppedCount: number) => M
+    makeMarker: (droppedCount: number, droppedMessages: M[]) => M
 ): TrimResult<M> {
    if (messages.length === 0) {
        return { messages, droppedCount: 0, tokensAfter: 0 };
@@ -154,7 +156,8 @@ export function trimHistoryToBudget<M extends BudgetMessage>(

    const droppedCount = messages.length - kept.length;
    if (droppedCount > 0) {
-        const marker = makeMarker(droppedCount);
+        const droppedMessages = messages.slice(0, droppedCount);
+        const marker = makeMarker(droppedCount, droppedMessages);
        kept.unshift(marker);
        used += estimateMessageTokens(marker);
    }
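
With the extended callback signature, a marker factory can summarize what was dropped instead of only counting it. The sketch below is illustrative: the `Msg` shape (`role`, `content`) and the name `makeSummaryMarker` are assumptions, since the diff does not show the fields of `BudgetMessage`.

```typescript
// Assumed message shape; the real BudgetMessage fields are not visible in this diff.
interface Msg {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

// Hypothetical marker factory matching the (droppedCount, droppedMessages) signature.
function makeSummaryMarker(droppedCount: number, droppedMessages: Msg[]): Msg {
    // List up to three of the most recent dropped user turns, whitespace-collapsed and cut to 60 chars.
    const gist = droppedMessages
        .filter((m) => m.role === 'user')
        .slice(-3)
        .map((m) => `- ${m.content.replace(/\s+/g, ' ').trim().slice(0, 60)}`)
        .join('\n');
    const header = `[${droppedCount} earlier messages omitted]`;
    return { role: 'system', content: gist ? `${header}\nEarlier user topics:\n${gist}` : header };
}

const marker = makeSummaryMarker(2, [
    { role: 'user', content: 'How  do I\nload a model?' },
    { role: 'assistant', content: 'Use load().' },
]);
console.log(marker.content);
```

Because the marker itself is counted against the budget after insertion, keeping the summary short avoids pushing the request back over the limit.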
@@ -51,7 +51,7 @@ export interface IAgent {
 /**
 * Pipeline stage state definition
 */
-export type PipelineStage = 'idle' | 'planner' | 'researcher' | 'reflector' | 'writer' | 'completed' | 'error';
+export type PipelineStage = 'idle' | 'planner' | 'researcher' | 'reflector' | 'writer' | 'synthesizer' | 'completed' | 'error';

 /**
 * A single state-transition entry recorded in the audit history.
@@ -453,7 +453,10 @@ export class AgentEngine {
        private readonly researcher: IAgent,
        private readonly writer: IAgent,
        // [Self-Reflection] Metacognition node injected between Researcher and Writer. When not injected, the existing 3-stage pipeline is kept as-is.
-        private readonly reflector?: IAgent
+        private readonly reflector?: IAgent,
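+        // [5-stage pipeline] Node that polishes the draft produced by Writer (=Drafter) into the final user-facing answer.
+        // When not injected, the Writer output becomes the final answer as-is (existing behavior preserved).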
+        private readonly synthesizer?: IAgent
    ) {}

    /**
@@ -600,14 +603,45 @@ export class AgentEngine {
                );

                state.setResult('finalReport', finalReport);
-                
+
+                // --- Phase 4.5: Synthesizer (final polish) ---
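+                // The Drafter (=Writer) output is a "draft". If a Synthesizer was provided, it compresses and smooths the draft once more.
+                // The input is only the small draft, so the context is light and even a small local model can handle it in one pass.
+                // On failure the mission is not blocked; the Drafter output is used as-is (soft-fail).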
+                let polishedReport = finalReport;
+                if (this.synthesizer) {
+                    try {
+                        polishedReport = await this.executeStep(
+                            state, 'synthesizer', '최종 답변 다듬기 중...',
+                            () => this.resilientExecute(state, this.synthesizer!, 'Synthesizer', finalReport, prompt, signal, onProgress, {
+                                ...options,
+                                context: brainContext,
+                                signal,
+                                config: { ...options?.config, role: 'synthesizer', isSamePrompt: true },
+                                priorResults: { plan, reflection, originalPrompt: prompt, ...options?.priorResults },
+                                abstractionLevel: 'balanced'
+                            }),
+                            `synthesizer::${finalReport}`, prompt, signal, onProgress
+                        );
+                        if (!polishedReport || polishedReport.trim().length < 24) {
+                            // If the synthesizer returns an empty or truncated result, fall back to the draft.
+                            logError('[AgentEngine] Synthesizer returned empty/tiny output — using Drafter output.');
+                            polishedReport = finalReport;
+                        }
+                    } catch (synthErr: any) {
+                        if (synthErr?.name === 'AbortError') throw synthErr;
+                        logError(`[AgentEngine] Synthesizer soft-fail — using Drafter output: ${synthErr?.message || synthErr}`);
+                        polishedReport = finalReport;
+                    }
+                }
+
                // --- Phase 5: Advice & Standardization ---
-                const proactiveAdvice = await this.generateProactiveAdvice(finalReport, prompt, brainContext, signal);
+                const proactiveAdvice = await this.generateProactiveAdvice(polishedReport, prompt, brainContext, signal);
                
                // [Structural Fix] Validate the generated advice (append only when it is longer than 50 characters)
                const enrichedReport = proactiveAdvice && proactiveAdvice.length > 50
-                    ? `${finalReport}\n\n---\n## 💡 Astra의 선제적 제안 (Proactive Next Actions)\n${proactiveAdvice}`
-                    : finalReport;
+                    ? `${polishedReport}\n\n---\n## 💡 Astra의 선제적 제안 (Proactive Next Actions)\n${proactiveAdvice}`
+                    : polishedReport;

                const standardizedReport = WikiFormatter.format(enrichedReport, state);

@@ -1,8 +1,20 @@
-import { LMStudioClient as SDKClient, LLM } from '@lmstudio/sdk';
+import { LMStudioClient as SDKClient, LLM, type LLMLoadModelConfig } from '@lmstudio/sdk';
 import { logError, logInfo } from '../utils';

+/** Load-time options forwarded to LM Studio's `llm.load()`. Subset of `LLMLoadModelConfig`. */
+export interface LMStudioLoadConfig {
+    flashAttention?: boolean;
+    /** "max" | "off" | number 0-1 */
+    gpuOffloadRatio?: 'max' | 'off' | number;
+    offloadKVCacheToGpu?: boolean;
+    keepModelInMemory?: boolean;
+    useFp16ForKVCache?: boolean;
+    /** 0 / undefined = engine default */
+    evalBatchSize?: number;
+}
+
 export interface ILMStudioClient {
-    load(modelKey: string, signal?: AbortSignal): Promise<void>;
+    load(modelKey: string, signal?: AbortSignal, loadConfig?: LMStudioLoadConfig): Promise<void>;
    unload(modelKey: string): Promise<void>;
    listLoaded(): Promise<string[]>;
    /** Like listLoaded() but caches the result for `ttlMs` to avoid hammering the SDK. */
@@ -15,6 +27,10 @@ export interface ILMStudioClient {
     * only returns loaded models when JIT is off).
     */
    listDownloaded(): Promise<string[]>;
+    /** Cached variant; the downloaded list only changes when the user installs/removes a model. */
+    listDownloadedCached(ttlMs?: number): Promise<string[]>;
+    /** Pre-warm a draft model for speculative decoding. Idempotent + best-effort. */
+    preloadDraftModel?(draftModelKey: string): Promise<void>;
    /**
     * Resolve a chat-ready handle for an already-loaded (or just-loaded) model.
     *
@@ -42,8 +58,20 @@ export function httpToWebSocketUrl(httpBaseUrl: string): string | undefined {
        if (url.protocol === 'http:') url.protocol = 'ws:';
        else if (url.protocol === 'https:') url.protocol = 'wss:';
        else if (url.protocol !== 'ws:' && url.protocol !== 'wss:') return undefined;
-        if (url.pathname.endsWith('/v1')) url.pathname = url.pathname.slice(0, -3);
-        if (url.pathname.endsWith('/api')) url.pathname = url.pathname.slice(0, -4);
+        // Strip every REST-only path suffix LM Studio ships with so the SDK lands on the
+        // WebSocket root. Loop until no suffix matches so stacked suffixes fully unwind.
+        const REST_SUFFIXES = ['/api/v0', '/api/v1', '/v1', '/api'];
+        let changed = true;
+        while (changed) {
+            changed = false;
+            for (const suffix of REST_SUFFIXES) {
+                if (url.pathname.endsWith(suffix)) {
+                    url.pathname = url.pathname.slice(0, -suffix.length);
+                    changed = true;
+                    break;
+                }
+            }
+        }
        const out = url.toString().replace(/\/+$/, '');
        return out;
    } catch {
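
The suffix-unwinding logic in the hunk above can be exercised in isolation. This is a minimal standalone sketch, assuming the global WHATWG `URL` class is available (Node.js or a browser); `toWebSocketRoot` is a hypothetical name, not the extension's export.

```typescript
// Hypothetical standalone version of the REST-to-WebSocket URL conversion.
function toWebSocketRoot(httpBaseUrl: string): string | undefined {
    const REST_SUFFIXES = ['/api/v0', '/api/v1', '/v1', '/api'];
    try {
        const url = new URL(httpBaseUrl);
        if (url.protocol === 'http:') url.protocol = 'ws:';
        else if (url.protocol === 'https:') url.protocol = 'wss:';
        else if (url.protocol !== 'ws:' && url.protocol !== 'wss:') return undefined;
        // Keep stripping until a full pass finds no matching suffix.
        let changed = true;
        while (changed) {
            changed = false;
            for (const suffix of REST_SUFFIXES) {
                if (url.pathname.endsWith(suffix)) {
                    url.pathname = url.pathname.slice(0, -suffix.length);
                    changed = true;
                    break;
                }
            }
        }
        // Drop the trailing slash that URL serialization leaves on an empty path.
        return url.toString().replace(/\/+$/, '');
    } catch {
        // new URL() throws on unparsable input.
        return undefined;
    }
}

console.log(toWebSocketRoot('http://localhost:1234/v1'));
```

Note that a base URL with a trailing slash after the suffix (for example `/v1/`) does not match `endsWith`, so callers may want to normalize that before conversion.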
@@ -55,7 +83,9 @@ export class LMStudioClient implements ILMStudioClient {
    private _sdk: SDKClient | undefined;
    private _wsUrl: string | undefined;
    private _loadedCache: { value: string[]; expiresAt: number } | undefined;
+    private _downloadedCache: { value: string[]; expiresAt: number } | undefined;
    private static readonly DEFAULT_LOADED_CACHE_TTL_MS = 5000;
+    private static readonly DEFAULT_DOWNLOADED_CACHE_TTL_MS = 60_000;

    constructor(httpBaseUrl: string) {
        this.setBaseUrl(httpBaseUrl);
@@ -67,6 +97,7 @@ export class LMStudioClient implements ILMStudioClient {
            this._wsUrl = ws;
            this._sdk = undefined;
            this._loadedCache = undefined;
+            this._downloadedCache = undefined;
        }
    }

@@ -77,17 +108,53 @@ export class LMStudioClient implements ILMStudioClient {
        return this._sdk;
    }

-    async load(modelKey: string, signal?: AbortSignal): Promise<void> {
+    async load(modelKey: string, signal?: AbortSignal, loadConfig?: LMStudioLoadConfig): Promise<void> {
        try {
-            await this.getSdk().llm.load(modelKey, signal ? { signal } : undefined);
+            const opts: { signal?: AbortSignal; config?: LLMLoadModelConfig } = {};
+            if (signal) opts.signal = signal;
+            const config = this._buildLoadConfig(loadConfig);
+            if (Object.keys(config).length > 0) opts.config = config;
+            await this.getSdk().llm.load(modelKey, Object.keys(opts).length > 0 ? opts : undefined);
            this._loadedCache = undefined;
-            logInfo('LM Studio model loaded.', { modelKey });
+            // Loading does not change the downloaded-models set; leave _downloadedCache alone.
+            logInfo('LM Studio model loaded.', { modelKey, configKeys: Object.keys(config) });
        } catch (e: any) {
            const msg = e?.message ?? String(e);
            throw new LMStudioLifecycleError(`Failed to load LM Studio model "${modelKey}": ${msg}`, e);
        }
    }

+    /** Translate our flat LMStudioLoadConfig into LM Studio's nested LLMLoadModelConfig shape. */
+    private _buildLoadConfig(lc: LMStudioLoadConfig | undefined): LLMLoadModelConfig {
+        const out: LLMLoadModelConfig = {};
+        if (!lc) return out;
+        if (typeof lc.flashAttention === 'boolean') out.flashAttention = lc.flashAttention;
+        if (typeof lc.offloadKVCacheToGpu === 'boolean') out.offloadKVCacheToGpu = lc.offloadKVCacheToGpu;
+        if (typeof lc.keepModelInMemory === 'boolean') out.keepModelInMemory = lc.keepModelInMemory;
+        if (typeof lc.useFp16ForKVCache === 'boolean') out.useFp16ForKVCache = lc.useFp16ForKVCache;
+        if (typeof lc.evalBatchSize === 'number' && lc.evalBatchSize > 0) out.evalBatchSize = lc.evalBatchSize;
+        if (lc.gpuOffloadRatio !== undefined) {
+            // GPUSetting is deprecated but still accepted — wraps a single `ratio`.
+            out.gpu = { ratio: lc.gpuOffloadRatio as any };
+        }
+        return out;
+    }
+
+    async preloadDraftModel(draftModelKey: string): Promise<void> {
+        const key = (draftModelKey || '').trim();
+        if (!key) return;
+        try {
+            const llm: any = this.getSdk().llm;
+            if (typeof llm.unstable_preloadDraftModel === 'function') {
+                await llm.unstable_preloadDraftModel(key);
+                logInfo('LM Studio draft model preloaded.', { draftModelKey: key });
+            }
+        } catch (e: any) {
+            // Best-effort — the main model's respond({draftModel}) will still load it lazily.
+            logError('LM Studio draft model preload failed.', { draftModelKey: key, error: e?.message ?? String(e) });
+        }
+    }
+
    async unload(modelKey: string): Promise<void> {
        try {
            await this.getSdk().llm.unload(modelKey);
@@ -99,6 +166,12 @@ export class LMStudioClient implements ILMStudioClient {
        }
    }

+    /** Force the next downloaded/loaded-models call to re-fetch (use after install / remove). */
+    invalidateCaches(): void {
+        this._loadedCache = undefined;
+        this._downloadedCache = undefined;
+    }
+
    async listLoaded(): Promise<string[]> {
        try {
            const items: any[] = await this.getSdk().llm.listLoaded();
@@ -138,6 +211,20 @@ export class LMStudioClient implements ILMStudioClient {
        }
    }

+    async listDownloadedCached(ttlMs: number = LMStudioClient.DEFAULT_DOWNLOADED_CACHE_TTL_MS): Promise<string[]> {
+        const now = Date.now();
+        if (this._downloadedCache && this._downloadedCache.expiresAt > now) {
+            return this._downloadedCache.value.slice();
+        }
+        const value = await this.listDownloaded();
+        // Only cache non-empty results — an empty array often signals a transient SDK error,
+        // and caching that for 60s would hide a freshly-started LM Studio process.
+        if (value.length > 0) {
+            this._downloadedCache = { value, expiresAt: now + ttlMs };
+        }
+        return value.slice();
+    }
+
    async getModelHandle(modelKey: string, options?: { refresh?: boolean }): Promise<LLM> {
        try {
            if (options?.refresh) {
@@ -1,4 +1,4 @@
-import type { ILMStudioClient } from './client';
+import type { ILMStudioClient, LMStudioLoadConfig } from './client';
 import type { IActivityTracker } from './activityTracker';
 import type { EngineKind } from '../utils';
 import type { ISystemSpecsProvider, IModelMemoryEstimator } from '../system/specs';
@@ -9,6 +9,10 @@ export type LifecycleState = 'idle' | 'loading' | 'loaded' | 'streaming' | 'unlo
 export interface LifecycleConfig {
    idleTimeoutMs: number;
    autoLoadOnSelect: boolean;
+    /** Forwarded to `llm.load()` config field. Omit to use engine defaults. */
+    loadConfig?: LMStudioLoadConfig;
+    /** When set, the lifecycle manager pre-warms this draft model after every successful load. */
+    draftModel?: string;
 }

 export interface LifecycleManagerDeps {
@@ -274,11 +278,16 @@ export class ModelLifecycleManager {
        const ac = new AbortController();
        this.loadAbort = ac;
        try {
-            await this.deps.client.load(modelKey, ac.signal);
+            const cfg = this.deps.getConfig();
+            await this.deps.client.load(modelKey, ac.signal, cfg.loadConfig);
            if (this.loadAbort !== ac) return; // superseded by a newer switch
            this.loadAbort = undefined;
            this.state = 'loaded';
            this.resetIdleTimer();
+            // Pre-warm the draft model so the first speculative prediction doesn't pay a cold-load cost.
+            if (cfg.draftModel && this.deps.client.preloadDraftModel) {
+                void this.deps.client.preloadDraftModel(cfg.draftModel);
+            }
        } catch (e: any) {
            if (ac.signal.aborted) return; // superseded — newer switch owns state
            logError('LM Studio model load failed.', { model: modelKey, error: e?.message ?? String(e) });
@@ -7,6 +7,30 @@ export interface ChatStreamMessage {
    content: string;
 }

+/** Shared sampling block. SDK and REST paths both read this — keep them in sync. */
+export interface LmStudioSampling {
+    topP?: number;
+    topK?: number;
+    minP?: number;
+    repeatPenalty?: number;
+}
+
+/**
+ * Translate the sampling block into the OpenAI-compatible REST body extension that LM Studio
+ * understands. Ollama uses the same field names inside `options`. Returns an object you can
+ * spread into either body. Values <= 0 / <= 1 (penalty) are dropped so they fall back to engine
+ * defaults instead of effectively disabling sampling.
+ */
+export function samplingToRestBody(s: LmStudioSampling | undefined): Record<string, number> {
+    const out: Record<string, number> = {};
+    if (!s) return out;
+    if (typeof s.topP === 'number' && s.topP > 0 && s.topP <= 1) out.top_p = s.topP;
+    if (typeof s.topK === 'number' && s.topK > 0) out.top_k = s.topK;
+    if (typeof s.minP === 'number' && s.minP > 0 && s.minP <= 1) out.min_p = s.minP;
+    if (typeof s.repeatPenalty === 'number' && s.repeatPenalty > 1) out.repeat_penalty = s.repeatPenalty;
+    return out;
+}
+
 export interface ChatStreamRequest {
    modelName: string;
    messages: ChatStreamMessage[];
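
A short usage sketch for `samplingToRestBody`: the function body is copied from the hunk above so the example runs on its own, while `buildChatBody` and its `model` / `messages` / `stream` fields are hypothetical and only assume an OpenAI-compatible chat request shape.

```typescript
interface LmStudioSampling {
    topP?: number;
    topK?: number;
    minP?: number;
    repeatPenalty?: number;
}

// Copied from the diff so the sketch is self-contained.
function samplingToRestBody(s: LmStudioSampling | undefined): Record<string, number> {
    const out: Record<string, number> = {};
    if (!s) return out;
    if (typeof s.topP === 'number' && s.topP > 0 && s.topP <= 1) out.top_p = s.topP;
    if (typeof s.topK === 'number' && s.topK > 0) out.top_k = s.topK;
    if (typeof s.minP === 'number' && s.minP > 0 && s.minP <= 1) out.min_p = s.minP;
    if (typeof s.repeatPenalty === 'number' && s.repeatPenalty > 1) out.repeat_penalty = s.repeatPenalty;
    return out;
}

// Hypothetical request-body builder: disabled sampling values never reach the wire,
// so the engine falls back to its own defaults for them.
function buildChatBody(
    model: string,
    messages: { role: string; content: string }[],
    sampling?: LmStudioSampling
): Record<string, unknown> {
    return { model, messages, stream: true, ...samplingToRestBody(sampling) };
}

const body = buildChatBody('example-model', [{ role: 'user', content: 'hi' }], { topP: 0.9, topK: 0, minP: 0.05, repeatPenalty: 1 });
console.log(JSON.stringify(body));
```

Here `topK: 0` and `repeatPenalty: 1` both mean "disabled" in the settings, so neither `top_k` nor `repeat_penalty` appears in the resulting body.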
@@ -15,17 +39,39 @@ export interface ChatStreamRequest {
    maxTokens?: number;
    /** LM Studio context-overflow safety net used only if the prompt still exceeds the window. */
    contextOverflowPolicy?: 'stopAtLimit' | 'truncateMiddle' | 'rollingWindow';
+    /** Sampling — defaults match small-model glitch-suppression presets. Each is omitted from the SDK call when undefined. */
+    topP?: number;
+    topK?: number;
+    minP?: number;
+    repeatPenalty?: number;
+    /** Draft model key for speculative decoding. Empty/undefined disables. */
+    draftModel?: string;
    signal?: AbortSignal;
 }

+/** Subset of LM Studio's `PredictionResult.stats` we expose to callers. */
+export interface ChatStreamStats {
+    tokensPerSecond?: number;
+    timeToFirstTokenSec?: number;
+    predictedTokensCount?: number;
+    promptTokensCount?: number;
+    totalTimeSec?: number;
+    /** Speculative decoding (only set when `draftModel` was used). */
+    draftModelKey?: string;
+    draftTokensCount?: number;
+    acceptedDraftTokensCount?: number;
+}
+
 /**
 * One stream event. `token` carries generated text (possibly empty for the final event);
 * `stopReason` is set on the *last* event only and is the SDK's `stats.stopReason`
 * (e.g. `eosFound`, `maxPredictedTokensReached`, `contextLengthReached`, `userStopped`).
+ * `stats` is also set on the *last* event when LM Studio reports prediction stats.
 */
 export interface ChatStreamEvent {
    token: string;
    stopReason?: string;
+    stats?: ChatStreamStats;
 }

 export interface IChatStreamer {
@@ -72,24 +118,25 @@ export class LMStudioStreamer implements IChatStreamer {
            const model = await this.client.getModelHandle(trimmedModel, refresh ? { refresh: true } : undefined);
            logInfo('LM Studio SDK chat stream started.', { model: trimmedModel, messageCount: req.messages.length, attempt });

-            const prediction = (model as any).respond(req.messages, {
+            // Sampling defaults match the historical glitch-suppression preset for small /
+            // quantized models (prevents corrupted Korean tokens) but are now overridable per-call.
+            const respondOpts: any = {
                temperature: req.temperature,
                maxTokens: req.maxTokens ?? 4096,
-                // Glitch suppression: a small / quantized model samples wrong
-                // neighbour tokens (Korean syllable corruption like 붕괴→붕점,
-                // 핵심→핵점) when the distribution is left wide. A tight nucleus
-                // + top-k and a min-p floor cut the low-probability tail;
-                // repeatPenalty curbs stutter (것입니다서입니다).
-                topPSampling: 0.9,
-                topKSampling: 20,
-                minPSampling: 0.05,
-                repeatPenalty: 1.1,
                // Safety net: if our own token budgeting still underestimated and the prompt
                // exceeds the model's context window, decide whether the SDK should fail
                // loudly (stopAtLimit — default) or silently drop content.
                contextOverflowPolicy: req.contextOverflowPolicy ?? 'stopAtLimit',
                signal: req.signal,
-            });
+            };
+            if (typeof req.topP === 'number') respondOpts.topPSampling = req.topP;
+            if (typeof req.topK === 'number' && req.topK > 0) respondOpts.topKSampling = req.topK;
+            if (typeof req.minP === 'number' && req.minP > 0) respondOpts.minPSampling = req.minP;
+            if (typeof req.repeatPenalty === 'number' && req.repeatPenalty > 1) respondOpts.repeatPenalty = req.repeatPenalty;
+            // Speculative decoding — LM Studio loads the draft model lazily on first use if needed
+            // (we also `preloadDraftModel` after main load to avoid that cold cost).
+            if (req.draftModel && req.draftModel.trim()) respondOpts.draftModel = req.draftModel.trim();
+            const prediction = (model as any).respond(req.messages, respondOpts);

            // Bridge AbortSignal → prediction.cancel(): without this, an
            // aborted request keeps generating on the LM Studio server. The
@@ -128,24 +175,58 @@ export class LMStudioStreamer implements IChatStreamer {
                if (req.signal?.aborted) return;
                // The prediction object is also a Promise<PredictionResult>; awaiting it after
                // the stream drains gives us stats.stopReason so callers can tell a truncated
-                // answer (maxPredictedTokensReached / contextLengthReached) from a normal one.
+                // answer (maxPredictedTokensReached / contextLengthReached) from a normal one,
+                // plus throughput numbers (tok/s, TTFT) we surface to the UI.
                let stopReason: string | undefined;
+                let stats: ChatStreamEvent['stats'];
                try {
                    const result: any = await prediction;
                    stopReason = result?.stats?.stopReason;
-                    if (stopReason) {
-                        logInfo('LM Studio SDK chat stream finished.', { model: trimmedModel, stopReason, tokensYielded: yielded });
+                    const s = result?.stats;
+                    if (s) {
+                        stats = {
+                            tokensPerSecond: typeof s.tokensPerSecond === 'number' ? s.tokensPerSecond : undefined,
+                            timeToFirstTokenSec: typeof s.timeToFirstTokenSec === 'number' ? s.timeToFirstTokenSec : undefined,
+                            predictedTokensCount: typeof s.predictedTokensCount === 'number' ? s.predictedTokensCount : undefined,
+                            promptTokensCount: typeof s.promptTokensCount === 'number' ? s.promptTokensCount : undefined,
+                            totalTimeSec: typeof s.totalTimeSec === 'number' ? s.totalTimeSec : undefined,
+                            draftModelKey: typeof s.usedDraftModelKey === 'string' ? s.usedDraftModelKey : undefined,
+                            draftTokensCount: typeof s.totalDraftTokensCount === 'number' ? s.totalDraftTokensCount : undefined,
+                            acceptedDraftTokensCount: typeof s.acceptedDraftTokensCount === 'number' ? s.acceptedDraftTokensCount : undefined,
+                        };
+                    }
+                    if (stopReason || stats) {
+                        logInfo('LM Studio SDK chat stream finished.', {
+                            model: trimmedModel, stopReason, tokensYielded: yielded,
+                            tokensPerSecond: stats?.tokensPerSecond, ttftSec: stats?.timeToFirstTokenSec,
+                        });
                    }
                } catch { /* result unavailable on some SDK versions — non-fatal */ }
+                // Empty-but-clean stream is treated like a dead handle on attempt 1:
+                // recreate the SDK and try once more. Same root cause (handle bound to
+                // a stale prediction) but no exception is thrown — just an empty stream.
+                if (yielded === 0 && attempt === 1) {
+                    logInfo('Empty SDK stream with no error — retrying with a fresh SDK.', { model: trimmedModel });
+                    continue;
+                }
                // Don't claim `eosFound` if we couldn't actually read the stop reason — leave it
                // undefined so the caller treats it as 'unknown' (and its mid-sentence heuristics kick in).
-                yield { token: '', stopReason };
+                yield { token: '', stopReason, stats };
                return;
            }

            const errMsg = String(caught?.message ?? caught);
-            const handleDead = /\bdisposed\b/i.test(errMsg)
-                || /lock\(\) request could not be registered/i.test(errMsg);
+            // Broaden the "handle is bound to a dead WebSocket binding" detection. All of
+            // these resolve with the same fix (recreate the SDK client so the next
+            // llm.model() lookup mints a fresh handle).
+            const handleDead =
+                /\bdisposed\b/i.test(errMsg)
+                || /lock\(\) request could not be registered/i.test(errMsg)
+                || /channel\s+closed/i.test(errMsg)
+                || /WebSocket\s+(?:is\s+not\s+open|closed|disconnected)/i.test(errMsg)
+                || /Connection\s+(?:lost|reset|closed)/i.test(errMsg)
+                || /\bECONNRESET\b/i.test(errMsg)
+                || /socket\s+hang\s*up/i.test(errMsg);

            if (handleDead && yielded === 0 && attempt === 1) {
                logInfo('Dead LM Studio handle detected — retrying with a fresh SDK.', { model: trimmedModel, error: errMsg });
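The broadened dead-handle detection in this hunk can be read as a single predicate over the error message. The following is a minimal sketch, assuming the same seven patterns shown in the diff; the names `DEAD_HANDLE_PATTERNS` and `isDeadHandleError` are hypothetical and do not appear in the commit, which inlines the tests instead.

```typescript
// Hypothetical helper: the commit inlines these regex tests; the names here are illustrative only.
const DEAD_HANDLE_PATTERNS: RegExp[] = [
    /\bdisposed\b/i,
    /lock\(\) request could not be registered/i,
    /channel\s+closed/i,
    /WebSocket\s+(?:is\s+not\s+open|closed|disconnected)/i,
    /Connection\s+(?:lost|reset|closed)/i,
    /\bECONNRESET\b/i,
    /socket\s+hang\s*up/i,
];

// Mirrors `String(caught?.message ?? caught)` from the diff, then checks every pattern.
function isDeadHandleError(caught: unknown): boolean {
    const message = String((caught as { message?: unknown } | null | undefined)?.message ?? caught);
    return DEAD_HANDLE_PATTERNS.some((pattern) => pattern.test(message));
}
```

Keeping the patterns in one array would make it easier to add a new transport error string without touching the retry condition, though whether that refactor suits the real module is a judgment call for the author.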
@@ -16,7 +16,7 @@ export async function handleChatMessage(provider: SidebarChatProvider, data: any
    switch (data.type) {
        case 'prompt':
        case 'promptWithFile':
-            console.error(`[ASTRA-DEBUG] prompt case entered type=${data?.type} value=${JSON.stringify(String(data?.value ?? '').slice(0, 80))}`);
+            logInfo(`[ASTRA-DEBUG] prompt case entered type=${data?.type} value=${JSON.stringify(String(data?.value ?? '').slice(0, 80))}`);
            provider._lmStudio?.activity.bump();
            // ── 📻 Datacollect Radio (slash command) takes priority ──
            // Note: handle this *before* globalState.update: the global state, at up to ~1MB,
@@ -25,7 +25,7 @@ export async function handleChatMessage(provider: SidebarChatProvider, data: any
            if (typeof data.value === 'string') {
                const { isSlashCommand, handleSlashCommand } = await import('../features/datacollect/slashRouter');
                const matched = isSlashCommand(data.value);
-                console.error(`[ASTRA-DEBUG] slash check matched=${matched} hasView=${!!provider._view}`);
+                logInfo(`[ASTRA-DEBUG] slash check matched=${matched} hasView=${!!provider._view}`);
                logInfo(`[SLASH] prompt received: ${JSON.stringify(data.value).slice(0, 100)} matched=${matched} hasView=${!!provider._view}`);
                if (matched) {
                    if (!provider._view?.webview) {
@@ -46,6 +46,13 @@ export async function handleChronicleMessage(provider: SidebarChatProvider, data
        case 'writeChronicleRecord':
            await provider._writeChronicleRecord(data.recordType);
            return true;
+        case 'setChronicleAutoRecord':
+            // v2.2.70: auto-record On/Off toggle. Invoked from the Tools dropdown menu.
+            await provider._setChronicleAutoRecord(!!data.enabled);
+            return true;
+        case 'getChronicleAutoRecord':
+            await provider._sendChronicleAutoRecordStatus();
+            return true;
        default:
            return false;
    }
@@ -886,6 +886,12 @@ export class SidebarChatProvider implements vscode.WebviewViewProvider, BridgeIn

        void this._restoreActiveSessionIntoView();
        void this._sendReadyStatus();
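+        // v2.2.66: push brain/models/agents once more during the initial-load stage as well.
+        // Previously this relied solely on the webview's 'ready' handshake, but if any step in that
+        // chain throws, the remaining populate calls never run and the dropdown can regress to empty. Double guarantee.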
+        void this._sendBrainProfiles();
+        void this._sendAgentsList();
+        void this._sendModels();

        viewDisposables.push(webviewView.webview.onDidReceiveMessage(async (data) => {
            // trace entry into the dispatch root: for cases like "typed /benchmark but got no response"
@@ -1263,6 +1269,8 @@ export class SidebarChatProvider implements vscode.WebviewViewProvider, BridgeIn
            description: profile.description || '',
            repo: profile.secondBrainRepo || ''
        }));
+        // v2.2.66: a regression where the dropdown suddenly goes empty was reported. Trace what is actually sent.
+        logInfo(`[_sendBrainProfiles] profiles=${profiles.length} activeBrainId=${activeBrain.id} active=${activeBrain.name}`);
        this._view.webview.postMessage({
            type: 'brainProfiles',
            value: {
@@ -3368,7 +3376,37 @@ export class SidebarChatProvider implements vscode.WebviewViewProvider, BridgeIn
        }
    }

+    /**
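+     * v2.2.70: called from the "auto-record" toggle in the Tools dropdown. Updates the config immediately and
+     * pushes the new state to the webview. Because this updates the VS Code setting rather than globalState, it persists into later sessions.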
+     */
+    async _setChronicleAutoRecord(enabled: boolean): Promise<void> {
+        try {
+            await vscode.workspace.getConfiguration('g1nation').update(
+                'chronicleAutoRecord', !!enabled, vscode.ConfigurationTarget.Global
+            );
+            logInfo(`[chronicleAutoRecord] toggled → ${enabled ? 'ON' : 'OFF'}`);
+        } catch (e: any) {
+            logError('[chronicleAutoRecord] update failed', { error: e?.message || String(e) });
+        }
+        await this._sendChronicleAutoRecordStatus();
+    }
+
+    /** Send the current auto-record enabled flag to the webview so the Tools menu can render the toggle state. */
+    async _sendChronicleAutoRecordStatus(): Promise<void> {
+        if (!this._view) return;
+        this._view.webview.postMessage({
+            type: 'chronicleAutoRecordStatus',
+            value: { enabled: getConfig().chronicleAutoRecord !== false }
+        });
+    }
+
    async _autoWriteChronicleAfterPrompt() {
+        // v2.2.70: return immediately when auto-record is OFF (g1nation.chronicleAutoRecord=false).
+        // Manual records (Tools menu, /wiki command, etc.) are unaffected.
+        if (getConfig().chronicleAutoRecord === false) {
+            return;
+        }
        const history = this._agent.getHistory();
        const latestUser = [...history].reverse().find(message => message.role === 'user')?.content || '';
        const latestAssistant = [...history].reverse().find(message => message.role === 'assistant')?.content || '';
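Both the status message and the early return above compare the flag against `false` rather than testing truthiness, so an unset setting counts as enabled. A minimal sketch of that default-on check follows, assuming a config object with an optional boolean field; `AutoRecordConfig` and `isAutoRecordEnabled` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape: only the one field the diff reads from getConfig().
interface AutoRecordConfig {
    chronicleAutoRecord?: boolean;
}

// Default-on semantics: only an explicit `false` disables auto-record.
function isAutoRecordEnabled(config: AutoRecordConfig): boolean {
    return config.chronicleAutoRecord !== false;
}
```

With this shape, users who never touched the setting keep the pre-v2.2.70 behaviour, and only an explicit opt-out skips the automatic write.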
@@ -4056,7 +4094,20 @@ export class SidebarChatProvider implements vscode.WebviewViewProvider, BridgeIn
        const mediaRoot = vscode.Uri.joinPath(this._extensionUri, 'media');
        const stylesUri = webview.asWebviewUri(vscode.Uri.joinPath(mediaRoot, 'sidebar.css')).toString();
        const scriptUri = webview.asWebviewUri(vscode.Uri.joinPath(mediaRoot, 'sidebar.js')).toString();
+        // VS Code's outer webview iframe injects codicon.ttf as data:font/ttf.
+        // The default CSP is font-src 'self' https://*.vscode-cdn.net, which omits data:, so
+        // DevTools logs a violation warning every time. Setting an explicit CSP that allows
+        // data: lets the host iframe inherit the same CSP, and the warning goes away.
+        const csp = [
+            `default-src 'none'`,
+            `img-src ${webview.cspSource} https: data:`,
+            `style-src ${webview.cspSource} 'unsafe-inline'`,
+            `script-src ${webview.cspSource} https://cdn.jsdelivr.net 'unsafe-inline'`,
+            `font-src ${webview.cspSource} https: data:`,
+            `connect-src ${webview.cspSource} https:`,
+        ].join('; ');
        return SidebarChatProvider._htmlTemplateCache
+            .replace('__CSP__', csp)
            .replace('__STYLES_URI__', stylesUri)
            .replace('__SCRIPT_URI__', scriptUri);
    }
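The CSP assembly in this hunk can be isolated into a pure function, which makes the `font-src ... data:` allowance easy to check in a unit test. This is a minimal sketch under stated assumptions: `buildSidebarCsp` and `applyCsp` are hypothetical names, and the `<meta>` tag shape is an assumption about `sidebar.html`, whose contents are not shown in this section.

```typescript
// Hypothetical helper: same directives as the diff, parameterised on webview.cspSource.
function buildSidebarCsp(cspSource: string): string {
    return [
        `default-src 'none'`,
        `img-src ${cspSource} https: data:`,
        `style-src ${cspSource} 'unsafe-inline'`,
        `script-src ${cspSource} https://cdn.jsdelivr.net 'unsafe-inline'`,
        `font-src ${cspSource} https: data:`,
        `connect-src ${cspSource} https:`,
    ].join('; ');
}

// Assumed template shape; the real sidebar.html may place __CSP__ differently.
function applyCsp(template: string, csp: string): string {
    // A string pattern replaces only the first occurrence, matching the diff's .replace('__CSP__', csp).
    return template.replace('__CSP__', csp);
}
```

One thing to keep in mind with this approach: `String.prototype.replace` treats `$` sequences in the replacement specially, so a policy string containing `$` would need a replacer function instead.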
@@ -230,41 +230,37 @@ Step 2 (after the real scripts are known — pick the actual one, never a guesse
 Then reply with one short line stating what was started and where.

 [STRICT GLOBAL RULES]
-1. [NO EMOJIS - ABSOLUTE RULE] NEVER use ANY emojis, emoticons, Unicode pictorial symbols (including but not limited to emoji, kaomoji, Unicode icons), or decorative symbols anywhere in your response. NO EXCEPTIONS. Use plain text dashes (-) or asterisks (*) for bullets. Use plain markdown ## for headers. This rule overrides ALL other formatting instructions.
-2. [HEADINGS] Every markdown heading must be unique, appear exactly once, and start with exactly one "## " — never "## ##", never "### ###". One space after the hashes.
+1. [NO EMOJIS - ABSOLUTE RULE] NEVER use ANY emojis, emoticons, Unicode pictorial symbols (including but not limited to emoji, kaomoji, Unicode icons), or decorative symbols anywhere in your response. NO EXCEPTIONS. Use plain text dashes (-) for bullets. This rule overrides ALL other formatting instructions.
+2. [NO MARKDOWN MARKERS] PLAIN TEXT ONLY. Do NOT emit "#", "##", "###", "**", "__", "> ", "* " as formatting. Section labels are bare Korean words on their own line (e.g. a line that says just "핵심 요약" — no "#", no "**"). Bullets use "- " only. Inline code with backticks (e.g. \`src/agent.ts\`) and triple-backtick code blocks for actual code are fine.
 3. [NO INTERNAL LOGS] Never output <details>, "2nd Brain Trace", or "Debug JSON" blocks.
 4. [NO SECTION LEAKAGE] Never output sections named "요청 요약", "사용자 의도 추론", "프로젝트 기록 대상 확인", "핵심 확인 질문", or "근거 파일 경로".

-[OUTPUT FORMAT]
-LENGTH decides structure — not topic. Count how long your answer will be:
+[OUTPUT FORMAT — 7 hard rules]
+These rules override any other formatting habit. Apply them to EVERY answer.

- If the answer is longer than ~4 sentences (analysis, advice, planning, troubleshooting, or any multi-part answer), you MUST lead with a summary block, then the detail:
+R1. CONCLUSION FIRST. The very first sentence of the response is the conclusion / verdict / recommendation. No greeting, no "분석해보겠습니다", no scene-setting paragraph, no "핵심 요약" label line. Just the conclusion as the opening sentence. The user must be able to stop after sentence 1 and still know what you decided.

-  ## 핵심 요약
-  - 2 to 4 bullet points. Each bullet is one scannable, self-contained takeaway that captures the WHOLE answer — a reader who stops here still gets the gist.
-  - This block is ALWAYS the very first thing in the response. NEVER place a summary at the bottom. NEVER write an intro paragraph before it — the summary block IS the opening.
+R2. AT MOST 3 SECTIONS. Total. Across the entire answer. A "section" = a labeled block (a label line followed by its body) OR a clearly separated numbered group. If you can answer without sections, do so. Three is the ceiling, not a target.

-  ## 상세 설명
-  Free-form depth. You MAY use your own sub-headers here (e.g. "### 1. ...", "### 2. ..."). This is where the full reasoning and steps go.
+R3. NO REPETITION. Never restate the same point twice in different words. Each sentence contributes new information. If you already said it in the conclusion, do NOT say it again in a later section.

-  ## 제안  ← Optional. Only include if a meaningfully better alternative exists. Omit otherwise.
+R4. BOLD ≤ 3 INSTANCES. Across the whole answer, use bold for emphasis at most 3 times. Reserve it for the truly load-bearing words (a file name, a verdict word, a hard number). Most answers should have zero.

- If the answer is ~4 sentences or fewer (quick fact, simple update, casual or emotional reply) — answer directly, no headers, no summary block.
+R5. JUDGE WITHOUT ASKING. If you can reach a defensible decision from the current context, deliver the decision and act. Do NOT ask permission to proceed, do NOT ask the user to clarify what they already implied, do NOT bounce the question back ("어떻게 진행할까요?").

-The summary block is named exactly "## 핵심 요약" and goes at the TOP. A section literally named "요약" placed at the end is a bug — never do that.
+R6. ASK ONE QUESTION ONLY WHEN. Exactly one of these holds:
+    (a) The path forks into two materially different directions and you cannot tell which the user wants, OR
+    (b) The next concrete step is irreversible (delete, force-push, drop table, overwrite uncommitted work, send external message).
+   In those cases: ONE plain sentence on its own line at the end. No "핵심 확인 질문" label, no "질문 의도" explanation, no follow-ups.

-[FOLLOW-UP QUESTION RULES]
-A follow-up question is a precision tool, not a ritual.
-Ask ONE focused question at the very end of the response ONLY if:
- The user's intent is genuinely ambiguous with multiple valid paths, OR
- A critical missing detail would make the current answer completely wrong.
-If neither condition is met, give a definitive answer and stop.
-When you do ask: it is ONE plain sentence on its own line. NEVER put it under a heading, NEVER label the section ("핵심 확인 질문", "확인 질문" etc.), NEVER attach a "질문 의도" explanation, NEVER ask two or more questions.
+R7. GUESS-AND-ACT WITH STATED ASSUMPTION. When information is missing but a reasonable guess exists, guess, act, and declare the assumption in a single line (prefix with "가정:" or "Assumption:"). Do NOT stop to ask just because a detail is fuzzy.
+
+[OUTPUT — plain text]
+PLAIN TEXT only. Section labels (when used) are bare Korean words on their own line — no "#", no "**" around the label. Bullets use "- " only. Inline code with backticks (e.g. \`src/agent.ts\`) and triple-backtick code blocks for actual code are fine.

 [ENGINEERING STANCE]
 - Be a direct engineering partner. Technical precision over polite filler.
- Give the verdict first, then explain tradeoffs.
- Collapse checklists into: verdict → reason → risk → next move.
+- Collapse checklists into: verdict → reason → risk → next move. (R1 already requires the verdict to be sentence 1.)
 - If the user's framing is off, correct the frame before answering inside it.
 - Simplify complex choices into 1-2 crisp options. Never write a balanced essay when a recommendation is possible.
 - Evidence First: never claim a project is stable, scalable, or well-architected without source code or document evidence. If evidence is thin, say so and name the files to inspect next.
@@ -49,7 +49,9 @@ describe('contextManager.computeOutputBudget', () => {
 });

 describe('contextManager.trimHistoryToBudget', () => {
-    const marker = (n: number): BudgetMessage => ({ role: 'system', content: `[dropped ${n}]`, internal: true });
+    // v2.2.69: makeMarker now also receives the dropped messages array so callers can build a real summary.
+    // Tests don't need the dropped payload — just keep the signature compatible.
+    const marker = (n: number, _dropped?: BudgetMessage[]): BudgetMessage => ({ role: 'system', content: `[dropped ${n}]`, internal: true });
    it('keeps everything when under budget', () => {
        const msgs: BudgetMessage[] = [{ role: 'user', content: 'hi' }, { role: 'assistant', content: 'hello' }];
        const r = trimHistoryToBudget(msgs, 10_000, marker);
@@ -63,7 +65,25 @@ describe('contextManager.trimHistoryToBudget', () => {
        expect(r.messages[0].content).toMatch(/^\[dropped \d+\]$/);
        // most recent message survives
        expect(r.messages[r.messages.length - 1]).toEqual(msgs[msgs.length - 1]);
-        expect(r.tokensAfter).toBeLessThanOrEqual(250 + estimateMessagesTokens([marker(1)]));
+        expect(r.tokensAfter).toBeLessThanOrEqual(250 + estimateMessagesTokens([marker(1, [])]));
+    });
+    it('passes the dropped messages array to the marker factory (v2.2.69)', () => {
+        const msgs: BudgetMessage[] = Array.from({ length: 6 }, (_, i) => ({
+            role: i % 2 ? 'assistant' : 'user',
+            content: 'x'.repeat(400),
+        }));
+        let observedDropped: BudgetMessage[] | undefined;
+        const factory = (n: number, dropped: BudgetMessage[]): BudgetMessage => {
+            observedDropped = dropped;
+            return { role: 'system', content: `[summary of ${n}: first=${dropped[0]?.role}]`, internal: true };
+        };
+        const r = trimHistoryToBudget(msgs, 250, factory);
+        expect(r.droppedCount).toBeGreaterThan(0);
+        expect(observedDropped).toBeDefined();
+        expect(observedDropped!.length).toBe(r.droppedCount);
+        // Dropped messages are the OLDEST ones, in order.
+        expect(observedDropped![0]).toEqual(msgs[0]);
+        expect(r.messages[0].content).toMatch(/^\[summary of \d+: first=user\]$/);
    });
    it('always keeps at least the last message even if it alone exceeds the budget', () => {
        const msgs: BudgetMessage[] = [{ role: 'user', content: 'short' }, { role: 'user', content: 'y'.repeat(5000) }];
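The tests in this hunk pin down a contract: the oldest messages are dropped first, the last message always survives, and the marker factory receives both the count and the dropped messages. The sketch below shows one way a function could satisfy that contract; it is not the project's `trimHistoryToBudget`. The name `trimOldestToBudget`, the local `BudgetMessage` shape, and the 4-characters-per-token estimate are all assumptions made for illustration.

```typescript
// Assumed message shape, modelled on the fields the tests use.
interface BudgetMessage { role: string; content: string; internal?: boolean }
type MarkerFactory = (droppedCount: number, dropped: BudgetMessage[]) => BudgetMessage;

// Assumption: a rough 4-characters-per-token estimate; the real estimator is not shown in the diff.
const estimateTokens = (messages: BudgetMessage[]): number =>
    messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

function trimOldestToBudget(messages: BudgetMessage[], budget: number, makeMarker: MarkerFactory) {
    const tokensBefore = estimateTokens(messages);
    if (messages.length === 0 || tokensBefore <= budget) {
        return { messages, droppedCount: 0, tokensBefore, tokensAfter: tokensBefore };
    }
    // Walk from newest to oldest; the last message is kept even if it alone exceeds the budget.
    let keepFrom = messages.length - 1;
    let used = estimateTokens(messages.slice(keepFrom));
    while (keepFrom > 0) {
        const next = estimateTokens(messages.slice(keepFrom - 1, keepFrom));
        if (used + next > budget) break;
        used += next;
        keepFrom--;
    }
    const dropped = messages.slice(0, keepFrom); // oldest first, original order preserved
    const kept = messages.slice(keepFrom);
    const result = dropped.length > 0 ? [makeMarker(dropped.length, dropped), ...kept] : kept;
    return { messages: result, droppedCount: dropped.length, tokensBefore, tokensAfter: estimateTokens(result) };
}
```

Passing the dropped array to the factory is what allows a caller to build a real summary of the removed turns instead of a bare "[dropped N]" placeholder.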
@@ -82,6 +82,10 @@ class FakeLMStudioClient implements ILMStudioClient {
        return [];
    }

+    async listDownloadedCached(): Promise<string[]> {
+        return [];
+    }
+
    async getModelHandle(_modelKey: string): Promise<any> {
        return {};
    }
@@ -80,6 +80,7 @@ class FakeClient implements ILMStudioClient {
    async listLoaded(): Promise<string[]> { return []; }
    async listLoadedCached(): Promise<string[]> { return []; }
    async listDownloaded(): Promise<string[]> { return []; }
+    async listDownloadedCached(): Promise<string[]> { return []; }
    async isReachable(): Promise<boolean> { return true; }

    async getModelHandle(modelKey: string): Promise<any> {
@@ -110,11 +110,13 @@ describe('local project path preflight', () => {
        const guidance = agent.buildLocalProjectIntentGuidance('review-evaluation');

        expect(guidance).toContain('Intent operating contract — Code Review');
-        expect(guidance).toContain('## 한 줄 판단');
-        expect(guidance).toContain('## 잘된 점');
-        expect(guidance).toContain('## 부족한 점');
-        expect(guidance).toContain('## 사용자 관점 개선');
-        expect(guidance).toContain('## 다음 한 수');
+        // v2.2.64: review labels switched from markdown headers ("## 한 줄 판단") to plain-text
+        // numbered labels ("1) 한 줄 판단") so the model doesn't learn to emit `##` in the answer.
+        expect(guidance).toContain('1) 한 줄 판단');
+        expect(guidance).toContain('2) 잘된 점');
+        expect(guidance).toContain('3) 부족한 점');
+        expect(guidance).toContain('4) 사용자 관점 개선');
+        expect(guidance).toContain('5) 다음 한 수');
    });

    it('adds an Astra stance layer for opinionated project collaboration', () => {
@@ -22,8 +22,11 @@ describe('base system prompt', () => {
        const prompt = getSystemPrompt();

        expect(prompt).toContain('[STRICT GLOBAL RULES]');
-        expect(prompt).toContain('[OUTPUT FORMAT]');
-        expect(prompt).toContain('[FOLLOW-UP QUESTION RULES]');
-        expect(prompt).toContain('Ask ONE focused question at the very end');
+        // v2.2.68: [OUTPUT FORMAT] header now annotated with "— 7 hard rules";
+        // [FOLLOW-UP QUESTION RULES] is absorbed into R6 of the 7-rule block.
+        expect(prompt).toContain('[OUTPUT FORMAT — 7 hard rules]');
+        expect(prompt).toContain('R1. CONCLUSION FIRST');
+        expect(prompt).toContain('R6. ASK ONE QUESTION ONLY WHEN');
+        expect(prompt).toContain('R7. GUESS-AND-ACT WITH STATED ASSUMPTION');
    });
 });