---
id: wiki-2026-0508-refinement
title: Refinement
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Iterative-Refinement, Self-Refine, Type-Refinement]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [llm, iteration, types, design]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---

# Refinement

## One-liner

> **"The first attempt is a draft; the refinement is the product."** Refinement is a family of patterns that raise quality by running a draft through a critique → revise loop. LLM self-refine, type narrowing, and design iteration all share this core idea.

## Core ideas

### LLM Self-Refine (Madaan et al. 2023 → mainstream by 2026)

- **Generate → Critique → Refine** loop. A single model plays all three roles.
- Effect: roughly +5–20% accuracy on math, code, and dialog tasks without extra training.
- **Reflexion** (Shinn 2023) is verbal RL: at the end of each episode the self-critique is stored in episodic memory.
- 2026 standard: the native "extended thinking" mode in Claude Opus 4.7 / GPT-5 absorbs the refine step internally, so an external loop is reserved for high-stakes domains (legal, medical).

### Type Refinement (TypeScript / Flow / Python typing)

- **Narrowing**: a value of a union type is narrowed to one specific member type (`typeof`, `instanceof`, discriminated unions).
- **Refinement type**: a type with an attached predicate, e.g. `{x: number | x > 0}`. Used in Liquid Haskell and F*.
- TS 5.x in 2026: the `satisfies` operator plus control-flow analysis is strong enough to eliminate most manual casts.

### Design / Spec Refinement

- **Stepwise refinement** (Wirth 1971): move from an abstract spec to a concrete implementation step by step.
- **BDD** (Given-When-Then) is a modern incarnation of the same idea.
- AI-aided: spec → Claude → multiple implementation candidates → human picks → refine.

### Applications

1. Self-refine on RAG answers to reduce hallucination.
2. Code generation: compile error → refine loop.
3. Progressive type narrowing in a TS API.
4. Iterative tightening of a product spec between PM and AI.
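
The refinement-type idea from the Type Refinement section (a base type plus a predicate such as `x > 0`) has no direct equivalent in Python's type system, but it can be approximated. The sketch below is one possible emulation, not a standard library feature: `Positive`, `refine_positive`, and `reciprocal` are hypothetical names introduced only for illustration. The predicate is checked once at run time in a "smart constructor", and `typing.NewType` lets a static checker track which values have passed the check.

```python
from typing import NewType

# Hypothetical refined type: a float that has been checked to satisfy x > 0.
# NewType is erased at run time; it only helps static checkers such as mypy.
Positive = NewType("Positive", float)


def refine_positive(x: float) -> Positive:
    """Smart constructor: check the predicate once, then tag the value."""
    if not x > 0:  # written this way so NaN is rejected as well
        raise ValueError(f"expected x > 0, got {x!r}")
    return Positive(x)


def reciprocal(x: Positive) -> float:
    """Division is safe here because the argument type rules out zero."""
    return 1.0 / x


if __name__ == "__main__":
    print(reciprocal(refine_positive(4.0)))
```

Unlike Liquid Haskell or F*, nothing here is proven at compile time: a checker only verifies that callers went through `refine_positive`, and the predicate itself runs at run time.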
## 💻 Patterns

### Self-Refine loop (Anthropic SDK)

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-7"

def self_refine(task: str, max_iter: int = 3) -> str:
    # 1. Generate the initial draft.
    answer = client.messages.create(
        model=MODEL, max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    ).content[0].text

    for _ in range(max_iter):  # hard cap on refinement rounds
        # 2. Critique the current draft.
        critique = client.messages.create(
            model=MODEL, max_tokens=1024,
            system="You are a strict critic. List concrete flaws or reply 'NO_ISSUES'.",
            messages=[{"role": "user", "content": f"Task: {task}\n\nDraft:\n{answer}"}],
        ).content[0].text
        if "NO_ISSUES" in critique:  # explicit termination signal
            return answer
        # 3. Revise the draft using the critique.
        answer = client.messages.create(
            model=MODEL, max_tokens=2048,
            messages=[{"role": "user", "content": f"Task: {task}\nDraft: {answer}\nCritique: {critique}\nRevise."}],
        ).content[0].text
    return answer
```

### Reflexion-style episodic memory

```python
from typing import Any, Callable

class Reflexion:
    def __init__(self, llm: Callable[[str], str], environment: Callable[[str], Any]):
        self.llm = llm                  # caller-supplied: prompt -> text
        self.environment = environment  # caller-supplied: attempt -> feedback with a .success flag
        self.memory: list[str] = []     # accumulated lessons

    def step(self, task: str) -> str:
        ctx = "\n".join(f"- {m}" for m in self.memory[-5:])  # last five lessons only
        attempt = self.llm(f"Task: {task}\nPast lessons:\n{ctx}\nAct.")
        feedback = self.environment(attempt)
        if not feedback.success:
            # Store a verbal lesson so the next episode can avoid the same failure.
            lesson = self.llm(f"Why did this fail?\nTask: {task}\nAttempt: {attempt}\nFeedback: {feedback}")
            self.memory.append(lesson)
        return attempt
```

### TypeScript discriminated union narrowing

```typescript
type Result<T> =
  | { kind: 'ok'; value: T }
  | { kind: 'err'; error: Error };

function unwrap<T>(r: Result<T>): T {
  if (r.kind === 'err') throw r.error; // narrow → only the 'ok' branch remains
  return r.value;                      // typed as T, no cast
}
```

### Python TypeGuard refinement

```python
from typing import TypeGuard

def is_str_list(x: list[object]) -> TypeGuard[list[str]]:
    # A True result tells the type checker that x is a list[str].
    return all(isinstance(i, str) for i in x)

def join(items: list[object]) -> str:
    if is_str_list(items):
        return ", ".join(items)  # narrowed to list[str]
    raise TypeError("not str list")
```

### Compile-error refine loop (code generation)

```python
from typing import Callable

def codegen_refine(spec: str, llm: Callable[[str], str],
                   run_pytest: Callable[[str], tuple[bool, str]],
                   max_iter: int = 5) -> str:
    """llm and run_pytest are caller-supplied: prompt -> text, code -> (ok, error output)."""
    code = llm(f"Write Python for: {spec}")
    for attempt in range(max_iter + 1):
        ok, err = run_pytest(code)
        if ok:
            return code
        if attempt < max_iter:  # at most max_iter fixes, and every fix is re-tested
            code = llm(f"Spec: {spec}\nCode: {code}\nFailing: {err}\nFix.")
    raise RuntimeError("refinement budget exhausted")
```

### Best-of-N + judge (ensemble refinement)

```python
import re

def best_of_n(prompt: str, llm, n: int = 5) -> str:
    # llm is caller-supplied: llm(prompt, temperature=...) -> text
    candidates = [llm(prompt, temperature=1.0) for _ in range(n)]
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    ranking = llm(f"Pick the best of these:\n{listing}\nReturn only its index, 0..{n-1}.")
    match = re.search(r"\d+", ranking)  # tolerate extra words around the index
    index = int(match.group()) if match else 0
    return candidates[index if index < n else 0]  # fall back to the first candidate
```

## Decision criteria

| Situation | Approach |
|---|---|
| Single-shot is good enough (chat) | No refine: it only adds cost |
| High-stakes (legal/medical) | External self-refine + human review |
| Code with tests | Compile/test-driven refine |
| Long agentic task | Reflexion (episodic memory) |
| Math reasoning | Native extended thinking, which already refines internally |
| TS API design | Narrowing + `satisfies` |

**Default**: prefer native extended thinking; if that falls short, add 1–2 iterations of external self-refine.
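
The default above (at most one or two external iterations) can be sketched without tying it to any SDK. In this minimal sketch the generator and critic are passed in as plain callables, so they may be backed by different models. `refine_with_budget`, `generate`, `critique`, and `STOP_SIGNAL` are hypothetical names introduced for illustration; the stop string is assumed to match whatever the critic prompt asks for.

```python
from typing import Callable

STOP_SIGNAL = "NO_ISSUES"  # assumed sentinel the critic is prompted to emit


def refine_with_budget(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    max_iter: int = 2,
) -> tuple[str, int]:
    """Return (final answer, number of revisions actually performed)."""
    answer = generate(task)
    for revisions in range(max_iter):  # hard cap, so the loop cannot run away
        feedback = critique(task, answer)
        if STOP_SIGNAL in feedback:    # explicit termination signal
            return answer, revisions
        answer = generate(
            f"Task: {task}\nDraft: {answer}\nCritique: {feedback}\nRevise."
        )
    return answer, max_iter


if __name__ == "__main__":
    # Stub callables stand in for real model calls.
    drafts = iter(["v1", "v2", "v3"])
    result = refine_with_budget(
        "summarize the note",
        generate=lambda prompt: next(drafts),
        critique=lambda task, ans: "NO_ISSUES" if ans == "v2" else "too vague",
    )
    print(result)
```

Returning the revision count alongside the answer makes it easy to log how often the external loop is actually needed, which helps decide whether it is worth its cost for a given workload.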
## 🔗 Graph

- Parent: [[Iteration]]
- Variants: [[Reflexion]] · [[Self-Consistency]] · [[Best-of-N]]
- Applications: [[RAG]] · [[Code-Generation]]
- Adjacent: [[TypeScript 타입 시스템 (TypeScript Type System)|Type-System]] · [[Stepwise-Refinement]] · [[BDD]]

## 🤖 LLM usage

**When**: high-stakes output, agentic loops, code with verifiable feedback.

**When not**: latency-sensitive UX and simple chat, where the extra latency and cost are not justified.

## ❌ Anti-patterns

- **Infinite refine loop**: no hard cap on `max_iter` → cost explosion.
- **Same-model critique only**: when critic = generator, both share the same blind spots. Mix models (Opus critic, Sonnet generator).
- **Refine without termination signal**: without an explicit stop such as "NO_ISSUES", the loop keeps tweaking endlessly.
- **Narrowing by type assertion**: using `as` in TS is not refinement; it is an unchecked cast.

## 🧪 Verification / duplicates

- Verified (Madaan 2023 Self-Refine, Shinn 2023 Reflexion, TS handbook narrowing).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full rewrite covering LLM self-refine, type narrowing, design iteration |