Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

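Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: broken links fixed, redirects linked directly, concepts mapped (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections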

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00



Test-Time Compute Scaling (inference-time compute scaling)

In one line

"Think longer, get smarter." Test-time compute scaling spends more compute at inference time (longer chain-of-thought, sampling, search) in exchange for higher output quality. The paradigm runs from OpenAI o1 (2024-09) through o3 and DeepSeek-R1 (2025-01) to Claude 4.x extended thinking (2025+). It complements training-time scaling laws.

Core ideas

Two axes

More thinking (long CoT): a longer reasoning trace inside a single sample. o1, R1, Claude extended thinking.
Search / sampling: multiple samples plus a verifier (best-of-N, MCTS, beam). AlphaCode, ReST, MathShepherd.

Modern developments (2025-2026)

RL on reasoning: RLHF plus RL on verifiable rewards (math, code) makes long CoT emerge. R1-zero, R1.
Extended thinking budgets: Claude's budget_tokens field inside the thinking parameter, OpenAI's reasoning effort setting.
Scaling law: accuracy rises roughly linearly with the log of inference compute (Snell 2024, OpenAI o-series chart).
Cost shift: training is paid once (1x), while the extra inference compute is paid on every query (Nx), which reshapes the economics.
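
To make the log-linear claim concrete, the sketch below assumes accuracy ≈ a + b·log2(C) for inference compute C. The coefficients a and b are made-up illustrative values, not fitted to any published chart. The only point is that each doubling of compute buys the same additive gain, so linear accuracy gains cost exponentially more compute.

```python
import math

def predicted_accuracy(compute, a=0.30, b=0.05):
    """Hypothetical log-linear fit: accuracy = a + b * log2(compute).

    a and b are illustrative placeholders, not measured values.
    """
    return a + b * math.log2(compute)

def compute_for_gain(gain, b=0.05):
    """Multiplicative compute factor needed for an additive accuracy gain."""
    return 2 ** (gain / b)

# Each doubling of inference compute adds the same b points of accuracy.
for c in (1, 2, 4, 8, 16):
    print(f"compute x{c:<2} -> accuracy {predicted_accuracy(c):.2f}")

# Under this assumed fit, +0.10 accuracy costs 4x compute and +0.20 costs 16x.
print(compute_for_gain(0.10), compute_for_gain(0.20))
```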

Applications

Math (AIME, IMO).
Code (SWE-bench, competition).
Agentic planning (deep tool-use chains).
Scientific reasoning (GPQA).

💻 Patterns

Claude extended thinking

from anthropic import Anthropic
client = Anthropic()

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=20000,  # must be larger than thinking.budget_tokens
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Solve: ..."}],
)
for block in resp.content:
    if block.type == "thinking":
        print("THINK:", block.thinking[:200])
    elif block.type == "text":
        print("ANS:", block.text)

OpenAI reasoning effort

from openai import OpenAI
client = OpenAI()
resp = client.responses.create(
    model="o3",
    input="Prove the AM-GM inequality.",
    reasoning={"effort": "high"},   # low / medium / high
)
print(resp.output_text)

Best-of-N + verifier

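from anthropic import Anthropic
client = Anthropic()

def best_of_n(prompt, verifier, n=8):
    # verifier maps a sample to a score, e.g. the number of unit tests passed.
    # It is required: with key=None, max() would only compare strings alphabetically.
    samples = [client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2000,
        temperature=0.8,  # keep samples diverse
        messages=[{"role": "user", "content": prompt}],
    ).content[0].text for _ in range(n)]
    return max(samples, key=verifier)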
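
The selection step can be exercised without any API calls. The following is a minimal offline sketch, assuming a fixed pool of candidate programs in place of model samples and a verifier that counts passing unit tests. `CANDIDATES`, `TESTS`, and `fake_sample` are hypothetical names introduced only for this illustration.

```python
# Hypothetical candidate pool standing in for n model samples.
CANDIDATES = [
    "def add(a, b): return a - b",
    "def add(a, b): return a + b",
    "def add(a, b): return abs(a) + b",
]

# (arguments, expected result) pairs used as the verifier's unit tests.
TESTS = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]

def fake_sample(i):
    """Stand-in for one model call; cycles through the fixed pool."""
    return CANDIDATES[i % len(CANDIDATES)]

def verifier(source):
    """Score a candidate by how many unit tests it passes."""
    namespace = {}
    try:
        exec(source, namespace)  # acceptable here only because the pool is trusted
    except Exception:
        return 0
    passed = 0
    for args, expected in TESTS:
        try:
            passed += namespace["add"](*args) == expected
        except Exception:
            pass
    return passed

def best_of_n(n=8):
    samples = [fake_sample(i) for i in range(n)]
    return max(samples, key=verifier)

best = best_of_n()
print(best, verifier(best))
```

With real model output, the `exec` call would need a sandbox; it is used directly here only because the candidates are hard-coded.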

Self-consistency (majority vote)

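from collections import Counter
# samples: n sampled completions; extract_answer: a task-specific parser (not defined here)
answers = [extract_answer(s) for s in samples]
final = Counter(answers).most_common(1)[0][0]  # most frequent answer wins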
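
A self-contained version of the vote might look like the sketch below. The `extract_answer` here is an assumed parser that takes the last integer in a completion; real tasks usually need a stricter format (for example a final "Answer:" line). The sample completions are invented for illustration.

```python
import re
from collections import Counter

def extract_answer(sample):
    """Return the last integer in a completion, or None if there is none."""
    numbers = re.findall(r"-?\d+", sample)
    return numbers[-1] if numbers else None

def majority_vote(samples):
    """Self-consistency: the most frequent extracted answer wins."""
    answers = [a for a in map(extract_answer, samples) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Hypothetical completions for "What is 17 * 3?"
samples = [
    "17 * 3 = 51, so the answer is 51",
    "10*3 + 7*3 = 30 + 21 = 51",
    "17 * 3 is 41",
    "No idea.",
]
print(majority_vote(samples))
```

Completions with no parseable answer are dropped before voting, so a single unparseable sample cannot win by default.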

MCTS-style search (sketch)

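# Simplified to a verifier-guided beam search: no rollouts or value backup as in full MCTS.
# llm, Node and verifier are placeholders for the model client, node type and scorer.
# The defaults k=4 and beam=2 are illustrative.
def expand(node, k=4):
    children = [llm.continue_from(node.partial, temp=0.9) for _ in range(k)]
    return [Node(c, score=verifier(c)) for c in children]

def search(root, depth=4, beam=2):
    frontier = [root]
    for _ in range(depth):
        candidates = [child for n in frontier for child in expand(n)]
        frontier = sorted(candidates, key=lambda n: n.score, reverse=True)[:beam]
    return max(frontier, key=lambda n: n.score)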
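
The search loop above can be run end to end on a toy problem. In this sketch the "model" is replaced by enumerating every symbol of a small alphabet, and the verifier is a toy process reward that counts how many leading characters match a target string. `TARGET`, `ALPHABET`, and the scoring rule are all assumptions chosen to keep the example deterministic; it is a beam search, not full MCTS.

```python
from dataclasses import dataclass

TARGET = "2+2=4"   # hypothetical "correct reasoning trace"
ALPHABET = "24+="  # hypothetical step vocabulary

@dataclass
class Node:
    partial: str
    score: float

def verifier(partial):
    """Toy process reward: number of leading characters matching TARGET."""
    score = 0
    for got, want in zip(partial, TARGET):
        if got != want:
            break
        score += 1
    return score

def expand(node):
    """Stand-in for sampling continuations from a model: try every symbol."""
    return [Node(node.partial + ch, verifier(node.partial + ch)) for ch in ALPHABET]

def beam_search(root, depth, beam=2):
    frontier = [root]
    for _ in range(depth):
        candidates = [child for node in frontier for child in expand(node)]
        # Keep only the `beam` highest-scoring partial traces.
        frontier = sorted(candidates, key=lambda n: n.score, reverse=True)[:beam]
    return max(frontier, key=lambda n: n.score)

best = beam_search(Node("", 0), depth=len(TARGET))
print(best.partial, best.score)
```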

Budget controller

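def adaptive_thinking(prompt, easy_budget=2000, hard_budget=32000):
    # Classify difficulty first. The source elides these arguments; the prompt and max_tokens are assumed.
    diff = client.messages.create(
        model="claude-haiku-4",
        max_tokens=10,
        messages=[{"role": "user", "content": f"Reply 'easy' or 'hard': {prompt}"}],
    ).content[0].text
    budget = hard_budget if "hard" in diff.lower() else easy_budget
    # max_tokens must exceed budget_tokens; stream, since the SDK may reject long non-streaming requests.
    with client.messages.stream(
        model="claude-opus-4-7",
        max_tokens=budget + 4000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        return stream.get_final_message()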
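
The routing logic itself is independent of the API call and can be tested in isolation. The sketch below separates it into two small functions; `pick_budget`, `request_params`, and the `answer_tokens` allowance are hypothetical names and values for illustration, with `max_tokens` kept above `budget_tokens`.

```python
def pick_budget(difficulty_label, easy_budget=2000, hard_budget=32000):
    """Map a classifier's free-text label to a thinking budget."""
    return hard_budget if "hard" in difficulty_label.lower() else easy_budget

def request_params(prompt, difficulty_label, answer_tokens=4000):
    """Build keyword arguments for a thinking-enabled request.

    answer_tokens is an assumed allowance for the visible answer, so that
    max_tokens always stays above budget_tokens.
    """
    budget = pick_budget(difficulty_label)
    return {
        "max_tokens": budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "messages": [{"role": "user", "content": prompt}],
    }

easy = request_params("What is 2 + 2?", "easy")
hard = request_params("Prove the AM-GM inequality.", "Hard")
print(easy["max_tokens"], hard["max_tokens"])
```

Keeping the router as a pure function makes it easy to swap the classifier (a cheap model, a heuristic, or a fixed rule) without touching the request code.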

Decision criteria

Situation	Approach
Math / code with verifier	RL-trained reasoning model (o3, R1) + search
Open-ended reasoning	Extended thinking (Claude 4.x)
Latency-critical	Skip; use a small, fast model
Cost-critical batch	Self-consistency with 4-8 samples
Search-exploitable	Best-of-N + verifier
Fuzzy quality	Reasoning model > base model

Default: use a reasoning model (o3 / Claude extended thinking) for hard tasks and a base model for easy tasks, split by a difficulty router.

🔗 Graph

Parent: Scaling-Laws
Variants: Chain-of-Thought · Self-Consistency · Best-of-N · MCTS
Applications: Code-Generation
Adjacent: RLHF

🤖 LLM usage

When: hard reasoning tasks, verifiable output (math/code), agent planning, quality > latency. When not: simple lookup or chat, where thinking is wasted cost.

❌ Anti-patterns

Always max thinking budget: 32k of thinking on an easy task burns cost; use a router.
No verifier in best-of-N: picking among random samples is just noise; a verifier (unit tests, math check) is essential.
Stream thinking to user: thinking content is internal; show only the final text in the user UI.
Caching invalidation: changing the thinking budget causes a cache miss; keep the budget stable.
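
For the "show only the final text" point, a minimal sketch is given below. It assumes response content arrives as a list of typed blocks, as in the Claude extended thinking pattern earlier; the `Block` dataclass is a hypothetical stand-in for the SDK's block objects so the example runs offline.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Minimal stand-in for a response content block."""
    type: str
    text: str = ""
    thinking: str = ""

def visible_text(blocks):
    """Join only the text blocks; thinking blocks are not shown to the user."""
    return "".join(b.text for b in blocks if b.type == "text")

response = [
    Block(type="thinking", thinking="Try AM-GM on two terms first..."),
    Block(type="text", text="By AM-GM, (a + b) / 2 >= sqrt(ab)."),
]
print(visible_text(response))
```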

🧪 Verification / duplicates

Verified (OpenAI o1/o3 system cards, DeepSeek-R1 paper 2025-01, Anthropic extended thinking docs, Snell et al. 2024).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — o-series / R1 / Claude extended thinking patterns
