2nd/10_Wiki/Topics/AI_and_ML/모델 매개변수 제어 (Model Parameter Control).md
2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-모델-매개변수-제어-model-parameter-contr
title: Model Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Model Parameter Control, Inference Parameters, Sampling Parameters
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: parameters, sampling, inference, llm, image-gen
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: openai-anthropic-vllm-comfyui

Model Parameter Control

One-line summary

"Every parameter is a dial on model behavior: temperature, top_p, top_k, and seed shape the character of each generation." The sampling knobs that production LLM endpoints expose in 2026 (Anthropic, OpenAI, vLLM, Ollama), plus cfg/steps/scheduler for image generation, are the core levers of quantitative control.

Core concepts

LLM sampling parameters

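  • temperature [0, 2] (OpenAI range; Anthropic accepts [0, 1]): scales the logits as z / T before the softmax. 0 = greedy, 1 = raw distribution, >1 = flatter distribution. Use 0 for deterministic tasks and 0.7~1.0 for creative ones.
  • top_p (nucleus) [0, 1]: sample only from the smallest set of tokens whose cumulative probability mass reaches top_p. 0.9 = sample from the tokens that cover the top 90% of the mass.
  • top_k: keep only the K highest logits. In vLLM, -1 disables it.
  • min_p [0, 1]: relative threshold; drop tokens whose probability is below min_p times the top token's probability. A modern alternative to top_p.
  • frequency_penalty [-2, 2] / presence_penalty [-2, 2]: repetition control. frequency_penalty grows with how often a token has already appeared; presence_penalty is a flat penalty once it has appeared at all.
  • seed: reproducibility. Same seed + temperature=0 gives deterministic output in most, but not all, cases.
  • stop: stop strings (stop_sequences in the Anthropic API). Controls turn boundaries in an agent loop.
  • max_tokens / max_completion_tokens: output budget.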
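The list above describes each knob in isolation. Below is a minimal pure-Python sketch of how they compose on a single decoding step, assuming OpenAI-style additive penalties and filters applied in the order temperature, top_k, top_p, min_p; real engines such as vLLM may order or fuse these steps differently. The function names apply_penalties and sample_token are hypothetical, not any library's API.

```python
import math
import random
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """OpenAI-style additive penalties on raw logits.

    Each token loses count * frequency_penalty, plus presence_penalty once
    if it has appeared at all in the text generated so far.
    """
    counts = Counter(generated)
    return [
        z - counts[i] * frequency_penalty - (presence_penalty if counts[i] > 0 else 0.0)
        for i, z in enumerate(logits)
    ]


def sample_token(logits, temperature=1.0, top_k=-1, top_p=1.0, min_p=0.0, rng=None):
    """Pick one token id from logits after temperature, top_k, top_p and min_p."""
    if temperature == 0:
        # Greedy decoding: the highest logit always wins, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: softmax(z / T); subtracting the max keeps exp() from overflowing.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the K most likely tokens (top_k <= 0 disables it here).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # min_p: drop tokens below min_p times the most likely token's probability.
    threshold = min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    # rng.choices renormalises the surviving weights.
    rng = rng or random.Random()
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing the same random.Random(seed) reproduces the draw, which is all a seed controls; it cannot fix logits that already differ between runs (for example under batched or multi-GPU inference).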

Image generation parameters (FLUX, SD3.5, Midjourney)

  • cfg / guidance_scale: prompt adherence vs. creativity. FLUX 3.5 to 5.0, SD 5 to 9.
  • steps: number of denoising steps. FLUX-dev 28, FLUX-schnell 4, SD3.5 28~40.
  • scheduler / sampler: euler, dpmpp_2m, etc. A quality/speed tradeoff.
  • seed: reproducible composition.
  • denoising_strength (img2img; called strength in diffusers): 0 = identical to the source, 1 = ignore the source.
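The cfg and denoising_strength entries above reduce to two small formulas. This is a sketch assuming the classic two-pass classifier-free guidance used by SD-family samplers, and the step rounding used by diffusers' img2img pipelines. FLUX.1-dev is reported to be guidance-distilled, so its guidance_scale is fed to the model as an input rather than combined this way. cfg_combine and img2img_steps are hypothetical names, and plain lists stand in for noise tensors.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Classic classifier-free guidance on one denoising step.

    guided = uncond + w * (cond - uncond). w = 1 returns the conditional
    prediction unchanged (pipelines typically skip the unconditional pass
    then); larger w pushes further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


def img2img_steps(num_inference_steps, strength):
    """How many denoising steps img2img actually runs for a given strength.

    strength 0 runs none (the source comes back essentially unchanged);
    strength 1 runs all of them (the source is fully noised, so it is
    effectively ignored).
    """
    return min(int(num_inference_steps * strength), num_inference_steps)
```

So an SD3.5 img2img call with steps=28 and strength=0.5 only spends 14 denoising steps, which is why low-strength edits are also faster.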

Applications

  1. RAG answer extraction → temperature=0, top_p=1.
  2. Brainstorm → temperature=0.9, presence_penalty=0.6.
  3. Code completion → temperature=0.2, stop=["\n\n"].
  4. Image variation → fix the seed and lower cfg.

💻 Patterns

Anthropic Claude — deterministic extraction

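from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
doc_text = open("document.txt", encoding="utf-8").read()  # hypothetical input file

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    temperature=0.0,  # near-greedy; Anthropic notes results are still not fully deterministic
    # top_p left unset: redundant at temperature 0, and some newer Claude
    # models reject requests that set both temperature and top_p
    system="Extract structured data. Output JSON only.",
    messages=[{"role": "user", "content": doc_text}],
)
print(resp.content[0].text)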

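OpenAI Chat Completions: creative writing knobs

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    # GPT-5-family reasoning models reject non-default temperature/top_p,
    # so these knobs need a non-reasoning chat model such as gpt-4.1
    model="gpt-4.1",
    temperature=0.9,
    top_p=0.95,
    presence_penalty=0.6,
    frequency_penalty=0.3,
    max_completion_tokens=2000,
    seed=42,  # best-effort reproducibility; compare resp.system_fingerprint across runs
    messages=[{"role": "user", "content": "Write a noir opening."}],
)
print(resp.choices[0].message.content)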
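The Anthropic and OpenAI calls above spell the same knobs differently. Below is a hedged sketch of a translation shim from OpenAI-style kwargs to Anthropic Messages API kwargs. It relies on these facts about the Messages API: temperature is limited to [0, 1], stop strings go in stop_sequences, max_tokens is required, and there are no presence/frequency penalties or seed. to_anthropic_kwargs is a hypothetical helper. The 1024-token fallback, clamping instead of rejecting an out-of-range temperature, and dropping top_p whenever temperature is set (some newer Claude models reject both together) are all assumptions.

```python
def to_anthropic_kwargs(params, default_max_tokens=1024):
    """Map OpenAI-style sampling kwargs onto Anthropic Messages API kwargs.

    Returns (kwargs, dropped), where dropped lists the knobs that were not forwarded.
    """
    out = {
        # max_tokens is required by the Messages API; accept either OpenAI spelling.
        "max_tokens": params.get("max_completion_tokens",
                                 params.get("max_tokens", default_max_tokens)),
    }
    dropped = []
    if "temperature" in params:
        # Anthropic accepts [0, 1]; clamp instead of letting the request fail.
        out["temperature"] = max(0.0, min(params["temperature"], 1.0))
        if "top_p" in params:
            dropped.append("top_p")  # assumption: send only one of the two
    elif "top_p" in params:
        out["top_p"] = params["top_p"]
    if "stop" in params:
        stop = params["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    # These have no Messages API equivalent.
    dropped += [k for k in ("frequency_penalty", "presence_penalty", "seed") if k in params]
    return out, sorted(dropped)
```

Returning the dropped keys instead of discarding them silently makes it easy to log when a preset tuned on one provider loses knobs on another.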

vLLM — full sampling control (self-host Llama 3.3)

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.3-70B-Instruct", tensor_parallel_size=4)

params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    top_k=50,
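    min_p=0.05,           # drop tokens below 5% of the top token's probability
    repetition_penalty=1.1,  # >1.0 discourages repeats (multiplicative, unlike OpenAI's additive penalties)
    max_tokens=512,
    stop=["</answer>"],
    seed=2026,            # per-request seed
    logprobs=5,           # return the top-5 logprobs per generated token (debugging)
)
outputs = llm.generate(["Explain mixture-of-experts."], params)
print(outputs[0].outputs[0].text)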

MLX (Apple Silicon) — local inference with seed

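from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler
import mlx.core as mx

model, tok = load("mlx-community/Llama-3.3-70B-Instruct-4bit")
mx.random.seed(42)  # seeds MLX's global RNG, which the sampler draws from
# recent mlx_lm releases take sampling settings via a sampler object
# (older releases accepted temp=/top_p= directly on generate)
sampler = make_sampler(temp=0.3, top_p=0.9)
text = generate(
    model, tok,
    prompt="Summarize:",
    max_tokens=256,
    sampler=sampler,
    verbose=False,
)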

FLUX.1-dev via diffusers — image gen knobs

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="cinematic neo-tokyo alley, neon, rain",
    guidance_scale=3.5,           # FLUX prefers low CFG
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),
    width=1024, height=1024,
).images[0]

ComfyUI API — programmatic SD3.5 with full control

import requests  # API-format workflow: node_id -> {"class_type", "inputs"}

workflow = {
    "sampler": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42, "steps": 30, "cfg": 7.0,
            "sampler_name": "dpmpp_2m", "scheduler": "karras",
            "denoise": 1.0,
            "model": ["loader", 0],
            "positive": ["pos_clip", 0],
            "negative": ["neg_clip", 0],
            "latent_image": ["empty_latent", 0],
        },
    },
    # ... rest of graph
}
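# POST /prompt queues the graph; the reply includes a prompt_id.
# ComfyUI rejects the request until the elided nodes above are filled in.
r = requests.post("http://localhost:8188/prompt", json={"prompt": workflow})
r.raise_for_status()
print(r.json()["prompt_id"])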
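Because the API-format workflow above is a plain dict keyed by node id, a seed or cfg sweep only needs to copy it and override the KSampler inputs before each POST. A minimal sketch; with_ksampler_settings is a hypothetical helper, and its allowed-key set simply mirrors the KSampler inputs used in the workflow above.

```python
import copy

# The KSampler widget inputs used in the workflow above.
KSAMPLER_KEYS = {"seed", "steps", "cfg", "sampler_name", "scheduler", "denoise"}


def with_ksampler_settings(workflow, node_id, **overrides):
    """Return a copy of an API-format workflow with one KSampler node's inputs overridden."""
    unknown = set(overrides) - KSAMPLER_KEYS
    if unknown:
        raise KeyError(f"not a KSampler input: {sorted(unknown)}")
    patched = copy.deepcopy(workflow)  # leave the template untouched
    node = patched[node_id]
    if node.get("class_type") != "KSampler":
        raise ValueError(f"node {node_id!r} is {node.get('class_type')!r}, not KSampler")
    node["inputs"].update(overrides)
    return patched
```

A seed sweep is then a list comprehension such as [with_ksampler_settings(workflow, "sampler", seed=s) for s in range(4)], with each variant posted to /prompt in turn.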

Sweep parameters with Optuna for prompt+param tuning

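import optuna
from anthropic import Anthropic

client = Anthropic()

def load_eval():
    # hypothetical placeholder: return your own list[(prompt, expected)]
    return [("What is 2 + 2? Reply with the number only.", "4")]

def grade(out, expected):
    # hypothetical exact-match grader; swap in a task-specific metric
    return float(out.strip() == expected)

EVAL_SET = load_eval()

def objective(trial):
    temp = trial.suggest_float("temperature", 0.0, 1.0)  # Anthropic caps temperature at 1.0
    tp   = trial.suggest_float("top_p", 0.5, 1.0)
    # some newer Claude models reject temperature and top_p together;
    # for those, tune only one of the two
    score = 0.0
    for q, exp in EVAL_SET:
        out = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=512, temperature=temp, top_p=tp,
            messages=[{"role": "user", "content": q}],
        ).content[0].text
        score += grade(out, exp)
    return score / len(EVAL_SET)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=40)  # 40 trials x len(EVAL_SET) API calls
print(study.best_params)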

Decision guide

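| Task | temperature (LLM) / cfg (image) | top_p (LLM) / steps (image) | Other |
|---|---|---|---|
| Extraction / classification | 0.0 | 1.0 | fixed seed |
| Code completion | 0.2 | 0.95 | stop tokens |
| Summarization | 0.3 | 0.9 | |
| Q&A (RAG) | 0.0~0.3 | 1.0 | |
| Brainstorming | 0.8~1.0 | 0.95 | presence_penalty 0.6 |
| Creative fiction | 0.9~1.1 | 0.95 | frequency_penalty 0.3; values above 1.0 need an OpenAI-style [0, 2] temperature range |
| FLUX image | cfg 3.5 | steps 28 | bf16 |
| SD3.5 image | cfg 7.0 | steps 30 | dpmpp_2m, karras |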

Defaults: temperature=0.7, top_p=0.9, seed=42 (for debugging), max_tokens budgeted per task.
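The decision table and defaults can be kept as data so that every call site takes its settings from one place. A sketch with hypothetical names (PRESETS, params_for). Where the table gives a range, the single value picked here is an assumption: 0.0 for RAG Q&A, 0.9 for brainstorming (matching the applications list), and 1.0 for creative fiction. The image rows are left out.

```python
import copy

# Fallback values from the "Defaults" line.
DEFAULTS = {"temperature": 0.7, "top_p": 0.9}

# Transcribed from the decision table; ranges collapsed to one assumed value.
PRESETS = {
    "extraction":       {"temperature": 0.0, "top_p": 1.0, "seed": 42},
    "code_completion":  {"temperature": 0.2, "top_p": 0.95, "stop": ["\n\n"]},
    "summarization":    {"temperature": 0.3, "top_p": 0.9},
    "rag_qa":           {"temperature": 0.0, "top_p": 1.0},    # table: 0.0~0.3
    "brainstorming":    {"temperature": 0.9, "top_p": 0.95, "presence_penalty": 0.6},  # table: 0.8~1.0
    "creative_fiction": {"temperature": 1.0, "top_p": 0.95, "frequency_penalty": 0.3},  # table: 0.9~1.1
}


def params_for(task, max_tokens, debug=False):
    """Sampling kwargs for a task; unknown tasks fall back to DEFAULTS."""
    params = dict(DEFAULTS)
    # deepcopy so callers cannot mutate the shared preset (e.g. the stop list)
    params.update(copy.deepcopy(PRESETS.get(task, {})))
    params["max_tokens"] = max_tokens  # always budgeted per task, never a blanket default
    if debug:
        params.setdefault("seed", 42)  # fixed seed for debugging
    return params
```

Making max_tokens a required argument is a deliberate choice: it forces every caller to state a budget rather than inherit one.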

🔗 Graph

🤖 LLM usage

When to use: temp=0 when you need deterministic results (RAG, extraction); temp 0.7+ for creative output; fix the seed to reproduce a bug. When not to: a seed does not guarantee strict determinism on every model (especially with multi-GPU serving), so do not depend on seeds in production.
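To check whether a fixed seed actually reproduces output on a given model and deployment, send the same request several times and count the distinct results. A minimal sketch: reproducibility_report is hypothetical, and generate_fn stands in for whatever wraps your real API call with the seed and temperature under test.

```python
from collections import Counter


def reproducibility_report(generate_fn, runs=5):
    """Call generate_fn `runs` times and summarise how many distinct outputs came back.

    distinct == 1 means every run matched in this sample; it is evidence,
    not a guarantee, that the setup is reproducible.
    """
    outputs = [generate_fn() for _ in range(runs)]
    counts = Counter(outputs)
    most_common, hits = counts.most_common(1)[0]
    return {"runs": runs, "distinct": len(counts), "most_common": most_common, "hits": hits}
```

Running the report once at temperature=0 and once at a sampled temperature with the same seed shows quickly whether a given endpoint honours seeds at all.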

Anti-patterns

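  • temperature=0 + top_p < 1: redundant (greedy decoding already takes the top-1 token).
  • temperature 1.5+ in production: spikes in hallucination and incoherence.
  • Fixing only the seed at temperature 0.7: still non-deterministic under batched inference.
  • max_tokens=4096 as a blanket default: runaway outputs blow up cost. Budget per task.
  • frequency_penalty 1.5+: vocabulary collapse (the model avoids even words it needs to repeat, and the output degrades).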
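The anti-patterns above are mechanical enough to check before a request is sent. A minimal linter sketch: lint_sampling is hypothetical, the thresholds are copied from the list, and treating max_tokens >= 4096 as a "blanket default" is an assumption.

```python
def lint_sampling(params, production=True):
    """Return human-readable warnings for the sampling anti-patterns listed above."""
    warnings = []
    temperature = params.get("temperature")
    if temperature == 0 and params.get("top_p", 1.0) < 1.0:
        warnings.append("temperature=0 with top_p<1 is redundant: greedy decoding already takes the top-1 token")
    if production and temperature is not None and temperature >= 1.5:
        warnings.append("temperature>=1.5 in production: expect hallucination/incoherence spikes")
    if "seed" in params and (temperature is None or temperature > 0):
        # an unset temperature usually means the API default, which is > 0
        warnings.append("seed with temperature>0 can still vary under batched inference")
    budget = params.get("max_tokens", params.get("max_completion_tokens"))
    if budget is None:
        warnings.append("no output budget set: pass a task-specific max_tokens")
    elif budget >= 4096:  # assumed cut-off for a blanket default
        warnings.append("max_tokens>=4096 looks like a blanket default: budget per task")
    if params.get("frequency_penalty", 0.0) >= 1.5:
        warnings.append("frequency_penalty>=1.5 risks vocabulary collapse")
    return warnings
```

Returning warnings instead of raising keeps the check usable in CI and in request logging alike.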

🧪 Verification / duplicates

  • Verified (Anthropic Messages API, OpenAI Chat Completions, vLLM SamplingParams, diffusers FluxPipeline, Stability SD3.5 docs, ComfyUI API).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: LLM/image sampling params + 7 working patterns |