---
id: wiki-2026-0508-모델-매개변수-제어-model-parameter-contr
title: 모델 매개변수 제어 (Model Parameter Control)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Model Parameter Control, Inference Parameters, Sampling Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [parameters, sampling, inference, llm, image-gen]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: openai-anthropic-vllm-comfyui
---

# Model Parameter Control

## One-liner

> **"Parameters are the dials of model behavior: temperature, top_p, top_k, and seed set the character of each generation."** As of 2026, most production LLM endpoints (Anthropic, OpenAI, vLLM, Ollama) expose sampling knobs, and image generators expose cfg, steps, and scheduler. Together these are the core of quantitative control over output.

## Core

### LLM sampling parameters

- **temperature** [0, 2]: logits are divided by this value before softmax. 0 = greedy, 1 = the model's raw distribution, >1 = flatter. The accepted range is provider-specific (OpenAI accepts 0 to 2, Anthropic 0 to 1). Use 0 for deterministic tasks and 0.7~1.0 for creative ones.
- **top_p** (nucleus) [0, 1]: cumulative probability mass. 0.9 samples only from the smallest set of top tokens whose probabilities sum to at least 0.9.
- **top_k**: keep only the K highest-scoring tokens. In vLLM, -1 disables it.
- **min_p** [0, 1]: relative threshold. A token survives only if its probability is at least min_p times the top token's probability. A modern alternative to top_p.
- **frequency_penalty** / **presence_penalty** [-2, 2]: repetition control. The first scales with how often a token has already appeared, the second applies once a token has appeared at all.
- **seed**: reproducibility. Same seed + temperature=0 → deterministic in most cases, but not guaranteed.
- **stop**: stop strings. Controls turn boundaries in agent loops.
- **max_tokens** / **max_completion_tokens**: output budget.

### Image gen parameters (FLUX, SD3.5, Midjourney)

- **cfg / guidance_scale**: prompt adherence vs creativity. FLUX 3.5~5.0, SD 5~9.
- **steps**: denoising steps. FLUX-dev 28, FLUX-schnell 4, SD3.5 28~40.
- **scheduler / sampler**: euler, dpmpp_2m, etc. A quality/speed tradeoff.
- **seed**: reproducible composition.
- **denoising_strength** (img2img): 0 = identical to the source, 1 = ignore the source.

### Applications

1. RAG answer extraction → temperature=0, top_p=1.
2. Brainstorm → temperature=0.9, presence_penalty=0.6.
3. Code completion → temperature=0.2, stop=["\n\n"].
4. Image variation → fix the seed and lower cfg.

## 💻 Patterns

### Anthropic Claude: deterministic extraction

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
doc_text = "..."  # placeholder: the document to extract from

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    temperature=0.0,  # as deterministic as the API allows
    # top_p is left at its default: Anthropic advises adjusting temperature
    # or top_p, not both, and some newer Claude models restrict these knobs.
    system="Extract structured data. Output JSON only.",
    messages=[{"role": "user", "content": doc_text}],
)
print(resp.content[0].text)
```

### OpenAI Chat Completions: creative writing knobs

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    # Assumption: a non-reasoning chat model. Reasoning models such as gpt-5
    # may reject non-default temperature, top_p, and penalty values.
    model="gpt-4.1",
    temperature=0.9,
    top_p=0.95,
    presence_penalty=0.6,
    frequency_penalty=0.3,
    max_completion_tokens=2000,
    seed=42,  # best-effort reproducibility
    messages=[{"role": "user", "content": "Write a noir opening."}],
)
print(resp.choices[0].message.content)
```

### vLLM: full sampling control (self-hosted Llama 3.3)

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.3-70B-Instruct", tensor_parallel_size=4)

params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    min_p=0.05,  # modern alternative to top_p
    repetition_penalty=1.1,
    max_tokens=512,
    stop=[""],  # placeholder: the stop string is missing in the source; use your model's end-of-turn marker
    seed=2026,
    logprobs=5,  # return top-5 logprobs per token for debugging
)

outputs = llm.generate(["Explain mixture-of-experts."], params)
print(outputs[0].outputs[0].text)
```

### MLX (Apple Silicon): local inference with seed

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler
import mlx.core as mx

model, tok = load("mlx-community/Llama-3.3-70B-Instruct-4bit")
mx.random.seed(42)  # seeds MLX's global RNG used for sampling

# Recent mlx-lm releases take sampling settings through a sampler object;
# older releases accepted temp= and top_p= directly on generate().
sampler = make_sampler(temp=0.3, top_p=0.9)

text = generate(
    model,
    tok,
    prompt="Summarize:",
    max_tokens=256,
    sampler=sampler,
    verbose=False,
)
```

### FLUX.1-dev via diffusers: image gen knobs

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="cinematic neo-tokyo alley, neon, rain",
    guidance_scale=3.5,  # FLUX prefers low CFG
    num_inference_steps=28,
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed
    width=1024,
    height=1024,
).images[0]
```

### ComfyUI API: programmatic SD3.5 with full control

```python
import requests

workflow = {
    "sampler": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 30,
            "cfg": 7.0,
            "sampler_name": "dpmpp_2m",
            "scheduler": "karras",
            "denoise": 1.0,
            # Each link is [source_node_id, output_index].
            "model": ["loader", 0],
            "positive": ["pos_clip", 0],
            "negative": ["neg_clip", 0],
            "latent_image": ["empty_latent", 0],
        },
    },
    # ... rest of graph
}

r = requests.post("http://localhost:8188/prompt", json={"prompt": workflow})
r.raise_for_status()
```

### Sweep parameters with Optuna for prompt+param tuning

```python
import optuna
from anthropic import Anthropic

client = Anthropic()

# load_eval() and grade() are user-supplied helpers, not library functions:
# load_eval() returns list[(prompt, expected)], grade(out, exp) returns a score.
EVAL_SET = load_eval()


def objective(trial):
    # Some Claude models reject requests that set both temperature and top_p;
    # if yours does, sweep one knob at a time.
    temp = trial.suggest_float("temperature", 0.0, 1.2)
    tp = trial.suggest_float("top_p", 0.5, 1.0)
    score = 0
    for q, exp in EVAL_SET:
        out = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=512,
            temperature=temp,
            top_p=tp,
            messages=[{"role": "user", "content": q}],
        ).content[0].text
        score += grade(out, exp)
    return score / len(EVAL_SET)  # mean score over the eval set


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=40)
print(study.best_params)
```

## Decision criteria

| Task | temperature | top_p | Other |
|---|---|---|---|
| Extraction / classification | 0.0 | 1.0 | fixed seed |
| Code completion | 0.2 | 0.95 | stop tokens |
| Summarization | 0.3 | 0.9 | - |
| Q&A (RAG) | 0.0~0.3 | 1.0 | - |
| Brainstorming | 0.8~1.0 | 0.95 | presence_penalty 0.6 |
| Creative fiction | 0.9~1.1 | 0.95 | frequency_penalty 0.3 |

| Image model | cfg | steps | Other |
|---|---|---|---|
| FLUX | 3.5 | 28 | bf16 |
| SD3.5 | 7.0 | 30 | dpmpp_2m karras |

**Defaults**: temperature=0.7, top_p=0.9, seed=42 (debugging), max_tokens=task-budgeted.

## 🔗 Graph

- Parent: [[Parameter]]
- Variants: [[Sampling_Strategies]]
- Applications: [[Iterative Prompting]] · [[Midjourney]] · [[RAG]]
- Adjacent: [[Prompt_Engineering]]

## 🤖 LLM usage

**When**: deterministic results are needed (RAG, extraction) → temp=0. Creative output → temp 0.7+. Reproducing a bug → fix the seed.

**When not**: strict seed determinism is not guaranteed on every model (especially multi-GPU), so do not depend on the seed in production.

## ❌ Anti-patterns

- **temperature=0 + top_p<1**: redundant, because greedy decoding already picks the top-1 token.
- **temperature 1.5+ in production**: hallucination and incoherence spike.
- **Fixed seed only + temperature 0.7**: still non-deterministic under batched inference.
- **max_tokens=4096 as a default**: cost blowup. Budget per task instead.
- **frequency_penalty 1.5+**: output degrades as common tokens get suppressed.

## 🧪 Verification / duplicates

- Verified (Anthropic Messages API, OpenAI Chat Completions, vLLM SamplingParams, diffusers FluxPipeline, Stability SD3.5 docs, ComfyUI API).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: LLM/image sampling params + 7 working patterns |
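
## 📎 Appendix: how the sampling knobs interact

The LLM sampling parameters described in this note (temperature, top_k, top_p, min_p, seed) can be illustrated with a small standard-library sketch. This is a simplified model, not any engine's actual implementation: the function names `softmax`, `filter_probs`, and `sample_token` are hypothetical, and all filters are applied to the same temperature-scaled distribution, whereas real engines may apply them in a different order or renormalize between steps.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Zero out tokens removed by top_k, top_p, or min_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:  # keep only the K most likely tokens
        keep &= set(order[:top_k])
    if top_p < 1.0:  # keep the smallest prefix whose mass reaches top_p
        nucleus, mass = [], 0.0
        for i in order:
            nucleus.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        keep &= set(nucleus)
    if min_p > 0.0:  # keep tokens at least min_p times as likely as the top token
        floor = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= floor}
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0, rng=None):
    """Pick one token index; temperature=0 is treated as greedy decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = filter_probs(softmax(logits, temperature), top_k, top_p, min_p)
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
print(filter_probs(softmax(logits), top_p=0.8))
print(sample_token(logits, temperature=0.7, rng=random.Random(42)))
```

Passing a seeded `random.Random` makes this single-process sketch repeatable. Hosted or batched inference can still differ between runs, which is why the note treats seeds as best-effort.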