"Emphasize or de-emphasize specific tokens in a prompt: the (word:1.3) syntax of Stable Diffusion, --iw of Midjourney, attention scaling under the hood". The prompt-weight syntax of AUTOMATIC1111 (2022) became the community standard. As of 2026, FLUX, SD3.5, SDXL Turbo, and Midjourney v7 all support weighting; T5-encoded models use a different syntax.
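Key points
Syntax (Stable Diffusion / A1111 / ComfyUI)
(word) - weight ×1.1.
((word)) - ×1.21.
(word:1.3) - explicit weight ×1.3.
[word] - weight ÷1.1.
(word:0.5) - explicit weight ×0.5. Numeric weights work only inside parentheses; in A1111, [word:0.5] is prompt editing (add "word" after 50% of the steps), not a weight.
(red hair:1.4) (blue eyes:0.8) - phrase-level.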
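The nesting rules above compose multiplicatively: each pair of parentheses multiplies by 1.1 and each pair of square brackets divides by 1.1. A minimal sketch of that arithmetic (the function name `nested_weight` is illustrative and not part of any UI's API):

```python
def nested_weight(paren_depth: int = 0, bracket_depth: int = 0, base: float = 1.1) -> float:
    """Effective weight for text wrapped in nested ( ) and [ ] pairs."""
    return base ** paren_depth / base ** bracket_depth

print(round(nested_weight(paren_depth=1), 4))    # 1.1
print(round(nested_weight(paren_depth=2), 4))    # 1.21
print(round(nested_weight(bracket_depth=1), 4))  # 0.9091
```

Three levels of parentheses already reach about 1.33, which is why the explicit (word:1.3) form is usually easier to read than deep nesting.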
Syntax (Midjourney v7)
cat dog - equal weight.
cat::2 dog::1 - double-colon multi-prompt with weights.
--iw 0.5 - image weight (text vs. reference image). Note that --w is the short form of --weird, not an image weight.
--s 250 - stylize strength.
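To make the double-colon form concrete, here is a small sketch that splits a multi-prompt into (text, weight) pairs and computes each part's relative share. It assumes weights are relative (so cat::2 dog::1 behaves like cat::4 dog::2) and that a missing number means 1; `parse_multiprompt` and `relative_shares` are hypothetical helpers, not a Midjourney API.

```python
import re

def parse_multiprompt(prompt: str) -> list[tuple[str, float]]:
    """Split 'cat::2 dog::1' into [('cat', 2.0), ('dog', 1.0)]."""
    # re.split keeps the captured weight (or None) between text segments.
    tokens = re.split(r"::(-?\d+(?:\.\d+)?)?", prompt)
    texts, weights = tokens[0::2], tokens[1::2]
    parts = []
    for i, text in enumerate(texts):
        text = text.strip()
        if not text:
            continue
        raw = weights[i] if i < len(weights) else None
        parts.append((text, float(raw) if raw is not None else 1.0))
    return parts

def relative_shares(parts: list[tuple[str, float]]) -> dict[str, float]:
    """Normalize weights so they sum to 1 (assumes a positive total)."""
    total = sum(weight for _, weight in parts)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    return {text: weight / total for text, weight in parts}

print(parse_multiprompt("cat::2 dog::1"))  # [('cat', 2.0), ('dog', 1.0)]
```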
Syntax (FLUX / T5-encoded)
T5 understands natural language; the (word:1.3) syntax is mostly ignored.
Use emphasis via wording: "very prominent X", "subtle hint of Y".
Some front ends (Forge, ComfyUI) still parse weights via re-prompting.
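When porting an existing weighted prompt to a T5-encoded model, one option is to rewrite the weights as wording. The sketch below is only an illustration: the thresholds (1.2 and 0.8) and the function name `to_natural_language` are assumptions, and the phrases are the ones suggested above.

```python
import re

# Matches the explicit (text:1.3) form only.
WEIGHT = re.compile(r"\(([^():]+):(\d*\.?\d+)\)")

def to_natural_language(prompt: str, strong: float = 1.2, weak: float = 0.8) -> str:
    """Replace (text:w) with emphasis wording; thresholds are assumed, not standard."""
    def reword(match: re.Match) -> str:
        text, weight = match.group(1).strip(), float(match.group(2))
        if weight >= strong:
            return f"very prominent {text}"
        if weight <= weak:
            return f"subtle hint of {text}"
        return text  # near-neutral weights are simply dropped
    return WEIGHT.sub(reword, prompt)

print(to_natural_language("(red hair:1.5) girl, (freckles:0.6)"))
# very prominent red hair girl, subtle hint of freckles
```

The output is a starting point for manual rewording rather than a finished prompt.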
Mechanism (under the hood)
CLIP/T5 text encoder → token embeddings.
A1111: weight w → multiply the token's embedding by w after encoding, then rescale the whole tensor to restore its original mean.
Compel library: more sophisticated — interpolates between conditioning vectors.
Cross-attention scaling: alternative — scale K/V at attention layer.
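The post-encoding rescale can be illustrated with plain Python lists standing in for tensors. This is a sketch of the idea as described above (scale each token vector, then restore the overall mean), not A1111's actual code; `apply_token_weights` is a hypothetical name.

```python
def apply_token_weights(embeddings: list[list[float]], weights: list[float]) -> list[list[float]]:
    """Scale each token vector by its weight, then restore the original mean."""
    def mean(rows: list[list[float]]) -> float:
        flat = [x for row in rows for x in row]
        return sum(flat) / len(flat)

    original_mean = mean(embeddings)
    scaled = [[x * w for x in row] for row, w in zip(embeddings, weights)]
    new_mean = mean(scaled)
    if new_mean == 0:
        return scaled  # nothing sensible to rescale against
    factor = original_mean / new_mean
    return [[x * factor for x in row] for row in scaled]

# Two "tokens" with two dimensions each; the second token is weighted 1.5.
tokens = [[1.0, 2.0], [3.0, 4.0]]
weighted = apply_token_weights(tokens, [1.0, 1.5])
```

After the rescale the weighted token is larger relative to its neighbours, while the tensor's overall mean matches the unweighted encoding.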
Best practices
Stay between 0.5 and 1.5; above 1.5 → distortion / saturation.
Negative prompts are often more effective than the [word] syntax.
Long prompts: weight the critical 3-5 tokens, leave rest at 1.0.
For T5 models, use natural-language emphasis instead.
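The range guideline above is easy to check mechanically. A minimal sketch, assuming the explicit (text:w) form and the 0.5 to 1.5 bounds from this list; `out_of_range_weights` is a hypothetical helper:

```python
import re

WEIGHT = re.compile(r"\(([^():]+):(\d*\.?\d+)\)")

def out_of_range_weights(prompt: str, low: float = 0.5, high: float = 1.5) -> list[tuple[str, float]]:
    """Return (text, weight) pairs whose weight falls outside [low, high]."""
    flagged = []
    for match in WEIGHT.finditer(prompt):
        weight = float(match.group(2))
        if not low <= weight <= high:
            flagged.append((match.group(1), weight))
    return flagged

print(out_of_range_weights("(red hair:1.8) girl, (blue eyes:0.8), (hat:0.3)"))
# [('red hair', 1.8), ('hat', 0.3)]
```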
💻 Patterns
diffusers + Compel (programmatic weighting)
```python
from diffusers import StableDiffusionXLPipeline
from compel import Compel, ReturnedEmbeddingsType
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# SDXL has two text encoders; only the second one supplies the pooled embedding.
compel = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

# Compel uses its own weighting syntax, (text)1.4, rather than A1111's (text:1.4).
prompt = "a (red)1.4 sports car on a (sunny)0.7 beach, cinematic"
conditioning, pooled = compel(prompt)
image = pipe(prompt_embeds=conditioning, pooled_prompt_embeds=pooled).images[0]
```
A1111-style parsing (manual)
```python
import re

# Simplified: handles the explicit (text:1.3) form only, not nested () or [].
WEIGHT_PATTERN = re.compile(r"\(([^():]+):(\d*\.?\d+)\)")

def parse_weighted(prompt):
    """Return list of (text, weight) tuples."""
    parts, last = [], 0
    for m in WEIGHT_PATTERN.finditer(prompt):
        if m.start() > last:
            # Unweighted text between matches defaults to weight 1.0.
            parts.append((prompt[last:m.start()], 1.0))
        parts.append((m.group(1), float(m.group(2))))
        last = m.end()
    if last < len(prompt):
        parts.append((prompt[last:], 1.0))
    return parts
```
Cross-attention scaling (Hugging Face)
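```python
# Scale a specific token's attention by a factor
from diffusers.models.attention_processor import AttnProcessor

class WeightedAttn(AttnProcessor):
    def __init__(self, token_idx, scale):
        super().__init__()
        self.token_idx, self.scale = token_idx, scale

    # Parameter names must match diffusers, which passes them as keywords.
    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        # Self-attention layers pass no encoder states; only cross-attention is scaled.
        if encoder_hidden_states is not None:
            # Scale the token_idx slot before attention; this affects both K and V.
            encoder_hidden_states = encoder_hidden_states.clone()
            encoder_hidden_states[:, self.token_idx] *= self.scale
        return super().__call__(attn, hidden_states, encoder_hidden_states,
                                attention_mask, **kwargs)

# Usage: pipe.unet.set_attn_processor(WeightedAttn(token_idx=2, scale=1.4))
```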
Natural-language emphasis (FLUX)
```text
# Bad (FLUX ignores): "(red hair:1.5) girl"
# Good: "girl with strikingly vivid red hair, the red is the most prominent color in the image"
```
Prompt-blending (interpolate two prompts)
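```python
# Reuses `pipe` and `compel` from the Compel example above. With requires_pooled
# set for SDXL, compel() returns a (conditioning, pooled) pair of torch tensors.
c1, p1 = compel("a cat in a forest")
c2, p2 = compel("a robot in a city")

# Both prompts fit in one 77-token chunk, so the tensors have matching shapes
# and can be averaged element-wise.
mixed = (c1 + c2) / 2
mixed_pooled = (p1 + p2) / 2
image = pipe(prompt_embeds=mixed, pooled_prompt_embeds=mixed_pooled).images[0]
```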
Step-conditional weighting ([from:to:step])
```text
[cat:dog:0.5] in a field
# 0-50% steps: "cat", 50-100%: "dog"
# Useful for changing subject mid-denoising
```
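To see what the sampler is given at each step, the [from:to:when] form can be resolved ahead of time. A minimal sketch, assuming 0-based step indices and that a value below 1 is a fraction of the total steps while a larger value is an absolute step number; `prompt_at_step` is a hypothetical helper, not the A1111 implementation.

```python
import re

# Matches [from:to:when]; either text side may be empty.
EDIT = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):(\d*\.?\d+)\]")

def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Resolve every [from:to:when] block for the given 0-based step."""
    def pick(match: re.Match) -> str:
        before, after, when = match.group(1), match.group(2), float(match.group(3))
        # Below 1: fraction of total steps. Otherwise: absolute step number.
        switch = when * total_steps if when < 1 else when
        return before if step < switch else after
    return EDIT.sub(pick, prompt)

print(prompt_at_step("[cat:dog:0.5] in a field", 0, 20))   # cat in a field
print(prompt_at_step("[cat:dog:0.5] in a field", 10, 20))  # dog in a field
```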
Decision criteria

| Situation | Approach |
|---|---|
| SDXL / SD1.5 / SD2.1 | A1111 (word:1.3) syntax in the UI; Compel's (word)1.3 in code |
| FLUX / SD3.5 (T5) | Natural-language emphasis |
| Midjourney v7 | ::weight syntax |
| Subject + style mix | Multi-prompt with :: or Compel blends |
| Subtle adjustment | 0.8-1.2 range |
| Strong push | 1.3-1.5; rarely above |
| Suppress concept | Negative prompt (preferred) over [word] |
Default: Compel for programmatic SDXL use; A1111 syntax for casual use; natural language for FLUX.