"매 engagement 의 depth 측정". CTR / time-on-page 같은 surface metric 만 보면 clickbait 의 reward → 매 deeper signal (scroll completion, return visit, comment quality, downstream conversion) 의 measure 의 필요. 2026 의 LLM-as-judge 의 quality scoring 의 mainstream.
Core
Surface vs. depth metrics
Surface: CTR, time-on-page, bounce rate, like count.
Depth: scroll completion, return visits, comment quality, downstream conversion.
LLM-as-judge quality scoring
```python
import json

from anthropic import Anthropic

client = Anthropic()


def score_content(text: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=512,
        messages=[{"role": "user", "content": f"""Rate the following article on 4 axes (0-10 each):
- coherence (logical flow)
- density (info per paragraph)
- originality (vs generic LLM output)
- actionability (reader takeaway)
Return strict JSON: {{"coherence": N, "density": N, "originality": N, "actionability": N, "rationale": "..."}}

ARTICLE:
{text[:8000]}"""}],
    )
    return json.loads(resp.content[0].text)
```
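Parsing the judge's reply directly with `json.loads` fails whenever the model wraps the JSON in prose or returns an out-of-range value. A minimal sketch of a stricter parser follows; `parse_judge_scores` and `AXES` are hypothetical names introduced here, not part of the original scorer.

```python
import json

# The four axes requested in the judge prompt.
AXES = ("coherence", "density", "originality", "actionability")


def parse_judge_scores(raw: str) -> dict:
    """Extract and validate the JSON object in a judge reply (hypothetical helper)."""
    # The reply may carry extra text around the JSON, so parse only the
    # outermost {...} span rather than the whole string.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge reply")
    data = json.loads(raw[start:end + 1])

    scores = {}
    for axis in AXES:
        value = data.get(axis)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"axis {axis!r} is missing or not numeric")
        if not 0 <= value <= 10:
            raise ValueError(f"axis {axis!r} out of range: {value}")
        scores[axis] = float(value)
    scores["rationale"] = str(data.get("rationale", ""))
    return scores
```

Raising on malformed replies, rather than silently defaulting to 0, keeps bad judge outputs from being averaged into quality scores unnoticed.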
Composite depth score
```python
def depth_score(metrics: dict) -> float:
    # weights tuned on labeled training set
    w = {
        'scroll_completion': 0.15,
        'focused_dwell_ratio': 0.25,
        'return_within_7d': 0.20,
        'downstream_action': 0.25,
        'share_with_comment': 0.15,
    }
    return sum(w[k] * metrics.get(k, 0) for k in w)
```
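The composite only behaves as a weighted average if every input lies in [0, 1]. That range is an assumption here, suggested by the weights summing to 1.0. The sketch below shows one way raw session data might be mapped into that range; `normalize_metrics` and the raw field names (`max_scroll_fraction`, `dwell_seconds`, `focused_seconds`, `returned_within_7d`, `converted`, `shared_with_comment`) are hypothetical.

```python
def normalize_metrics(raw: dict) -> dict:
    """Map raw per-session fields to [0, 1] inputs (field names are hypothetical)."""

    def clamp01(x: float) -> float:
        return max(0.0, min(1.0, float(x)))

    dwell = raw.get("dwell_seconds", 0)
    # Share of total dwell time during which the tab was focused.
    focused_ratio = raw.get("focused_seconds", 0) / dwell if dwell > 0 else 0.0

    return {
        "scroll_completion": clamp01(raw.get("max_scroll_fraction", 0)),
        "focused_dwell_ratio": clamp01(focused_ratio),
        # Binary events become 0.0 / 1.0 indicators.
        "return_within_7d": 1.0 if raw.get("returned_within_7d") else 0.0,
        "downstream_action": 1.0 if raw.get("converted") else 0.0,
        "share_with_comment": 1.0 if raw.get("shared_with_comment") else 0.0,
    }
```

Clamping guards against logging glitches such as a scroll fraction above 1.0 or focused time exceeding total dwell.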
Clickbait detector heuristic
```python
def clickbait_signal(row):
    # high CTR + low depth = clickbait
    if row['ctr'] > 0.10 and row['depth_score'] < 0.3:
        return 1.0
    return 0.0
```
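A hard 0/1 flag treats an item with depth 0.29 the same as one with depth 0.0. If a graded signal is more useful downstream, one possible variant is sketched below. It is a substitute for the binary heuristic, not the original, and `clickbait_signal_soft` is a hypothetical name. It reuses the same 0.10 CTR and 0.3 depth thresholds as defaults.

```python
def clickbait_signal_soft(
    ctr: float,
    depth_score: float,
    ctr_threshold: float = 0.10,
    depth_threshold: float = 0.3,
) -> float:
    """Graded variant: 0.0 outside the clickbait region, rising to 1.0 at zero depth."""
    # Same region as the binary heuristic: high CTR combined with low depth.
    if ctr <= ctr_threshold or depth_score >= depth_threshold:
        return 0.0
    # Fraction of the way from the depth threshold down to zero depth.
    return min(1.0, (depth_threshold - depth_score) / depth_threshold)
```

An item with CTR 0.2 and depth halfway below the threshold (0.15) scores about 0.5, while the same CTR with depth near zero scores close to 1.0.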
When to use: as a reranking signal for content recommendation, a quality gate for KB articles, or a secondary metric in A/B tests.
When not to use: small samples (variance is too high) and acquisition-stage funnels (where CTR is the primary metric).
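As a rough illustration of the reranking use case and the small-sample caveat together, the sketch below blends a relevance score with a depth score that is shrunk toward a prior when few sessions have been observed. The shrinkage formula, the `prior` and `prior_strength` values, and the candidate fields (`relevance`, `depth_score`, `sessions`) are all assumptions chosen for illustration, not tuned settings.

```python
def shrunk_depth(depth_score: float, sessions: int,
                 prior: float = 0.5, prior_strength: int = 100) -> float:
    """Pull a noisy depth estimate toward a prior; the pull fades as sessions grow."""
    return (sessions * depth_score + prior_strength * prior) / (sessions + prior_strength)


def rerank(candidates: list, depth_weight: float = 0.3) -> list:
    """Sort candidates by a blend of relevance and shrunk depth, best first."""

    def blended(c: dict) -> float:
        depth = shrunk_depth(c["depth_score"], c["sessions"])
        return (1 - depth_weight) * c["relevance"] + depth_weight * depth

    return sorted(candidates, key=blended, reverse=True)


candidates = [
    {"id": "a", "relevance": 0.90, "depth_score": 0.1, "sessions": 1000},
    {"id": "b", "relevance": 0.80, "depth_score": 0.9, "sessions": 1000},
    {"id": "c", "relevance": 0.85, "depth_score": 1.0, "sessions": 2},
]
ranked_ids = [c["id"] for c in rerank(candidates)]
```

With only 2 sessions, candidate `c` has a perfect raw depth score, but shrinkage pulls its estimate to about 0.51, so it lands between the well-measured items instead of jumping to the top.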
❌ Anti-patterns
Single-metric optimization: Goodhart's law. Optimizing CTR alone produces clickbait.
LLM judge prompt drift: a pinned model, temperature 0, and a version log are required.
Depth metric latency: 7-day return visits are delayed feedback, so track a faster surrogate (focused dwell) alongside them.
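One way to guard against judge drift is to record the exact configuration with every score, so that a change in model or prompt shows up in the log. The sketch below is a minimal version of that idea using only the standard library. `JudgeConfig`, `log_record`, and the record fields are hypothetical names, and the prompt is identified by a SHA-256 hash of its template.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeConfig:
    """Pinned judge settings; frozen so they cannot be changed mid-run."""
    model: str
    temperature: float
    prompt_template: str

    @property
    def prompt_sha256(self) -> str:
        # Any edit to the template, even whitespace, changes this hash.
        return hashlib.sha256(self.prompt_template.encode("utf-8")).hexdigest()


def log_record(cfg: JudgeConfig, content_id: str, scores: dict) -> str:
    """Serialize one judge call as a JSON line for an append-only version log."""
    return json.dumps({
        "ts": time.time(),
        "content_id": content_id,
        "model": cfg.model,
        "temperature": cfg.temperature,
        "prompt_sha256": cfg.prompt_sha256,
        "scores": scores,
    }, sort_keys=True)
```

Scores from records with different `model` or `prompt_sha256` values are then easy to separate before comparing quality over time.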
🧪 Verification / duplicates
Verified (Goodhart 1975; Zheng et al. 2023 LLM-as-judge; YouTube Watch Time → "Valued Watch Time" pivot ~2017).