[G1-Sync] Manual knowledge update

2026-05-09 21:08:02 +09:00
parent f0befc887a
commit 93ec7e9056
363 changed files with 68333 additions and 64 deletions
@@ -0,0 +1,185 @@
+---
+id: ai-streaming-llm-response
+title: LLM Streaming - SSE / Token-by-Token / Cancellation
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [ai, llm, streaming, sse, vibe-coding]
+tech_stack: { language: "TS / Node / OpenAI / Anthropic", applicable_to: ["Backend", "Frontend"] }
+applied_in: []
+aliases: [token streaming, SSE, AbortController, partial JSON, server-sent events]
+---
+
+# LLM Streaming
+
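+> Don't make users wait 5 seconds for a complete response: **stream it token by token**. Use SSE or fetch streams, cancel with AbortController, and parse JSON partially as it arrives. Perceived latency drops sharply because the first tokens appear almost immediately.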
+
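+## 📖 Core concepts
+- The model emits tokens one at a time; with streaming, the client receives each token (or small chunk) as it is generated instead of waiting for the whole completion.
+- SSE: `text/event-stream`, one-way (server to client). Automatic reconnect comes from the browser `EventSource` API, which only issues GET requests; a hand-rolled `fetch` reader does not reconnect on its own.
+- AbortController: when the user cancels, abort the upstream LLM request too, so the server stops paying for tokens nobody will read.
+- Partial JSON: an incomplete JSON prefix can still be parsed on a best-effort basis.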
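
The SSE framing in the list above is simple enough to sketch by hand. The helper below is a minimal illustration, not part of any SDK: `formatSseEvent` and its parameters are hypothetical names, and it assumes every payload is JSON-serializable so that it always fits on a single `data:` line.

```typescript
// Hypothetical helper (not from any SDK): serialize one payload as a single SSE event.
// An SSE event is a block of "field: value" lines terminated by a blank line.
function formatSseEvent(payload: Record<string, unknown>, eventName?: string): string {
  const lines: string[] = [];
  // The optional "event:" field lets the client tell message types apart.
  if (eventName !== undefined) lines.push(`event: ${eventName}`);
  // JSON.stringify escapes newlines, so the payload stays on one "data:" line.
  lines.push(`data: ${JSON.stringify(payload)}`);
  // The trailing blank line ("\n\n") marks the end of the event.
  return lines.join('\n') + '\n\n';
}

const tokenFrame = formatSseEvent({ delta: 'Hel' });
const errorFrame = formatSseEvent({ message: 'upstream timeout' }, 'error');
```

A server would pass the returned string straight to `res.write()`. Naming the event (for example `error`) is one way to report failures in-band once the HTTP headers have already been sent.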
+
+## 💻 Code patterns
+
+### Server (Node) — OpenAI
+```ts
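+import express from 'express';
+import OpenAI from 'openai';
+
+const app = express();
+app.use(express.json()); // parse JSON bodies so req.body.messages is available
+const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
+
+app.post('/api/chat', async (req, res) => {
+  res.writeHead(200, {
+    'Content-Type': 'text/event-stream',
+    'Cache-Control': 'no-cache, no-transform',
+    'Connection': 'keep-alive',
+    'X-Accel-Buffering': 'no', // disable nginx response buffering
+  });
+
+  try {
+    const stream = await client.chat.completions.create({
+      model: 'gpt-4o',
+      messages: req.body.messages,
+      stream: true,
+    });
+
+    // If the client disconnects, cancel the upstream LLM request too.
+    // Listen on res: in recent Node versions req 'close' can fire once the body has been read.
+    res.on('close', () => stream.controller.abort());
+
+    for await (const chunk of stream) {
+      const delta = chunk.choices[0]?.delta?.content ?? '';
+      if (delta) res.write(`data: ${JSON.stringify({ delta })}\n\n`);
+    }
+    res.write('data: [DONE]\n\n');
+  } catch (err) {
+    console.error('LLM stream failed', err);
+    // Headers are already sent, so report the failure in-band unless the client is gone.
+    if (!res.destroyed) {
+      res.write(`event: error\ndata: ${JSON.stringify({ message: 'stream failed' })}\n\n`);
+    }
+  } finally {
+    res.end();
+  }
+});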
+```
+
+### Anthropic
+```ts
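+import Anthropic from '@anthropic-ai/sdk';
+
+const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
+
+// Inside the same SSE route handler as the OpenAI example (headers already written):
+const stream = anthropic.messages.stream({
+  model: 'claude-opus-4-7',
+  max_tokens: 1024,
+  messages: req.body.messages,
+});
+
+res.on('close', () => stream.abort()); // client disconnected: cancel the upstream request
+
+for await (const ev of stream) {
+  if (ev.type === 'content_block_delta' && ev.delta.type === 'text_delta') {
+    res.write(`data: ${JSON.stringify({ delta: ev.delta.text })}\n\n`);
+  }
+}
+res.write('data: [DONE]\n\n');
+res.end();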
+```
+
+### Client — fetch streams
+```ts
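+// Assumes `setAnswer` (UI state setter) and `abortBtn` (cancel button) exist in the surrounding code.
+async function streamChat(messages: unknown[]) {
+  const ac = new AbortController();
+  abortBtn.onclick = () => ac.abort(); // wire up cancel before the request starts
+
+  const res = await fetch('/api/chat', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ messages }),
+    signal: ac.signal,
+  });
+  if (!res.ok || !res.body) throw new Error(`chat request failed: ${res.status}`);
+
+  const reader = res.body.getReader();
+  const decoder = new TextDecoder();
+  let buf = '';
+  let answer = '';
+
+  while (true) {
+    const { value, done } = await reader.read(); // rejects with AbortError after ac.abort()
+    if (done) break;
+    buf += decoder.decode(value, { stream: true });
+
+    // Events end with a blank line; keep the trailing incomplete event in buf.
+    const events = buf.split('\n\n');
+    buf = events.pop() ?? '';
+    for (const event of events) {
+      if (!event.startsWith('data: ')) continue;
+      const data = event.slice(6);
+      if (data === '[DONE]') return;
+      answer += JSON.parse(data).delta ?? '';
+      setAnswer(answer);
+    }
+  }
+  // The stream closed without the [DONE] sentinel: surface it instead of failing silently.
+  throw new Error('stream ended before [DONE]');
+}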
+```
+
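+### Vercel AI SDK (simple)
+```ts
+// Appears to target the AI SDK 4.x API (toDataStreamResponse, useChat returning input/handleSubmit).
+// Other major versions rename these helpers, so check the docs for your installed version.
+// server
+import { streamText } from 'ai';
+import { openai } from '@ai-sdk/openai';
+
+export async function POST(req: Request) {
+  const { messages } = await req.json();
+  const result = streamText({ model: openai('gpt-4o'), messages });
+  return result.toDataStreamResponse();
+}
+
+// client
+import { useChat } from '@ai-sdk/react';
+const { messages, input, handleSubmit, stop } = useChat(); // stop() cancels the in-flight stream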
+```
+
+### Partial JSON parse
+```ts
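+import { parse } from 'partial-json';
+
+// `stream` is assumed to be an async iterable of string deltas (the model's JSON output).
+let buf = '';
+for await (const delta of stream) {
+  buf += delta;
+  try {
+    const partial = parse(buf); // allows every kind of partial value by default
+    setData(partial); // render the incomplete object as it grows
+  } catch { /* not parseable yet; wait for more input */ }
+}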
+```
+
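+### Token count + cost (final chunk)
+```ts
+// OpenAI only reports usage on a stream created with
+// stream_options: { include_usage: true }; the final chunk then carries `usage`.
+for await (const chunk of stream) {
+  if (chunk.usage) {
+    track('llm.usage', chunk.usage); // only set on the last chunk
+  }
+}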
+```
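
Once the final chunk reports `usage`, a cost estimate is simple arithmetic. The sketch below assumes OpenAI-style `prompt_tokens` / `completion_tokens` fields; `estimateCostUsd` and the per-million-token prices are hypothetical placeholders, so substitute the current rates for your model.

```typescript
// OpenAI-style usage fields as reported in the final streamed chunk.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Hypothetical price table: USD per one million tokens. Placeholder values only.
interface PricePerMillion {
  input: number;
  output: number;
}

// Hypothetical helper: estimate the USD cost of one completion.
function estimateCostUsd(usage: TokenUsage, price: PricePerMillion): number {
  // tokens * (USD per million tokens) gives micro-dollars; divide once at the end.
  const microDollars =
    usage.prompt_tokens * price.input + usage.completion_tokens * price.output;
  return microDollars / 1e6;
}

const exampleCost = estimateCostUsd(
  { prompt_tokens: 1000, completion_tokens: 500 },
  { input: 2, output: 10 }, // made-up prices for illustration
);
```

Keeping the price table as a parameter avoids hard-coding rates that change over time and differ per model.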
+
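+### Backpressure (slow client)
+```ts
+for await (const chunk of stream) {
+  const delta = chunk.choices[0]?.delta?.content ?? '';
+  if (!delta) continue;
+  const line = `data: ${JSON.stringify({ delta })}\n\n`;
+  // write() returns false when the socket buffer is full; wait for 'drain' before sending more.
+  if (!res.write(line)) {
+    await new Promise((resolve) => res.once('drain', resolve));
+  }
+}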
+```
+
+## 🤔 Decision criteria
+| Situation | Recommendation |
+|---|---|
+| Next.js | Vercel AI SDK |
+| Node Express / Hono | OpenAI/Anthropic SDK + SSE |
+| Mobile (RN) | RN-event-source or fetch streams |
+| Non-text output (JSON / tool calls) | Anthropic streaming + partial-json |
+| Swapping between multiple LLMs | LangChain.js / Vercel AI SDK |
+| Very short responses | Non-streaming is sufficient |
+
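+## ❌ Anti-patterns
+- **Client cancels, server keeps generating**: wasted tokens. Propagate the cancel upstream with AbortController.
+- **Proxy buffers output into large chunks**: set `X-Accel-Buffering: no` for nginx.
+- **Rendering incomplete Markdown**: mid-stream, an opening ``` fence shows up before its closing fence arrives. Post-process before rendering.
+- **`JSON.parse` on accumulated deltas**: the model's JSON output is incomplete mid-stream and will throw. Use partial-json.
+- **Ignoring LLM errors**: the stream can break partway through; tell the user.
+- **Counting tokens yourself on every turn**: use the `usage` reported in the final chunk instead.
+- **WebSocket for LLM output**: SSE is enough; use WebSocket only when you need bidirectional messaging.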
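
For the incomplete-Markdown item, one possible post-processing step is to close a dangling code fence before each render. This is a minimal sketch under the assumption that fences are always three backticks at the start of a line; `closeOpenCodeFence` is a hypothetical helper, and real Markdown (tilde fences, longer fences, nested blocks) needs more care.

```typescript
// Built with repeat() so this example never contains a literal fence.
const FENCE = '`'.repeat(3);

// Hypothetical helper: an odd number of fence lines means a code block is still open,
// so append a closing fence to the copy that gets rendered.
function closeOpenCodeFence(markdown: string): string {
  const fenceCount = markdown
    .split('\n')
    .filter((line) => line.replace(/^\s+/, '').startsWith(FENCE)).length;
  if (fenceCount % 2 === 0) return markdown; // balanced: nothing to fix
  const separator = markdown.endsWith('\n') ? '' : '\n';
  return markdown + separator + FENCE;
}

const midStream = `Here is code:\n${FENCE}ts\nconst a = 1;`;
const renderable = closeOpenCodeFence(midStream);
```

Apply it only to the copy handed to the renderer and keep accumulating the raw text, so the real closing fence is not duplicated when it finally arrives.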
+
+## 🤖 LLM usage hints
+- Server: SSE + AbortController on close.
+- Client: fetch streams + decoder + buffer split.
+- The Vercel AI SDK abstracts away this boilerplate.
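
The client-side hint (decoder + buffer split) can be isolated into a small, testable unit. The sketch below is an illustration under stated assumptions rather than a full SSE parser: `SseDataParser` is a hypothetical class that only extracts `data:` fields, normalizes CRLF line endings, and ignores `event:`, `id:`, `retry:` and comment lines.

```typescript
// Hypothetical incremental parser: feed it decoded text chunks, get back complete payloads.
class SseDataParser {
  private buffer = '';

  push(chunk: string): string[] {
    // Normalize on the whole buffer so a CRLF split across two chunks is still handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');

    // Events are separated by a blank line; the last piece may be incomplete, so keep it.
    const events = this.buffer.split('\n\n');
    this.buffer = events.pop() ?? '';

    const payloads: string[] = [];
    for (const event of events) {
      const dataLines = event
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, '')); // drop the field name and one optional space
      // Multiple data: lines in one event are joined with newlines.
      if (dataLines.length > 0) payloads.push(dataLines.join('\n'));
    }
    return payloads;
  }
}

const parser = new SseDataParser();
const first = parser.push('data: {"delta":"Hel');     // event not finished yet
const second = parser.push('lo"}\r\n\r\ndata: [DO');  // finishes the first event
const third = parser.push('NE]\n\n');                 // finishes the sentinel event
```

Because network chunks can end anywhere, including in the middle of a JSON payload, only payloads returned by `push` are safe to pass to `JSON.parse`.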
+
+## 🔗 Related documents
+- [[Backend_SSE_Server_Sent_Events]]
+- [[AI_Structured_Output_Zod]]
+- [[AI_RAG_Pattern_Basics]]