id: ai-memory-systems
title: AI Memory Systems - Short / Long / Episodic
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: ai, memory, vibe-coding
tech_stack: language: TS / Python; applicable_to: Backend
applied_in:
aliases: agent memory, mem0, conversation memory, vector memory, summarization

AI Memory Systems

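LLM context windows are limited, and a memory system works around that limit. The main types are short-term (the conversation itself), long-term (vector DB), episodic (event log), and semantic (facts). Options: mem0, Letta, or a custom implementation.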

📖 Core concepts

  • Short-term: the messages inside the current conversation.
  • Long-term: persistent per-user memory.
  • Episodic: events in chronological order.
  • Semantic: facts and preferences (refined).

💻 Code patterns

Short-term (sliding window)

class ConversationMemory {
  private messages: Message[] = [];
  private maxTokens = 8000;
  
  add(msg: Message) {
    this.messages.push(msg);
    this.trim();
  }
  
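  private trim() {
    // Assumes messages[0] is the system prompt; keep it plus at least one other message.
    while (this.tokenCount() > this.maxTokens && this.messages.length > 2) {
      this.messages.splice(1, 1);  // drop the oldest non-system message
    }
  }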
  
  private tokenCount(): number {
    return this.messages.reduce((s, m) => s + countTokens(m.content), 0);
  }
}
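
The sliding window above calls a `countTokens` helper that this note does not define. A minimal sketch, assuming a rough heuristic of about four characters per token (an approximation for English text, not a real tokenizer); `estimateTokens` and `trimToBudget` are hypothetical names introduced only for illustration:

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic: ~4 characters per token. This is an assumption for
// illustration; use the model's real tokenizer for production budgets.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns a new array that fits the budget. A leading system message is
// always kept, and the oldest non-system messages are dropped first.
function trimToBudget(messages: Message[], maxTokens: number): Message[] {
  const kept = [...messages];
  const firstDroppable = kept[0]?.role === 'system' ? 1 : 0;
  const total = () => kept.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  // Always leave at least one non-system message in place.
  while (total() > maxTokens && kept.length > firstDroppable + 1) {
    kept.splice(firstDroppable, 1);
  }
  return kept;
}
```

Unlike the class above, this variant checks whether the first message really is a system prompt before protecting it, and it returns a copy instead of mutating its input.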

Summarization (compress old messages)

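async function summarizeOld(messages: Message[]): Promise<Message[]> {
  if (messages.length < 20) return messages;

  // Keep the original system prompt (if any) out of the summary so it is not lost.
  const hasSystem = messages[0].role === 'system';
  const head = hasSystem ? [messages[0]] : [];
  const body = hasSystem ? messages.slice(1) : messages;

  const old = body.slice(0, -10);    // everything except the last 10 messages
  const recent = body.slice(-10);    // the last 10 messages stay verbatim

  const summary = await llm.complete({
    system: 'Summarize this conversation in 200 words.',
    user: old.map(m => `${m.role}: ${m.content}`).join('\n'),
  });

  return [
    ...head,
    { role: 'system', content: `Conversation summary:\n${summary}` },
    ...recent,
  ];
}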

Long-term — vector memory

class VectorMemory {
  constructor(private userId: string, private vectorDB: VectorDB) {}
  
  async add(content: string, metadata?: Record<string, any>) {
    const embedding = await embed(content);
    await this.vectorDB.upsert({
      userId: this.userId,
      content,
      embedding,
      metadata,
      createdAt: new Date(),
    });
  }
  
  async retrieve(query: string, k = 5): Promise<string[]> {
    const queryEmb = await embed(query);
    const results = await this.vectorDB.search({
      userId: this.userId,
      embedding: queryEmb,
      limit: k,
    });
    return results.map(r => r.content);
  }
}

// Usage inside an agent
const memory = new VectorMemory(userId, vectorDB);

async function chat(userMsg: string, history: Message[] = []) {
  const relevant = await memory.retrieve(userMsg);
  const system = `You are a helpful assistant.\n\nRelevant context about this user:\n${relevant.join('\n')}`;

  const messages = [...history, { role: 'user', content: userMsg }];
  const r = await llm.chat({ system, messages });

  // Save important facts: check what the user said, not the assistant's reply
  if (userMsg.includes('I like') || userMsg.includes('I prefer')) {
    await memory.add(userMsg);
  }

  return r;
}

mem0 (managed memory)

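// The hosted mem0 JavaScript SDK is published on npm as `mem0ai`.
import MemoryClient from 'mem0ai';

const m = new MemoryClient({ apiKey });  // apiKey: your mem0 platform key

// Add: facts are extracted automatically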
await m.add(
  [
    { role: 'user', content: 'I love hiking and prefer Korean food' },
    { role: 'assistant', content: '...' },
  ],
  { user_id: 'u1' }
);

// Retrieve
const memories = await m.search('What does the user like?', { user_id: 'u1' });
// [{ memory: 'Loves hiking', score: 0.9 }, ...]

→ Auto extraction + storage + retrieval.

Letta (formerly MemGPT)

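from letta_client import Letta

# Assumes a self-hosted Letta server on its default local port.
# Parameter names follow the letta-client SDK and may differ between versions.
client = Letta(base_url='http://localhost:8283')

agent = client.agents.create(
    name='assistant',
    # Model and embedding handles are examples; use ones your server has configured.
    model='openai/gpt-4o-mini',
    embedding='openai/text-embedding-3-small',
    memory_blocks=[
        {'label': 'persona', 'value': 'I am a helpful assistant.'},
        {'label': 'human', 'value': 'User name: Alice'},
    ],
)

# The agent manages its own memory: it can add to or edit its blocks.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{'role': 'user', 'content': 'My favorite color is blue'}],
)
# Internally: agent updates 'human' block with 'favorite color: blue'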

→ Self-editing memory.

Episodic (event log)

CREATE TABLE agent_events (
  id BIGSERIAL PRIMARY KEY,
  user_id UUID,
  event_type TEXT,
  payload JSONB,
  occurred_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX agent_events_user ON agent_events(user_id, occurred_at DESC);

async function recordEvent(userId: string, type: string, data: any) {
  await db.agentEvents.insert({ userId, eventType: type, payload: data });
}

async function recentEvents(userId: string, limit = 20) {
  return db.agentEvents.findMany({ where: { userId }, orderBy: { occurredAt: 'desc' }, take: limit });
}

Semantic memory (facts)

// Fact extraction (LLM)
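async function extractFacts(text: string): Promise<string[]> {
  const r = await llm.complete({
    system: 'Extract durable facts about the user. Output JSON: { facts: ["...", "..."] }',
    user: text,
    response_format: { type: 'json_object' },
  });
  // Facts are plain strings; tolerate a missing or malformed "facts" field.
  const parsed = JSON.parse(r);
  return Array.isArray(parsed.facts) ? parsed.facts : [];
}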

// Save
const facts = await extractFacts(userMsg);
for (const fact of facts) {
  await memory.add(fact, { type: 'fact' });
}

Memory hierarchy

1. Working memory (LLM context window): the most recent N messages
2. Recent memory: the last week (DB query)
3. Long-term: vector DB (retrieved by relevance)
4. Knowledge base: general documents (RAG)

→ At query time, retrieve from every layer and merge the results.
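
The merge step can be sketched as follows. This is a minimal illustration that assumes each layer has already returned its hits; `LayerHit`, `mergeLayers`, and `renderContext` are hypothetical names, and the priority order (working first, knowledge base last) is an assumption rather than something this note prescribes:

```typescript
interface LayerHit {
  layer: 'working' | 'recent' | 'long-term' | 'knowledge';
  content: string;
}

// Merge hits from all layers in the order given, dropping duplicates
// (case-insensitive, whitespace-trimmed) and capping the total count.
function mergeLayers(hitsByLayer: LayerHit[][], maxItems: number): LayerHit[] {
  const seen = new Set<string>();
  const merged: LayerHit[] = [];
  for (const hits of hitsByLayer) {
    for (const hit of hits) {
      if (merged.length >= maxItems) return merged;
      const key = hit.content.trim().toLowerCase();
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push(hit);
    }
  }
  return merged;
}

// Render merged hits as a context section for the system prompt.
function renderContext(hits: LayerHit[]): string {
  return hits.map(h => `[${h.layer}] ${h.content}`).join('\n');
}
```

Tagging each line with its layer lets the model weigh a fresh working-memory item differently from a months-old long-term fact.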

Forgetting (decay)

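// Weight by age and by how often a memory is accessed.
// DECAY_TIME is the decay time constant in milliseconds.
async function retrieve(query: string): Promise<Memory[]> {
  const all = await vectorDB.search(query, 50);
  const now = Date.now();

  return all
    .map(m => {
      // createdAt may be a Date or a timestamp; normalize to milliseconds
      const ageMs = now - new Date(m.createdAt).getTime();
      return {
        ...m,
        score: m.similarity * Math.exp(-ageMs / DECAY_TIME) * (1 + m.accessCount * 0.1),
      };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);
}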

→ Old and rarely accessed memories score lower.
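
To make the scoring formula concrete, here is a small sketch of the same calculation as a pure function. The 30-day time constant is an assumed value chosen for illustration, and `ScoredInput` and `decayedScore` are hypothetical names:

```typescript
interface ScoredInput {
  similarity: number;   // 0..1, from the vector search
  createdAt: number;    // epoch milliseconds
  accessCount: number;  // how often this memory has been retrieved
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Assumed time constant: after 30 days the recency factor falls to 1/e.
const DECAY_TIME_MS = 30 * DAY_MS;

function decayedScore(m: ScoredInput, now: number): number {
  const ageMs = Math.max(0, now - m.createdAt);      // guard against clock skew
  const recency = Math.exp(-ageMs / DECAY_TIME_MS);  // 1 when new, approaches 0 with age
  const usageBoost = 1 + m.accessCount * 0.1;        // each access adds 10%
  return m.similarity * recency * usageBoost;
}
```

With these assumptions, a brand-new memory with similarity 0.8 keeps its full score, a 30-day-old one with no accesses drops to roughly 37% of that, and the same 30-day-old memory accessed ten times recovers to about double the unused one. The usage boost is unbounded in this form, so a heavily accessed memory can outrank a more relevant one; capping it is a reasonable design choice.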

Memory consolidation (background)

// Runs periodically to tidy up vector memory
async function consolidate(userId: string) {
  const all = await memory.allForUser(userId);

  // Merge near-duplicate memories
  for (const cluster of clusterBySimilarity(all, 0.9)) {
    if (cluster.length > 1) {
      const merged = await llm.complete({
        system: 'Merge these similar memories into one concise statement.',
        user: cluster.map(m => m.content).join('\n'),
      });
      await memory.replace(cluster.map(m => m.id), merged);
    }
  }

  // Delete memories that are both old and unused
  await memory.deleteOldUnused(userId, 90);  // 90+ days old and never accessed
}
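
The `clusterBySimilarity` helper used above is not defined in this note. One possible sketch, assuming each memory carries its embedding and using a simple greedy pass over cosine similarity (a swapped-in technique chosen for brevity, not necessarily what the author had in mind; `EmbeddedMemory` is a hypothetical type):

```typescript
interface EmbeddedMemory {
  id: string;
  content: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;  // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy single-pass clustering: each memory joins the first cluster whose
// first member (the seed) is at least `threshold` similar; otherwise it
// starts a new cluster. The result depends on input order.
function clusterBySimilarity(memories: EmbeddedMemory[], threshold: number): EmbeddedMemory[][] {
  const clusters: EmbeddedMemory[][] = [];
  for (const m of memories) {
    const home = clusters.find(c => {
      const seed = c[0];
      return seed !== undefined && cosineSimilarity(seed.embedding, m.embedding) >= threshold;
    });
    if (home) {
      home.push(m);
    } else {
      clusters.push([m]);
    }
  }
  return clusters;
}
```

This compares against cluster seeds only, so it costs at most one similarity check per existing cluster for each memory. That is adequate for a per-user background job, while a large store would more likely use the vector DB's own nearest-neighbor search.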

User-level vs session-level

User-level: permanent (preferences, facts).
Session-level: a single conversation (context).

→ Both are needed; keep them separate.
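
A minimal sketch of that separation, using in-memory maps as stand-ins for whatever stores are actually used (user-level data would normally live in a database); `ScopedMemory` and its method names are hypothetical:

```typescript
class ScopedMemory {
  // User-level: keyed by userId, survives across sessions.
  private userFacts = new Map<string, string[]>();
  // Session-level: keyed by sessionId, discarded when the session ends.
  private sessionContext = new Map<string, string[]>();

  rememberFact(userId: string, fact: string): void {
    const facts = this.userFacts.get(userId) ?? [];
    facts.push(fact);
    this.userFacts.set(userId, facts);
  }

  addContext(sessionId: string, line: string): void {
    const lines = this.sessionContext.get(sessionId) ?? [];
    lines.push(line);
    this.sessionContext.set(sessionId, lines);
  }

  // Ending a session drops only session-level data.
  endSession(sessionId: string): void {
    this.sessionContext.delete(sessionId);
  }

  // Durable user facts first, then the current session's context.
  buildContext(userId: string, sessionId: string): string[] {
    return [
      ...(this.userFacts.get(userId) ?? []),
      ...(this.sessionContext.get(sessionId) ?? []),
    ];
  }
}
```

Keeping the two scopes behind separate keys means ending a session can never delete a durable preference, and a session's scratch context never leaks into the next conversation.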

Privacy / GDPR

// Delete memory on request
async function forgetUser(userId: string) {
  await memory.deleteAll(userId);
  await db.agentEvents.deleteAll({ userId });
}

// PII filter
async function add(content: string) {
  if (containsPII(content)) {
    content = redactPII(content);
  }
  await vectorDB.upsert(...);
}
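
The `containsPII` and `redactPII` helpers above are left undefined. A deliberately simple sketch, assuming only email addresses and phone-like digit sequences need masking; the regular expressions are illustrative and will miss many real-world PII forms (names, addresses, national ID numbers), so a dedicated detection library or service is the safer choice in production:

```typescript
// Illustrative patterns only; both are assumptions, not a complete PII definition.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// A digit, then at least 7 digits/separators, then a digit (optional leading +).
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Replace every match with a placeholder token.
function redactPII(text: string): string {
  return text.replace(EMAIL, '[EMAIL]').replace(PHONE, '[PHONE]');
}

// Defined via redactPII so the two functions can never disagree, and so the
// stateful `lastIndex` of global regexes is never exposed through `.test()`.
function containsPII(text: string): boolean {
  return redactPII(text) !== text;
}
```

Redacting before the embedding is computed matters: once raw PII has been embedded and stored, deleting the text alone does not remove it from the vector.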

Skills = a reusable bundle of instructions + tools.
Define once → inject into multiple conversations.

// Filesystem-based skill
// .claude/skills/code-review/SKILL.md: instructions
// .claude/skills/code-review/scripts/: supporting scripts

// Auto-injected when a relevant trigger fires.

🤔 Decision criteria

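| Situation | Recommendation |
| --- | --- |
| Simple chatbot | Sliding window (no memory) |
| User preferences | Vector + summarize |
| Very long conversations | Letta / MemGPT |
| Quick start | mem0 (managed) |
| Self-host | pgvector + custom code |
| Multi-user | User-scoped + privacy |
| Production | mem0 / Zep |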

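Anti-patterns

  • Unbounded conversation sent as full context: token usage explodes. Summarize or use a sliding window.
  • Vector search over everything: noisy results. Filter by metadata.
  • Storing PII as-is: GDPR violation.
  • No forgetting: stale data accumulates.
  • No user scope: cross-user leaks.
  • Assuming memory replaces RAG: they serve different purposes, so use both.
  • Long conversations without a summary: the full history is sent every time.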
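
Two of these anti-patterns (no user scope, searching everything) can be avoided with the same habit: filter before ranking. A minimal in-memory sketch, assuming similarity scores are already computed for the current query; `StoredMemory` and `scopedSearch` are hypothetical names, and a real vector DB would apply these filters inside the query itself:

```typescript
interface StoredMemory {
  userId: string;
  content: string;
  type: 'fact' | 'event' | 'summary';
  similarity: number;  // assumed precomputed against the current query
}

function scopedSearch(
  all: StoredMemory[],
  userId: string,
  types: StoredMemory['type'][],
  k: number,
): StoredMemory[] {
  return all
    .filter(m => m.userId === userId)      // user scope first: prevents cross-user leaks
    .filter(m => types.includes(m.type))   // metadata filter: cuts noise
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```

Because the user filter runs before ranking, another user's memory can never appear in the results no matter how similar it is to the query.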

🤖 Hints for LLM use

  • 4 layers (working / recent / long-term / knowledge).
  • The core trio: vector + summarize + decay.
  • mem0 / Letta for a quick start.
  • Privacy / GDPR from the start.

🔗 Related documents