2nd/10_Wiki/Topics/Coding/AI_Memory_Persistence_Deep.md
2026-05-10 22:08:15 +09:00

id: ai-memory-persistence-deep
title: AI Memory Persistence: long-term / vector / graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: ai, memory, agents, vibe-coding
tech_stack: language: TS / Python; applicable_to: AI
applied_in:
aliases: agent memory, long-term memory, episodic, semantic, mem0, Letta, memgpt

AI Memory Persistence

When a session ends, the agent forgets everything, which is bad UX. Agent memory combines short-term (the context window), long-term (a vector / graph DB), and episodic (a timeline) storage. Tools: mem0 / Letta / MemGPT.

📖 Core concepts

  • Short-term: the context window (what the LLM sees as input).
  • Long-term: persisted in a DB, survives across sessions.
  • Episodic: a timeline of events (when X happened).
  • Semantic: general facts ("user is vegan").

💻 Code patterns

Naive (no memory)

async function chat(message: string) {
  return llm.complete({ prompt: message });
}

// Every call starts fresh; the previous conversation is lost.

Conversation history (simple)

const messages: Message[] = [];

async function chat(input: string) {
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  return r.text;
}

→ Bounded by the context window: once the history outgrows it (on the order of 100+ messages), older content is lost.

Sliding window

function recent(messages: Message[], n: number = 20) {
  return messages.slice(-n);
}

await llm.complete({ messages: recent(allMessages) });

→ Simple, but older messages are dropped.
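
A fixed message count ignores how long each message is. A minimal sketch of a window bounded by a token budget instead, assuming a rough 4-characters-per-token estimate; `estimateTokens` and `recentWithinBudget` are hypothetical helpers, not part of any library:

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
// Swap in a real tokenizer for accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk from the newest message backwards and keep messages until the
// next one would exceed the budget. The result stays in chronological order.
function recentWithinBudget(messages: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const m of [...messages].reverse()) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) break;
    kept.unshift(m);
    used += cost;
  }
  return kept;
}
```

If even the newest message exceeds the budget, this sketch returns an empty array; a real system would probably truncate that message instead.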

Summary cascade

async function getContext() {
  const all = await db.messages.find({});
  if (all.length < 20) return all;
  
  const old = all.slice(0, -10);
  const recent = all.slice(-10);
  
  const summary = await llm.complete({
    system: 'Summarize this conversation in 200 tokens.',
    messages: old,
  });
  
  return [
    // Use the completion's text, not the whole response object.
    { role: 'system', content: `Previous: ${summary.text}` },
    ...recent,
  ];
}

→ Older messages collapse into a summary; recent ones stay intact.

Vector memory

// Embed every message, then store it
async function remember(message: Message) {
  const emb = await embed(message.content);
  await db.memories.insert({
    content: message.content,
    embedding: emb,
    timestamp: Date.now(),
    type: 'message',
  });
}

// Search
async function recall(query: string) {
  const emb = await embed(query);
  return db.memories.find({
    $vectorSearch: { embedding: emb, k: 5 },
  });
}

→ Retrieval by topic. Works with long histories.
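
`$vectorSearch` above stands in for whatever nearest-neighbour query the database offers. A minimal in-memory sketch of what such a search computes, assuming embeddings are plain number arrays; `cosineSimilarity` and `topK` are hypothetical helpers, and the toy 2-dimensional vectors are made up for illustration:

```typescript
interface StoredMemory {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every memory, sort by score descending, keep k.
function topK(query: number[], memories: StoredMemory[], k: number): StoredMemory[] {
  return memories
    .map(m => ({ m, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(x => x.m);
}

const toyStore: StoredMemory[] = [
  { content: 'likes blue', embedding: [1, 0] },
  { content: 'works at Acme', embedding: [0, 1] },
  { content: 'prefers dark mode', embedding: [0.9, 0.1] },
];
const hits = topK([1, 0], toyStore, 2).map(m => m.content);
// hits → ['likes blue', 'prefers dark mode']
```

Brute force is linear in the number of memories, which is fine for a sketch; dedicated vector stores use indexes to avoid scanning everything.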

mem0 (managed)

from mem0 import Memory
m = Memory()

# Add
m.add("My favorite color is blue", user_id="alice")
m.add("I prefer dark mode", user_id="alice")

# Search
results = m.search(query="What does Alice like?", user_id="alice")
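# → memories such as "favorite color is blue" and "prefers dark mode"
#   (the exact return shape depends on the mem0 version)

→ Extracts and deduplicates facts automatically.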

Letta (formerly MemGPT)

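# Schematic only: the Letta client API has changed between releases,
# so check the current SDK docs for exact names and parameters.
from letta import create_client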
client = create_client()

agent = client.create_agent(
    name='Alice',
    memory={
        'human': 'User name: Bob.',
        'persona': 'I am a helpful assistant.',
    },
)

response = client.user_message(agent_id=agent.id, message='Hi')

→ "Memory editing": the agent can also edit its own memory.

Memory hierarchy (Letta's idea)

Core memory: always in context (small).
Recall memory: searchable on demand (RAG).
Archival: permanent cold storage.

→ Hot / cold tier.
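
One way to picture the three tiers, as a sketch only: this is not Letta's actual implementation, and the class, its method names, and the substring search (standing in for vector retrieval) are assumptions for illustration.

```typescript
class TieredMemory {
  private core: string[] = [];     // always placed in the prompt
  private recall: string[] = [];   // searched on demand
  private archival: string[] = []; // cold storage, read rarely
  private coreLimit: number;

  constructor(coreLimit: number) {
    this.coreLimit = coreLimit;
  }

  // New items enter core; the oldest core items overflow into recall.
  remember(item: string): void {
    this.core.push(item);
    while (this.core.length > this.coreLimit) {
      const demoted = this.core.shift();
      if (demoted !== undefined) this.recall.push(demoted);
    }
  }

  // Move everything in recall to archival, e.g. on a schedule.
  archive(): void {
    this.archival.push(...this.recall);
    this.recall = [];
  }

  // Core is always included; recall contributes only matching items.
  buildContext(query: string): string[] {
    const q = query.toLowerCase();
    const matches = this.recall.filter(r => r.toLowerCase().includes(q));
    return [...this.core, ...matches];
  }

  // Archival is consulted only when explicitly asked.
  searchArchive(query: string): string[] {
    const q = query.toLowerCase();
    return this.archival.filter(a => a.toLowerCase().includes(q));
  }
}
```

The point of the split is that prompt size depends only on `coreLimit` plus the few recall hits, not on how much has been stored overall.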

Episodic memory (timeline)

interface Episode {
  id: string;
  timestamp: Date;
  summary: string;
  actors: string[];
  tags: string[];
  embedding: number[];
}

await db.episodes.insert({
  timestamp: new Date(),
  summary: 'User asked about pricing',
  actors: ['user', 'agent'],
  tags: ['pricing', 'sales'],
});

// "Was pricing discussed in the last week?"
const r = await db.episodes.find({
  timestamp: { $gte: new Date(Date.now() - 7 * 86400000) },
  tags: 'pricing',
});

Semantic (fact)

interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

const facts: Fact[] = [
  { subject: 'Alice', predicate: 'likes', object: 'blue' },
  { subject: 'Alice', predicate: 'works_at', object: 'Acme Corp' },
];

// Simple lookup
const aliceFacts = facts.filter(f => f.subject === 'Alice');

→ Knowledge-graph style.

Knowledge graph + LLM

// Neo4j / Memgraph
await graph.run(`
  MERGE (a:Person {name: 'Alice'})
  MERGE (b:Color {name: 'blue'})
  MERGE (a)-[:LIKES]->(b)
`);

await graph.run(`
  MATCH (a:Person {name: 'Alice'})-[:LIKES]->(c)
  RETURN c.name
`);
// ['blue']

Auto-extraction (LLM)

async function extractFacts(message: string) {
  const r = await llm.complete({
    system: 'Extract facts as {subject, predicate, object}. Only solid facts.',
    prompt: message,
  });
  return JSON.parse(r.text);
}

// "I'm Alice and I love coffee"
// → [{subject: 'user', predicate: 'is', object: 'Alice'}, {subject: 'user', predicate: 'loves', object: 'coffee'}]
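
`JSON.parse` throws on malformed output and accepts any shape, so extracted facts are worth validating before they are stored. A minimal sketch, assuming the model was asked for a JSON array; `parseFacts` is a hypothetical helper that drops anything that is not a complete triple:

```typescript
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === 'string' && v.trim().length > 0;
}

// Parse model output defensively: never throw, never return partial triples.
function parseFacts(raw: string): Fact[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return []; // not valid JSON at all
  }
  if (!Array.isArray(data)) return [];

  const facts: Fact[] = [];
  for (const item of data) {
    if (typeof item !== 'object' || item === null) continue;
    const { subject, predicate, object } = item as Record<string, unknown>;
    if (isNonEmptyString(subject) && isNonEmptyString(predicate) && isNonEmptyString(object)) {
      facts.push({ subject, predicate, object });
    }
  }
  return facts;
}
```

This only checks shape. It cannot tell whether a well-formed fact is true, so hallucinated facts still need a separate check such as user confirmation.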

Forgetting (TTL / decay)

// Remove anything older than 30 days
await db.memories.deleteMany({ timestamp: { $lt: Date.now() - 30 * 86400000 } });

// Or keep an importance score
interface Memory {
  importance: number;  // 1-10
  lastAccessed: Date;
}

// Recently accessed memories are kept; the rest decay.
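
The importance-plus-recency idea can be made concrete with an exponential decay. A minimal sketch, assuming a half-life in days that you choose; `retentionScore`, `shouldForget`, the 30-day half-life, and the threshold of 1 are hypothetical, not values from any library:

```typescript
interface ScoredMemory {
  importance: number; // 1-10
  lastAccessed: Date;
}

const DAY_MS = 86400000;

// score = importance * 0.5 ^ (daysSinceAccess / halfLifeDays)
function retentionScore(m: ScoredMemory, now: Date, halfLifeDays: number): number {
  const days = Math.max(0, (now.getTime() - m.lastAccessed.getTime()) / DAY_MS);
  return m.importance * Math.pow(0.5, days / halfLifeDays);
}

// A memory is a deletion candidate once its score drops below the threshold.
function shouldForget(
  m: ScoredMemory,
  now: Date,
  halfLifeDays: number = 30,
  threshold: number = 1,
): boolean {
  return retentionScore(m, now, halfLifeDays) < threshold;
}
```

Updating `lastAccessed` on every successful recall resets the clock, so memories that keep being used never decay below the threshold.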

Memory consolidation (sleep-like)

// Run periodically: daily, or every N messages
async function consolidate(userId: string) {
  const recent = await db.memories.find({ userId, since: yesterday });
  
  // Cluster
  const clusters = cluster(recent);
  
  // Each cluster → summary
  for (const c of clusters) {
    const summary = await llm.complete({
      prompt: `Summarize these related memories: ${JSON.stringify(c)}`,
    });
    // Store the completion text rather than the whole response object.
    await db.consolidated.insert({ userId, summary: summary.text, count: c.length });
    await db.memories.deleteMany({ id: { $in: c.map(x => x.id) } });
  }
}

→ Like human sleep: episodes are distilled into semantic memory.
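
`cluster` is left undefined above. One simple option is greedy threshold clustering over embeddings. The sketch below assumes each memory carries an embedding; the function names, the 0.8 threshold, and the toy vectors are illustrative assumptions rather than recommended settings:

```typescript
interface EmbeddedMemory {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy clustering: each memory joins the first cluster whose seed
// (its first member) is similar enough, otherwise it starts a new cluster.
function cluster(memories: EmbeddedMemory[], threshold: number = 0.8): EmbeddedMemory[][] {
  const clusters: EmbeddedMemory[][] = [];
  for (const m of memories) {
    const home = clusters.find(c => {
      const seed = c[0];
      return seed !== undefined && cosine(seed.embedding, m.embedding) >= threshold;
    });
    if (home) home.push(m);
    else clusters.push([m]);
  }
  return clusters;
}
```

Greedy clustering is order-dependent and compares only against the seed, which keeps it cheap; a proper clustering algorithm would give more stable groups at higher cost.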

Working memory (current task)

class WorkingMemory {
  private items: any[] = [];
  
  add(item: any) {
    this.items.push(item);
    if (this.items.length > 7) this.items.shift();
  }
  
  context(): string {
    return this.items.map(i => JSON.stringify(i)).join('\n');
  }
}

→ Mirrors the human 7 ± 2 working-memory span.

Thread / session memory

// Each thread keeps its own history
const threadMemory = new Map<string, Message[]>();

async function chat(threadId: string, input: string) {
  const messages = threadMemory.get(threadId) ?? [];
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  threadMemory.set(threadId, messages);
  return r.text;
}

Multi-user memory

// Memory is stored per user.
await db.memories.insert({
  userId: 'alice',
  content: '...',
  embedding: ...,
});

// Always filter by user when searching
const r = await db.memories.find({ userId: 'alice', $vectorSearch: ... });

→ Privacy: user A's memories must never leak to user B.
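
One way to make cross-user leaks harder is to never expose an unscoped query at all. A minimal sketch, assuming an in-memory store; `MemoryStore` and `forUser` are hypothetical names, and substring matching stands in for vector search:

```typescript
interface UserMemory {
  userId: string;
  content: string;
}

class MemoryStore {
  private rows: UserMemory[] = [];

  // The only way to read or write is through a handle bound to one user,
  // so a caller cannot forget the userId filter.
  forUser(userId: string) {
    const rows = this.rows;
    return {
      add(content: string): void {
        rows.push({ userId, content });
      },
      search(term: string): string[] {
        const t = term.toLowerCase();
        return rows
          .filter(r => r.userId === userId && r.content.toLowerCase().includes(t))
          .map(r => r.content);
      },
    };
  }
}
```

With this shape the filter lives in one place instead of being repeated at every call site.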

Privacy / GDPR

// The user asks to be forgotten
async function forgetUser(userId: string) {
  await db.memories.deleteMany({ userId });
  await db.episodes.deleteMany({ userId });
}

Observability

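// TypeScript decorators cannot be applied to plain functions, so wrap instead.
// `trace` is a hypothetical tracing helper.
const recall = trace(async (query: string) => {
  const r = await db.memories.search(...);
  log({ query, retrieved: r.map(x => x.id), count: r.length });
  return r;
});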

→ Lets you debug "why did the agent forget?".

Cost

Vector embedding: about $0.02 / 1M tokens.
Storage: billed per GB-month.
Search: billed per 1k queries.

Small systems: Postgres with pgvector (cheap).
Large systems: Pinecone / Qdrant.

→ N memories per user × M users gets large quickly.
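
A back-of-envelope estimate makes the N × M growth concrete. The sketch uses the $0.02 per 1M tokens embedding price quoted above; the user count, memories per user, tokens per memory, and 1536-dimension float32 vectors are made-up inputs for illustration:

```typescript
interface Workload {
  users: number;
  memoriesPerUser: number;
  tokensPerMemory: number;
  dimensions: number; // embedding size; float32 = 4 bytes per dimension
}

const EMBED_USD_PER_MILLION_TOKENS = 0.02;

// One-time cost of embedding every memory once.
function embeddingCostUsd(w: Workload): number {
  const tokens = w.users * w.memoriesPerUser * w.tokensPerMemory;
  return (tokens / 1000000) * EMBED_USD_PER_MILLION_TOKENS;
}

// Raw vector payload only; indexes and metadata add more on top.
function vectorStorageGiB(w: Workload): number {
  const bytes = w.users * w.memoriesPerUser * w.dimensions * 4;
  return bytes / 1024 ** 3;
}

const exampleWorkload: Workload = {
  users: 10000,
  memoriesPerUser: 1000,
  tokensPerMemory: 50,
  dimensions: 1536,
};
```

With these inputs, embedding 500M tokens comes to about $10, while the raw vectors alone take roughly 57 GiB, which suggests storage and search are the parts to watch as the user base grows.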

Pitfalls

- Too many memories: noise and cost.
- No forgetting: stale information.
- Hallucinated facts from auto-extraction.
- Memory leakage across users.
- Ignoring embedding cost.
- Long-term memory contradicting the current context (outdated facts).

Real-world

  • ChatGPT memory: explicit per-user memory.
  • Character.AI: persistent agent.
  • Letta: full memory framework.
  • mem0: memory-as-a-service.
  • Pi (Inflection): long memory companion.

🤔 Decision criteria

| Situation | Recommendation |
| --- | --- |
| Simple chat | Sliding window |
| Long history | Summary cascade |
| Topic recall | Vector memory |
| Personal facts | Semantic / KG |
| Timeline query | Episodic |
| Production-ready | mem0 / Letta |
| Multi-user | User-scoped + privacy |

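Anti-patterns

  • Sending the entire history: cost and context overflow.
  • No forgetting: stale information.
  • Cross-user leaks: privacy violation.
  • Trusting auto-extraction 100%: it hallucinates.
  • No deduplication: the same fact stored many times.
  • No tracing: impossible to debug.
  • Vector search only: unreliable for exact-fact lookups.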

🤖 Hints for LLM use

  • Memory = short + long + episodic + semantic.
  • mem0 / Letta are the production-ready options.
  • Forgetting (TTL / consolidation) mirrors how humans remember.
  • Privacy (user-scoped memory) is mandatory.

🔗 Related documents