---
id: ai-memory-persistence-deep
title: AI Memory Persistence — long-term / vector / graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, memory, agents, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["AI"] }
applied_in: []
aliases: [agent memory, long-term memory, episodic, semantic, mem0, Letta, memgpt]
---

# AI Memory Persistence

> Without persistence, an agent loses everything when the session ends, which is bad UX. The fix is layered: **short-term (context) + long-term (vector / graph DB) + episodic (timeline)**. Implementations: mem0 / Letta / MemGPT.

## 📖 Core concepts

- Short-term: the context window (the LLM's input).
- Long-term: persistent storage (a DB).
- Episodic: a timeline (when X happened).
- Semantic: general facts ("user is vegan").

## 💻 Code patterns

### Naive (no memory)

```ts
async function chat(message: string) {
  return llm.complete({ prompt: message });
}
// Every call starts fresh: the previous conversation is lost.
```

### Conversation history (simple)

```ts
const messages: Message[] = [];

async function chat(input: string) {
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  return r.text;
}
```

→ Bounded by the context window: once the history grows long (100+ messages), it no longer fits and older turns are lost.

### Sliding window

```ts
function recent(messages: Message[], n: number = 20) {
  return messages.slice(-n);
}

await llm.complete({ messages: recent(allMessages) });
```

→ Old messages are dropped. Simple.

### Summary cascade

```ts
async function getContext() {
  const all = await db.messages.find({});
  if (all.length < 20) return all;

  const old = all.slice(0, -10);  // everything except the last 10
  const recent = all.slice(-10);  // last 10, kept verbatim

  const summary = await llm.complete({
    system: 'Summarize this conversation in 200 tokens.',
    messages: old,
  });

  return [
    { role: 'system', content: `Previous: ${summary.text}` },
    ...recent,
  ];
}
```

→ Older messages are replaced by a summary; recent ones stay intact.
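The summary cascade can be exercised without an LLM or a database by injecting the summarizer. This is a minimal sketch under assumptions: `ChatMessage`, `Summarizer` and `buildContext` are hypothetical names, the summarizer is a caller-supplied stand-in for the `llm.complete` call above, and the thresholds mirror the example (summarize at 20+ messages, keep the last 10 verbatim).

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical stand-in for the LLM summarization call; inject a real one.
type Summarizer = (messages: ChatMessage[]) => Promise<string>;

// Summary cascade: short histories pass through unchanged; longer ones are
// split into an old prefix (summarized) and a recent tail (kept verbatim).
async function buildContext(
  all: ChatMessage[],
  summarize: Summarizer,
  threshold = 20,
  keepRecent = 10,
): Promise<ChatMessage[]> {
  if (all.length < threshold) return all;

  const old = all.slice(0, -keepRecent);
  const recent = all.slice(-keepRecent);
  const summary = await summarize(old);

  return [{ role: 'system', content: `Previous: ${summary}` }, ...recent];
}

// Usage: 25 messages and a fake summarizer that only counts its input.
async function demo(): Promise<void> {
  const history: ChatMessage[] = Array.from(
    { length: 25 },
    (_, i): ChatMessage => ({ role: i % 2 === 0 ? 'user' : 'assistant', content: `message ${i}` }),
  );
  const fakeSummarize: Summarizer = async (msgs) => `${msgs.length} earlier messages`;
  const ctx = await buildContext(history, fakeSummarize);
  console.log(ctx.length);     // 11 (1 summary + 10 recent)
  console.log(ctx[0].content); // Previous: 15 earlier messages
  console.log(ctx[1].content); // message 15
}

void demo();
```

Injecting the summarizer keeps the windowing logic testable on its own; in production the same function would wrap the real LLM call and persist the summary so it is not recomputed on every turn.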
### Vector memory

```ts
// Every message → embed → store
async function remember(message: Message) {
  const emb = await embed(message.content);
  await db.memories.insert({
    content: message.content,
    embedding: emb,
    timestamp: Date.now(),
    type: 'message',
  });
}

// Search
async function recall(query: string) {
  const emb = await embed(query);
  // Schematic query: the actual vector-search syntax depends on the DB
  return db.memories.find({
    $vectorSearch: { embedding: emb, k: 5 },
  });
}
```

→ Retrieval by topic. Works for long histories.

### mem0 (managed)

```python
from mem0 import Memory

m = Memory()

# Add
m.add("My favorite color is blue", user_id="alice")
m.add("I prefer dark mode", user_id="alice")

# Search
results = m.search(query="What does Alice like?", user_id="alice")
# → memories such as 'favorite color is blue', 'prefer dark mode'
#   (the exact return structure depends on the mem0 version)
```

→ Automatic extraction + deduplication.

### Letta (formerly MemGPT)

```python
# Older `letta` client API; the Letta SDK has changed across versions,
# so check the current docs before using these calls.
from letta import create_client

client = create_client()
agent = client.create_agent(
    name='Alice',
    memory={
        'human': 'User name: Bob.',
        'persona': 'I am a helpful assistant.',
    },
)

response = client.user_message(agent_id=agent.id, message='Hi')
```

→ "Memory editing": the agent can also edit its own memory.

### Memory hierarchy (Letta's idea)

```
Core memory: always in context (small).
Recall memory: searchable (RAG).
Archival: permanent cold storage.
→ Hot / cold tiers.
```

### Episodic memory (timeline)

```ts
interface Episode {
  id: string;
  timestamp: Date;
  summary: string;
  actors: string[];
  tags: string[];
  embedding: number[];
}

await db.episodes.insert({
  timestamp: new Date(),
  summary: 'User asked about pricing',
  actors: ['user', 'agent'],
  tags: ['pricing', 'sales'],
});

// "Was there a pricing discussion last week?"
const r = await db.episodes.find({
  timestamp: { $gte: new Date(Date.now() - 7 * 86400000) }, // last 7 days (86400000 ms = 1 day)
  tags: 'pricing', // MongoDB-style: matches if the tags array contains 'pricing'
});
```

### Semantic (fact)

```ts
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

const facts: Fact[] = [
  { subject: 'Alice', predicate: 'likes', object: 'blue' },
  { subject: 'Alice', predicate: 'works_at', object: 'Acme Corp' },
];

// Simple lookup
const aliceFacts = facts.filter(f => f.subject === 'Alice');
```

→ Knowledge-graph style.

### Knowledge graph + LLM

```ts
// Neo4j / Memgraph (`graph` is a placeholder, e.g. a neo4j-driver session)
await graph.run(`
  MERGE (a:Person {name: 'Alice'})
  MERGE (b:Color {name: 'blue'})
  MERGE (a)-[:LIKES]->(b)
`);

await graph.run(`
  MATCH (a:Person {name: 'Alice'})-[:LIKES]->(c)
  RETURN c.name
`);
// → one record: c.name = 'blue'
```

### Auto-extraction (LLM)

```ts
async function extractFacts(message: string): Promise<Fact[]> {
  const r = await llm.complete({
    system: 'Extract facts as a JSON array of {subject, predicate, object}. Only solid facts.',
    prompt: message,
  });
  try {
    return JSON.parse(r.text);
  } catch {
    return []; // model returned non-JSON: store nothing rather than guess
  }
}

// "I'm Alice and I love coffee"
// → [{subject: 'user', predicate: 'is', object: 'Alice'}, {subject: 'user', predicate: 'loves', object: 'coffee'}]
```

### Forgetting (TTL / decay)

```ts
// Remove memories older than 30 days
await db.memories.deleteMany({
  timestamp: { $lt: Date.now() - 30 * 86400000 },
});

// Or use an importance score
interface Memory {
  importance: number; // 1-10
  lastAccessed: Date;
}
// Recently accessed memories are kept; the rest decay.
```

### Memory consolidation (like sleep)

```ts
// Run periodically: daily, or every N messages
async function consolidate(userId: string) {
  const yesterday = Date.now() - 86400000; // timestamps are stored as epoch ms
  const recent = await db.memories.find({ userId, timestamp: { $gte: yesterday } });

  // Group related memories (`cluster` is a placeholder, e.g. k-means over embeddings)
  const clusters = cluster(recent);

  // Each cluster → one summary
  for (const c of clusters) {
    const summary = await llm.complete({
      // Send only the text; raw embeddings would waste tokens
      prompt: `Summarize these related memories: ${JSON.stringify(c.map(x => x.content))}`,
    });
    await db.consolidated.insert({ userId, summary: summary.text, count: c.length });
    await db.memories.deleteMany({ id: { $in: c.map(x => x.id) } });
  }
}
```

→ Like human sleep: episodes are consolidated into semantic memory.
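The TTL and importance/decay ideas under Forgetting can be folded into a single retention score. This is a minimal sketch, assuming a hypothetical `StoredMemory` shape (importance on the same 1-10 scale as the `Memory` interface above, plus `lastAccessed`) and an exponential decay with a configurable half-life; the formula and the names `retentionScore`, `prune` and `touch` are illustrative choices, not something mem0 or Letta prescribe.

```typescript
interface StoredMemory {
  id: string;
  content: string;
  importance: number; // 1-10, same scale as the Memory interface above
  lastAccessed: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Retention score in (0, 1]: importance scaled to 0.1..1, times an
// exponential decay that halves every `halfLifeDays` since the last access.
function retentionScore(m: StoredMemory, now: Date, halfLifeDays = 7): number {
  const ageDays = Math.max(0, (now.getTime() - m.lastAccessed.getTime()) / DAY_MS);
  return (m.importance / 10) * Math.pow(0.5, ageDays / halfLifeDays);
}

// Partition memories into those to keep and those to forget.
function prune(
  memories: StoredMemory[],
  now: Date,
  threshold = 0.1,
  halfLifeDays = 7,
): { keep: StoredMemory[]; forget: StoredMemory[] } {
  const keep: StoredMemory[] = [];
  const forget: StoredMemory[] = [];
  for (const m of memories) {
    (retentionScore(m, now, halfLifeDays) >= threshold ? keep : forget).push(m);
  }
  return { keep, forget };
}

// Accessing a memory resets its clock, so frequently used memories survive.
function touch(m: StoredMemory, now: Date): StoredMemory {
  return { ...m, lastAccessed: now };
}

// Usage
const now = new Date('2026-05-09T00:00:00Z');
const { keep, forget } = prune(
  [
    { id: 'a', content: 'prefers dark mode', importance: 8, lastAccessed: now },
    {
      id: 'b',
      content: 'asked about pricing',
      importance: 8,
      lastAccessed: new Date(now.getTime() - 28 * DAY_MS), // 4 half-lives ago
    },
  ],
  now,
);
console.log(keep.map((m) => m.id));   // [ 'a' ]
console.log(forget.map((m) => m.id)); // [ 'b' ]
```

Memory `b` scores 0.8 × 0.5⁴ = 0.05, below the 0.1 threshold, so it is forgotten even though it was important; calling `touch` whenever a memory is recalled is what keeps such memories alive.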
### Working memory (current task)

```ts
class WorkingMemory {
  private items: unknown[] = [];

  constructor(private capacity = 7) {}

  add(item: unknown) {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift(); // drop the oldest
  }

  context(): string {
    return this.items.map(i => JSON.stringify(i)).join('\n');
  }
}
```

→ Mirrors the human "7 ± 2" working-memory limit.

### Thread / session memory

```ts
// Each thread keeps its own history
const threadMemory = new Map<string, Message[]>();

async function chat(threadId: string, input: string) {
  const messages = threadMemory.get(threadId) ?? [];
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  threadMemory.set(threadId, messages);
  return r.text;
}
```

### Multi-user memory

```ts
// Per-user memory.
await db.memories.insert({
  userId: 'alice',
  content: '...',
  embedding: ...,
});

// Filter by user on every search
const r = await db.memories.find({ userId: 'alice', $vectorSearch: ... });
```

→ Privacy: user A's memories must never leak to user B.

### Privacy / GDPR

```ts
// The user asks to be forgotten
async function forgetUser(userId: string) {
  await db.memories.deleteMany({ userId });
  await db.episodes.deleteMany({ userId });
  await db.consolidated.deleteMany({ userId }); // derived summaries too
}
```

### Observability

```ts
// `trace` and `log` are placeholders for your tracing/logging helpers.
// TS decorators only apply to classes and class members, so wrap the function instead.
const recall = trace('recall', async (query: string) => {
  const r = await db.memories.search(...);
  log({ query, retrieved: r.map(x => x.id), count: r.length });
  return r;
});
```

→ Lets you debug "why did the agent forget?"

### Cost

```
Vector embedding: $0.02 / 1M tokens (small embedding models, e.g. OpenAI text-embedding-3-small).
Storage: $ / GB-month.
Search: $ / 1k queries.

Small system: Postgres pgvector (cheap).
Large system: Pinecone / Qdrant.

→ N memories per user × M users grows quickly.
```

### Pitfalls

```
- Too many memories: noise / cost.
- No forgetting: stale information.
- Hallucinations from auto-extraction.
- Memory leaks across users.
- Ignoring embedding cost.
- Long-term memory contradicting the current context (outdated facts).
```

### Real-world

- **ChatGPT memory**: explicit per-user memory.
- **Character.AI**: persistent agents.
- **Letta**: full memory framework.
- **mem0**: memory-as-a-service.
- **Pi (Inflection)**: companion with long-term memory.
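The multi-user scoping, "forget me" deletion and deduplication concerns can be seen together in a toy in-memory store. This is a minimal sketch under assumptions: embeddings are supplied by the caller (toy 3-d vectors here instead of a real embedding model), ranking is brute-force cosine similarity rather than a vector index such as pgvector or Qdrant, and `UserScopedMemoryStore`, `remember`, `recall` and `forgetUser` are hypothetical names, not a library API.

```typescript
interface MemoryRecord {
  id: number;
  userId: string;
  content: string;
  embedding: number[];
}

// Cosine similarity; returns 0 for zero vectors instead of NaN.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class UserScopedMemoryStore {
  private records: MemoryRecord[] = [];
  private nextId = 1;

  // A new memory at least this similar to one of the SAME user's memories is skipped.
  constructor(private dedupeThreshold = 0.95) {}

  remember(userId: string, content: string, embedding: number[]): MemoryRecord | null {
    const dup = this.records.some(
      (r) => r.userId === userId && cosine(r.embedding, embedding) >= this.dedupeThreshold,
    );
    if (dup) return null;
    const rec = { id: this.nextId++, userId, content, embedding };
    this.records.push(rec);
    return rec;
  }

  // The user filter runs BEFORE ranking, so other users' memories are never candidates.
  recall(userId: string, query: number[], k = 5): MemoryRecord[] {
    return this.records
      .filter((r) => r.userId === userId)
      .map((r) => ({ r, score: cosine(r.embedding, query) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((x) => x.r);
  }

  // "Forget me": drop every record owned by this user; returns how many were removed.
  forgetUser(userId: string): number {
    const before = this.records.length;
    this.records = this.records.filter((r) => r.userId !== userId);
    return before - this.records.length;
  }
}

// Usage with toy vectors standing in for real embeddings.
const store = new UserScopedMemoryStore();
store.remember('alice', 'favorite color is blue', [1, 0, 0]);
store.remember('alice', 'favourite colour: blue', [0.99, 0.05, 0]); // near-duplicate, skipped
store.remember('alice', 'prefers dark mode', [0, 1, 0]);
store.remember('bob', 'favorite color is red', [1, 0.1, 0]);

console.log(store.recall('alice', [1, 0, 0], 1).map((r) => r.content)); // [ 'favorite color is blue' ]
console.log(store.forgetUser('alice')); // 2
```

Filtering by `userId` before similarity ranking (rather than ranking globally and filtering afterwards) is the design choice that prevents cross-user leaks; a real vector DB would express the same thing as a metadata pre-filter on the search.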
## 🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Simple chat | Sliding window |
| Long history | Summary cascade |
| Topic recall | Vector memory |
| Personal facts | Semantic / KG |
| Timeline queries | Episodic |
| Production-ready | mem0 / Letta |
| Multi-user | User-scoped + privacy |

## ❌ Anti-patterns

- **Sending the entire history**: cost / context limits.
- **Never forgetting**: stale information.
- **Cross-user leaks**: privacy violations.
- **Trusting auto-extraction 100%**: it hallucinates.
- **No deduplication**: the same fact stored many times.
- **No tracing**: impossible to debug.
- **Vectors only**: similarity search cannot reliably answer exact fact queries.

## 🤖 LLM usage hints

- Memory = short-term + long-term + episodic + semantic.
- mem0 / Letta are production-ready.
- Forgetting (TTL / consolidation) mirrors how human memory works.
- Privacy (user-scoped memory) is mandatory.

## 🔗 Related documents

- [[AI_Memory_Systems]]
- [[AI_RAG_Advanced]]
- [[AI_Token_Budget_Patterns]]