---
id: ai-memory-persistence-deep
title: AI Memory Persistence — long-term / vector / graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, memory, agents, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["AI"] }
applied_in: []
aliases: [agent memory, long-term memory, episodic, semantic, mem0, Letta, memgpt]
---

# AI Memory Persistence

> An agent that forgets everything when the session ends makes for bad UX. Combine **short-term (context) + long-term (vector / graph DB) + episodic (timeline)** memory. Tools: mem0 / Letta (formerly MemGPT).

## 📖 Core concepts
- Short-term: context window (LLM input).
- Long-term: persistent (DB).
- Episodic: timeline (when X happened).
- Semantic: general facts ("user is vegan").

## 💻 Code patterns

### Naive (no memory)
```ts
async function chat(message: string) {
  return llm.complete({ prompt: message });
}

// Every call starts fresh; the previous conversation is lost.
```

### Conversation history (simple)
```ts
const messages: Message[] = [];

async function chat(input: string) {
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  return r.text;
}
```

→ Bounded by the context window: once the history grows past roughly 100 messages, the earliest ones no longer fit and are lost.

### Sliding window
```ts
function recent(messages: Message[], n: number = 20) {
  return messages.slice(-n);
}

await llm.complete({ messages: recent(allMessages) });
```

→ Older messages are lost, but the approach is simple.
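
A fixed message count ignores how long each message is. The sketch below trims by an approximate token budget instead. The names `estimateTokens` and `recentWithinBudget` are hypothetical, and the 4-characters-per-token ratio is a rough assumption; a real tokenizer would give exact counts.

```ts
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk from the newest message backwards and keep as many as fit the budget.
function recentWithinBudget(messages: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const m of [...messages].reverse()) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) break; // stop at the first message that does not fit
    kept.unshift(m);                    // restore chronological order
    used += cost;
  }
  return kept;
}
```

The result is always a contiguous suffix of the history, so the model never sees a conversation with holes in the middle.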

### Summary cascade
```ts
async function getContext() {
  const all = await db.messages.find({});
  if (all.length < 20) return all;
  
  const old = all.slice(0, -10);
  const recent = all.slice(-10);
  
  const summary = await llm.complete({
    system: 'Summarize this conversation in 200 tokens.',
    messages: old,
  });
  
  return [
    { role: 'system', content: `Previous: ${summary.text}` },
    ...recent,
  ];
}
```

→ Older messages are collapsed into a summary; recent ones stay intact.

### Vector memory
```ts
// Embed and store every message
async function remember(message: Message) {
  const emb = await embed(message.content);
  await db.memories.insert({
    content: message.content,
    embedding: emb,
    timestamp: Date.now(),
    type: 'message',
  });
}

// Search
async function recall(query: string) {
  const emb = await embed(query);
  return db.memories.find({
    $vectorSearch: { embedding: emb, k: 5 },
  });
}
```

→ Retrieval by topic; works with long histories.
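
To make the `$vectorSearch` step less opaque, here is a minimal in-memory sketch of what a vector search does conceptually: score every stored embedding against the query by cosine similarity and keep the top `k`. `StoredMemory`, `cosineSimilarity`, and `topK` are hypothetical names; a real vector database would use an approximate index rather than this linear scan.

```ts
interface StoredMemory {
  content: string;
  embedding: number[];
}

// Cosine similarity: 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest neighbours: score everything, sort, take the best k.
function topK(query: number[], memories: StoredMemory[], k: number = 5): StoredMemory[] {
  return memories
    .map(m => ({ m, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(x => x.m);
}
```

A linear scan like this is usually fine for a few thousand memories; beyond that an indexed store is the more likely choice.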

### mem0 (managed)
```python
from mem0 import Memory
m = Memory()

# Add
m.add("My favorite color is blue", user_id="alice")
m.add("I prefer dark mode", user_id="alice")

# Search
results = m.search(query="What does Alice like?", user_id="alice")
# → memories such as 'Favorite color is blue' and 'Prefers dark mode' (the exact return shape depends on the mem0 version)
```

→ Extracts and deduplicates facts automatically.

### Letta (formerly MemGPT)
```python
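# Sketch of the client flow; names and signatures differ between Letta versions, so check the current docs.
from letta import create_client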
client = create_client()

agent = client.create_agent(
    name='Alice',
    memory={
        'human': 'User name: Bob.',
        'persona': 'I am a helpful assistant.',
    },
)

response = client.user_message(agent_id=agent.id, message='Hi')
```

→ "Memory editing": the agent can also edit its own memory.

### Memory hierarchy (Letta's idea)
```
Core memory: always in context (small).
Recall memory: searchable (RAG).
Archival: permanent cold storage.

→ Hot / cold tier.
```
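
The hot / cold split can be illustrated with a toy two-tier store. This is not Letta's implementation, only a sketch of the idea: a small "core" list that is always sent with the prompt, and an archive that is only reached through search. `TieredMemory` and its methods are hypothetical, and keyword matching stands in for real retrieval.

```ts
class TieredMemory {
  private core: string[] = [];    // hot tier: always placed in the prompt
  private archive: string[] = []; // cold tier: only reached via search
  private coreLimit: number;

  constructor(coreLimit: number = 3) {
    this.coreLimit = coreLimit;
  }

  // New items enter the hot tier; the oldest overflow into the archive.
  remember(item: string): void {
    this.core.push(item);
    while (this.core.length > this.coreLimit) {
      const evicted = this.core.shift();
      if (evicted !== undefined) this.archive.push(evicted);
    }
  }

  coreContext(): string[] {
    return [...this.core];
  }

  // Keyword match stands in for vector search in this sketch.
  searchArchive(keyword: string): string[] {
    const needle = keyword.toLowerCase();
    return this.archive.filter(x => x.toLowerCase().includes(needle));
  }
}
```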

### Episodic memory (timeline)
```ts
interface Episode {
  id: string;
  timestamp: Date;
  summary: string;
  actors: string[];
  tags: string[];
  embedding: number[];
}

await db.episodes.insert({
  timestamp: new Date(),
  summary: 'User asked about pricing',
  actors: ['user', 'agent'],
  tags: ['pricing', 'sales'],
});

// "Was pricing discussed in the last week?"
const r = await db.episodes.find({
  timestamp: { $gte: new Date(Date.now() - 7 * 86400000) },
  tags: 'pricing',
});
```

### Semantic (fact)
```ts
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

const facts: Fact[] = [
  { subject: 'Alice', predicate: 'likes', object: 'blue' },
  { subject: 'Alice', predicate: 'works_at', object: 'Acme Corp' },
];

// Simple lookup
const aliceFacts = facts.filter(f => f.subject === 'Alice');
```

→ Knowledge-graph style.
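
A plain array accumulates duplicates and contradictory facts over time. One possible remedy, sketched below, is to key each fact by subject and predicate so that a newer value overwrites the older one. `FactStore` is a hypothetical name, and the sketch assumes every predicate is single-valued; multi-valued predicates such as `likes` would need the object in the key as well.

```ts
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

class FactStore {
  private facts = new Map<string, Fact>();

  // Assumption: one value per (subject, predicate) pair.
  private key(f: Fact): string {
    return `${f.subject}|${f.predicate}`;
  }

  // Insert a new fact or overwrite the previous value for the same key.
  upsert(f: Fact): void {
    this.facts.set(this.key(f), f);
  }

  about(subject: string): Fact[] {
    return [...this.facts.values()].filter(f => f.subject === subject);
  }
}
```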

### Knowledge graph + LLM
```ts
// Neo4j / Memgraph
await graph.run(`
  MERGE (a:Person {name: 'Alice'})
  MERGE (b:Color {name: 'blue'})
  MERGE (a)-[:LIKES]->(b)
`);

await graph.run(`
  MATCH (a:Person {name: 'Alice'})-[:LIKES]->(c)
  RETURN c.name
`);
// ['blue']
```

### Auto-extraction (LLM)
```ts
async function extractFacts(message: string) {
  const r = await llm.complete({
    system: 'Extract facts as {subject, predicate, object}. Only solid facts.',
    prompt: message,
  });
  return JSON.parse(r.text);
}

// "I'm Alice and I love coffee"
// → [{subject: 'user', predicate: 'is', object: 'Alice'}, {subject: 'user', predicate: 'loves', object: 'coffee'}]
```
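
Calling `JSON.parse` directly on model output throws on malformed JSON and accepts objects with missing fields. A defensive parser, sketched here with the hypothetical names `isFact` and `parseFacts`, drops anything that is not a complete triple rather than storing it.

```ts
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

// Type guard: accept only objects with all three string fields.
function isFact(value: unknown): value is Fact {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v['subject'] === 'string' &&
    typeof v['predicate'] === 'string' &&
    typeof v['object'] === 'string'
  );
}

// Returns only well-formed facts; malformed output yields an empty list.
function parseFacts(raw: string): Fact[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // the model did not return valid JSON
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(isFact);
}
```

This only checks shape. Whether an extracted fact is actually true still needs a separate check, for example asking the user to confirm before storing it.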

### Forgetting (TTL / decay)
```ts
// Remove memories older than 30 days
await db.memories.deleteMany({ timestamp: { $lt: Date.now() - 30 * 86400000 } });

// Or use an importance score
interface Memory {
  importance: number;  // 1-10
  lastAccessed: Date;
}

// Keep recently accessed memories; let the rest decay.
```
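
The importance-plus-recency idea can be made concrete with an exponential decay. The sketch below is one possible scoring rule, not a standard one: the 30-day half-life, the threshold of 1, and the names `retentionScore` and `shouldForget` are all assumptions chosen for illustration.

```ts
interface ScoredMemory {
  importance: number;   // 1-10
  lastAccessed: number; // epoch milliseconds
}

const DAY_MS = 86400000;

// Score halves every `halfLifeDays` since the memory was last accessed.
function retentionScore(m: ScoredMemory, now: number, halfLifeDays: number = 30): number {
  const ageDays = Math.max(0, now - m.lastAccessed) / DAY_MS;
  return m.importance * Math.pow(0.5, ageDays / halfLifeDays);
}

// Memories whose score falls below the threshold become deletion candidates.
function shouldForget(m: ScoredMemory, now: number, threshold: number = 1): boolean {
  return retentionScore(m, now) < threshold;
}
```

Under these assumptions an importance-8 memory survives about 90 days without access (8 → 4 → 2 → 1), while an importance-1 memory becomes a candidate as soon as it ages at all. Updating `lastAccessed` on every recall resets the clock.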

### Memory consolidation (sleep-like)
```ts
// Run periodically: daily or every N messages
async function consolidate(userId: string) {
  const yesterday = Date.now() - 86400000;
  const recent = await db.memories.find({ userId, timestamp: { $gte: yesterday } });
  
  // Cluster
  const clusters = cluster(recent);
  
  // Each cluster → summary
  for (const c of clusters) {
    const summary = await llm.complete({
      prompt: `Summarize these related memories: ${JSON.stringify(c)}`,
    });
    await db.consolidated.insert({ userId, summary: summary.text, count: c.length });
    await db.memories.deleteMany({ id: { $in: c.map(x => x.id) } });
  }
}
```

→ Similar to human sleep: episodes are consolidated into semantic memory.
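
The `cluster` helper is left undefined above. One simple way to fill it in, shown here as an assumption rather than the intended implementation, is a greedy single pass: each memory joins the first cluster whose seed embedding is similar enough, otherwise it starts a new cluster. The 0.8 threshold and the `EmbeddedMemory` shape are hypothetical.

```ts
interface EmbeddedMemory {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy clustering: compare each memory against the first member (seed) of each cluster.
function cluster(memories: EmbeddedMemory[], threshold: number = 0.8): EmbeddedMemory[][] {
  const clusters: EmbeddedMemory[][] = [];
  for (const m of memories) {
    const home = clusters.find(c => {
      const seed = c[0];
      return seed !== undefined && cosine(seed.embedding, m.embedding) >= threshold;
    });
    if (home) {
      home.push(m);
    } else {
      clusters.push([m]); // m becomes the seed of a new cluster
    }
  }
  return clusters;
}
```

The result depends on input order, which is acceptable for a nightly batch but not for anything that needs stable groupings.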

### Working memory (current task)
```ts
class WorkingMemory {
  private items: any[] = [];
  
  add(item: any) {
    this.items.push(item);
    if (this.items.length > 7) this.items.shift();
  }
  
  context(): string {
    return this.items.map(i => JSON.stringify(i)).join('\n');
  }
}
```

→ Modeled on the human "7 ± 2" working-memory span.

### Thread / session memory
```ts
// Each thread keeps its own history
const threadMemory = new Map<string, Message[]>();

async function chat(threadId: string, input: string) {
  const messages = threadMemory.get(threadId) ?? [];
  messages.push({ role: 'user', content: input });
  const r = await llm.complete({ messages });
  messages.push({ role: 'assistant', content: r.text });
  threadMemory.set(threadId, messages);
  return r.text;
}
```

### Multi-user memory
```ts
// Per-user memory.
await db.memories.insert({
  userId: 'alice',
  content: '...',
  embedding: ...,
});

// Filter by user when searching
const r = await db.memories.find({ userId: 'alice', $vectorSearch: ... });
```

→ Privacy: user A's memories must never leak to user B.
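
Relying on every caller to remember the `userId` filter is fragile. A possible safeguard, sketched below with the hypothetical names `ScopedMemoryStore` and `forUser`, is to hand out a handle that is already bound to one user, so an unscoped read cannot be written by accident. The in-memory array stands in for the real database.

```ts
interface UserMemory {
  userId: string;
  content: string;
}

class ScopedMemoryStore {
  private rows: UserMemory[] = [];

  // Every read and write goes through a handle bound to a single user.
  forUser(userId: string) {
    return {
      add: (content: string): void => {
        this.rows.push({ userId, content });
      },
      all: (): string[] =>
        this.rows.filter(r => r.userId === userId).map(r => r.content),
    };
  }
}
```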

### Privacy / GDPR
```ts
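// The user asks to be forgotten: delete from every store that holds their data
async function forgetUser(userId: string) {
  await db.memories.deleteMany({ userId });
  await db.episodes.deleteMany({ userId });
  await db.consolidated.deleteMany({ userId });
}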
```

### Observability
```ts
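// TypeScript decorators apply only to classes and class members,
// so trace a standalone function by wrapping it rather than annotating it with @trace.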
async function recall(query: string) {
  const r = await db.memories.search(...);
  log({ query, retrieved: r.map(x => x.id), count: r.length });
  return r;
}
```

→ Helps debug "why did the agent forget?".
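
Tracing a standalone async function can be done with a higher-order wrapper. The sketch below records the name, arguments, duration, and outcome of each call. `withTrace`, `TraceRecord`, and the `sink` callback are hypothetical names, not part of any tracing library.

```ts
type AsyncFn<A extends unknown[], R> = (...args: A) => Promise<R>;

interface TraceRecord {
  name: string;
  args: unknown[];
  ms: number;
  ok: boolean;
}

// Wraps an async function and reports each call to `sink`.
function withTrace<A extends unknown[], R>(
  name: string,
  fn: AsyncFn<A, R>,
  sink: (record: TraceRecord) => void,
): AsyncFn<A, R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    try {
      const result = await fn(...args);
      sink({ name, args, ms: Date.now() - start, ok: true });
      return result;
    } catch (err) {
      sink({ name, args, ms: Date.now() - start, ok: false });
      throw err; // tracing must not swallow the original error
    }
  };
}
```

Wrapping the recall function this way leaves its signature unchanged, so callers do not need to know that tracing is enabled.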

### Cost
```
Vector embedding: roughly $0.02 / 1M tokens (varies by provider and model).
Storage: $ / GB month.
Search: $ / 1k query.

Small systems: Postgres with pgvector (cheap).
Large systems: Pinecone / Qdrant.

→ N memories per user × M users grows quickly.
```
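
The "N memories × M users" growth is easy to put numbers on. The sketch below estimates the one-time embedding cost and the raw vector storage. All inputs are assumptions to replace with real figures: the function name `estimateMemoryCost`, the 1536-dimension embedding, and 4 bytes per dimension (float32, ignoring index overhead and metadata).

```ts
interface CostInputs {
  users: number;
  memoriesPerUser: number;
  avgTokensPerMemory: number;
  pricePerMillionTokens: number; // USD
  embeddingDims: number;
}

interface CostEstimate {
  embeddingUsd: number;    // one-time cost to embed everything
  vectorStorageGb: number; // raw float32 vectors only
}

function estimateMemoryCost(i: CostInputs): CostEstimate {
  const totalMemories = i.users * i.memoriesPerUser;
  const totalTokens = totalMemories * i.avgTokensPerMemory;
  return {
    embeddingUsd: (totalTokens / 1_000_000) * i.pricePerMillionTokens,
    vectorStorageGb: (totalMemories * i.embeddingDims * 4) / 1e9,
  };
}
```

For 10,000 users with 1,000 memories each at 50 tokens per memory, this gives about $10 to embed and about 61 GB of raw vectors, which suggests that storage and search dominate over embedding cost at that scale.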

### Pitfalls
```
- Too many memories: noise and cost.
- No forgetting: stale information.
- Hallucinations in auto-extraction.
- Memory leaks across users.
- Ignoring embedding cost.
- Long-term memory that contradicts the current context (outdated facts).
```

### Real-world
- **ChatGPT memory**: explicit per-user memory.
- **Character.AI**: persistent agent.
- **Letta**: full memory framework.
- **mem0**: memory-as-a-service.
- **Pi (Inflection)**: long memory companion.

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Simple chat | Sliding window |
| Long history | Summary cascade |
| Topic recall | Vector memory |
| Personal facts | Semantic / KG |
| Timeline query | Episodic |
| Production ready | mem0 / Letta |
| Multi-user | User-scoped + privacy |

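## ❌ Anti-patterns
- **Sending the entire history**: wastes tokens and overflows the context.
- **No forgetting**: stale information.
- **Cross-user leak**: privacy violation.
- **Trusting auto-extraction 100%**: the extractor can hallucinate.
- **No deduplication**: the same fact is stored many times.
- **No tracing**: impossible to debug.
- **Vector only**: unreliable for exact facts.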

## 🤖 Hints for LLM use
- Memory = short + long + episodic + semantic.
- mem0 / Letta are production-ready options.
- Forgetting (TTL / consolidation) mirrors human memory.
- Privacy (user-scoped memory) is mandatory.

## 🔗 Related documents
- [[AI_Memory_Systems]]
- [[AI_RAG_Advanced]]
- [[AI_Token_Budget_Patterns]]