| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ai-memory-persistence-deep | AI Memory Persistence: long-term / vector / graph | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
AI Memory Persistence
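When a session ends, the agent forgets everything, which makes for bad UX. Persistent memory combines short-term (context window), long-term (vector / graph DB), and episodic (timeline) stores. Libraries: mem0, Letta (formerly MemGPT).
📖 Core concepts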
- Short-term: context window (LLM input).
- Long-term: persistent (DB).
- Episodic: timeline (when X happened).
- Semantic: general facts ("user is vegan").
💻 Code patterns
Naive (no memory)
async function chat(message: string) {
return llm.complete({ prompt: message });
}
// Every call starts from scratch, so the previous conversation is lost.
Conversation history (simple)
const messages: Message[] = [];
async function chat(input: string) {
messages.push({ role: 'user', content: input });
const r = await llm.complete({ messages });
messages.push({ role: 'assistant', content: r.text });
return r.text;
}
→ Limited by the context window. Past roughly 100 messages, the oldest turns no longer fit.
Sliding window
function recent(messages: Message[], n: number = 20) {
return messages.slice(-n);
}
await llm.complete({ messages: recent(allMessages) });
→ Simple, but older messages are lost.
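A fixed message count ignores how long each message is. A minimal sketch of a token-budgeted window follows, assuming a crude estimate of about 4 characters per token; `estimateTokens` and `windowByTokens` are hypothetical helpers, not part of any library.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (assumption): ~4 characters per token.
// Swap in a real tokenizer for accurate budgeting.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest message and keep as many as fit
// within `budget` tokens, preserving chronological order.
function windowByTokens(messages: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

Compared with `slice(-n)`, this keeps many short turns or a few long ones, whichever fits the budget.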
Summary cascade
async function getContext() {
  const all = await db.messages.find({});
  if (all.length < 20) return all;
  const old = all.slice(0, -10);
  const recent = all.slice(-10);
  const summary = await llm.complete({
    system: 'Summarize this conversation in 200 tokens.',
    messages: old,
  });
  return [
    // llm.complete resolves to a response object, so use its text as in the earlier examples
    { role: 'system', content: `Previous: ${summary.text}` },
    ...recent,
  ];
}
→ Older messages are compressed into a summary; recent ones stay intact.
Vector memory
// Every message: embed, then store
async function remember(message: Message) {
const emb = await embed(message.content);
await db.memories.insert({
content: message.content,
embedding: emb,
timestamp: Date.now(),
type: 'message',
});
}
// Search
async function recall(query: string) {
const emb = await embed(query);
return db.memories.find({
$vectorSearch: { embedding: emb, k: 5 },
});
}
→ Retrieval by topic. Works with long histories.
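Conceptually, `$vectorSearch` is a nearest-neighbour lookup over the stored embeddings. The sketch below shows the idea with brute-force cosine similarity over an in-memory array; `StoredMemory`, `cosine`, and `topK` are illustrative names, and real vector stores typically use approximate indexes rather than a full scan.

```typescript
interface StoredMemory {
  content: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is a zero vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k: score every memory, sort by similarity, keep the best k.
function topK(memories: StoredMemory[], query: number[], k: number): StoredMemory[] {
  return memories
    .map(m => ({ m, score: cosine(m.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(x => x.m);
}
```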
mem0 (managed)
from mem0 import Memory
m = Memory()
# Add
m.add("My favorite color is blue", user_id="alice")
m.add("I prefer dark mode", user_id="alice")
# Search
results = m.search(query="What does Alice like?", user_id="alice")
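# → illustrative: memories such as 'favorite color is blue' and 'prefer dark mode' (the exact return shape depends on the mem0 version)
→ Extracts and deduplicates memories automatically.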
Letta (formerly MemGPT)
from letta import create_client
client = create_client()
agent = client.create_agent(
name='Alice',
memory={
'human': 'User name: Bob.',
'persona': 'I am a helpful assistant.',
},
)
response = client.user_message(agent_id=agent.id, message='Hi')
→ "Memory editing": the agent can also edit its own memory.
Memory hierarchy (Letta's idea)
Core memory: always in context (small).
Recall memory: searchable (RAG).
Archival: permanent cold storage.
→ Hot / cold tiers.
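To make the tiers concrete, here is a rough sketch of how a prompt context might be assembled: core entries always go in, and recalled entries are appended until a size budget runs out. This is not Letta's implementation; `TieredMemory`, `buildContext`, and the character-based budget are assumptions for illustration.

```typescript
interface TieredMemory {
  core: string[];   // always included, kept small
  recall: string[]; // search results, best match first
}

// Core memory is always emitted. Recalled items are added in rank order
// until the next one would exceed `maxChars` (one separator counted per item).
function buildContext(mem: TieredMemory, maxChars: number): string {
  const parts: string[] = mem.core.slice();
  let used = parts.join('\n').length;
  for (const r of mem.recall) {
    if (used + r.length + 1 > maxChars) break;
    parts.push(r);
    used += r.length + 1;
  }
  return parts.join('\n');
}
```

Archival storage does not appear here because, in this sketch, it only reaches the context by first being retrieved into `recall`.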
Episodic memory (timeline)
interface Episode {
id: string;
timestamp: Date;
summary: string;
actors: string[];
tags: string[];
embedding: number[];
}
await db.episodes.insert({
timestamp: new Date(),
summary: 'User asked about pricing',
actors: ['user', 'agent'],
tags: ['pricing', 'sales'],
});
// "Was pricing discussed in the past week?"
const r = await db.episodes.find({
timestamp: { $gte: new Date(Date.now() - 7 * 86400000) },
tags: 'pricing',
});
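The query above matches episodes whose `tags` array contains `'pricing'`, assuming MongoDB-style semantics for the unspecified `db` client. The same timeline lookup over a plain array can be sketched as follows; `TimelineEpisode` and `episodesSince` are hypothetical names, and timestamps are epoch milliseconds for simplicity.

```typescript
interface TimelineEpisode {
  timestamp: number; // epoch milliseconds
  summary: string;
  tags: string[];
}

// Episodes carrying `tag` that happened at or after `sinceMs`, oldest first.
function episodesSince(
  episodes: TimelineEpisode[],
  tag: string,
  sinceMs: number
): TimelineEpisode[] {
  return episodes
    .filter(e => e.timestamp >= sinceMs && e.tags.indexOf(tag) !== -1)
    .sort((a, b) => a.timestamp - b.timestamp);
}
```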
Semantic (fact)
interface Fact {
subject: string;
predicate: string;
object: string;
}
const facts: Fact[] = [
{ subject: 'Alice', predicate: 'likes', object: 'blue' },
{ subject: 'Alice', predicate: 'works_at', object: 'Acme Corp' },
];
// Simple lookup
const aliceFacts = facts.filter(f => f.subject === 'Alice');
→ Knowledge-graph style.
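Stored naively, the same fact can end up in the list many times. One possible guard is a deduplicating insert keyed on a normalized triple. `factKey` and `upsertFact` are hypothetical helpers, and treating case and surrounding whitespace as insignificant is an assumption.

```typescript
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

// Normalized identity of a triple: trimmed, lower-cased, joined with '|'.
function factKey(f: Fact): string {
  return [f.subject, f.predicate, f.object]
    .map(s => s.trim().toLowerCase())
    .join('|');
}

// Add the fact only if no equivalent triple is stored yet.
// Returns true when the fact was added.
function upsertFact(store: Fact[], fact: Fact): boolean {
  const key = factKey(fact);
  for (const existing of store) {
    if (factKey(existing) === key) return false;
  }
  store.push(fact);
  return true;
}
```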
Knowledge graph + LLM
// Neo4j / Memgraph
await graph.run(`
MERGE (a:Person {name: 'Alice'})
MERGE (b:Color {name: 'blue'})
MERGE (a)-[:LIKES]->(b)
`);
await graph.run(`
MATCH (a:Person {name: 'Alice'})-[:LIKES]->(c)
RETURN c.name
`);
// ['blue']
Auto-extraction (LLM)
async function extractFacts(message: string) {
const r = await llm.complete({
system: 'Extract facts as {subject, predicate, object}. Only solid facts.',
prompt: message,
});
return JSON.parse(r.text);
}
// "I'm Alice and I love coffee"
// → [{subject: 'user', predicate: 'is', object: 'Alice'}, {subject: 'user', predicate: 'loves', object: 'coffee'}]
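Model output is untrusted text, so a bare `JSON.parse` can throw or return objects with missing fields. A minimal validation sketch: `parseFacts` is a hypothetical helper that returns only well-formed triples and silently drops everything else.

```typescript
interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === 'string' && v.trim().length > 0;
}

// Parse raw LLM output and keep only complete {subject, predicate, object} items.
// Returns [] when the text is not valid JSON or is not an array.
function parseFacts(raw: string): Fact[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const facts: Fact[] = [];
  for (let i = 0; i < parsed.length; i++) {
    const item = parsed[i];
    if (
      item !== null &&
      typeof item === 'object' &&
      isNonEmptyString(item.subject) &&
      isNonEmptyString(item.predicate) &&
      isNonEmptyString(item.object)
    ) {
      facts.push({ subject: item.subject, predicate: item.predicate, object: item.object });
    }
  }
  return facts;
}
```

Shape validation does not catch hallucinated content, only malformed output. Facts that matter may still need confirmation from the user.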
Forgetting (TTL / decay)
// Remove memories older than 30 days
await db.memories.deleteMany({ timestamp: { $lt: Date.now() - 30 * 86400000 } });
// Or use an importance score
interface Memory {
  importance: number; // 1-10
  lastAccessed: Date;
}
// Recently accessed memories are kept; the rest decay.
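One way to combine importance with recency is exponential decay: the score halves for every half-life that passes without access. The sketch below is an assumption about how such a rule could look. `halfLifeDays` and `minScore` are hypothetical tuning knobs, and `lastAccessed` is stored as epoch milliseconds rather than a `Date` to keep the arithmetic simple.

```typescript
interface ScoredMemory {
  id: string;
  importance: number;   // 1-10
  lastAccessed: number; // epoch milliseconds
}

const DAY_MS = 86400000;

// importance × 0.5^(days since last access / half-life)
function retentionScore(m: ScoredMemory, now: number, halfLifeDays: number): number {
  const ageDays = Math.max(0, now - m.lastAccessed) / DAY_MS;
  return m.importance * Math.pow(0.5, ageDays / halfLifeDays);
}

// Split memories into those to keep and those to forget.
function prune(
  memories: ScoredMemory[],
  now: number,
  halfLifeDays: number,
  minScore: number
): { keep: ScoredMemory[]; forget: ScoredMemory[] } {
  const keep: ScoredMemory[] = [];
  const forget: ScoredMemory[] = [];
  for (const m of memories) {
    if (retentionScore(m, now, halfLifeDays) >= minScore) {
      keep.push(m);
    } else {
      forget.push(m);
    }
  }
  return { keep, forget };
}
```

Unlike a flat TTL, an important memory untouched for a month can outlive a trivial one created today.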
Memory consolidation (sleep-like)
// Run periodically: daily or every N messages
async function consolidate(userId: string) {
  const yesterday = Date.now() - 86400000; // 24 hours ago
  const recent = await db.memories.find({ userId, since: yesterday });
  // Group related memories
  const clusters = cluster(recent);
  // Replace each cluster with a single summary
  for (const c of clusters) {
    const r = await llm.complete({
      prompt: `Summarize these related memories: ${JSON.stringify(c)}`,
    });
    // Store the summary text, then delete the originals it replaces
    await db.consolidated.insert({ userId, summary: r.text, count: c.length });
    await db.memories.deleteMany({ id: { $in: c.map(x => x.id) } });
  }
}
→ Like human sleep: episodes are consolidated into semantic memory.
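The `cluster` helper used above is not defined in this note. One possible sketch is greedy threshold clustering over embeddings: each memory joins the first cluster whose seed (first member) is similar enough, otherwise it starts a new cluster. The `embedding` field and the default threshold of 0.8 are assumptions. Order-independent methods such as k-means may give better groups.

```typescript
interface EmbeddedMemory {
  id: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is a zero vector.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy single pass: compare each item against the seed of every existing cluster.
function cluster(items: EmbeddedMemory[], threshold: number = 0.8): EmbeddedMemory[][] {
  const clusters: EmbeddedMemory[][] = [];
  for (const item of items) {
    let placed = false;
    for (const c of clusters) {
      if (cosineSim(c[0].embedding, item.embedding) >= threshold) {
        c.push(item);
        placed = true;
        break;
      }
    }
    if (!placed) clusters.push([item]);
  }
  return clusters;
}
```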
Working memory (current task)
class WorkingMemory {
private items: any[] = [];
add(item: any) {
this.items.push(item);
if (this.items.length > 7) this.items.shift();
}
context(): string {
return this.items.map(i => JSON.stringify(i)).join('\n');
}
}
→ Mirrors the 7 ± 2 limit often cited for human working memory.
Thread / session memory
// Each thread keeps its own history
const threadMemory = new Map<string, Message[]>();
async function chat(threadId: string, input: string) {
const messages = threadMemory.get(threadId) ?? [];
messages.push({ role: 'user', content: input });
const r = await llm.complete({ messages });
messages.push({ role: 'assistant', content: r.text });
threadMemory.set(threadId, messages);
return r.text;
}
Multi-user memory
// Memory is stored per user.
await db.memories.insert({
userId: 'alice',
content: '...',
embedding: ...,
});
// Always filter by user when searching
const r = await db.memories.find({ userId: 'alice', $vectorSearch: ... });
→ Privacy: user A's memories must never leak to user B.
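Relying on every call site to remember the `userId` filter is fragile. One way to make leaks harder is to hand callers a handle that is already bound to a single user. The sketch below uses an in-memory array in place of the database; `MemoryStore`, `forUser`, and `UserScopedMemory` are hypothetical names.

```typescript
interface MemoryRecord {
  userId: string;
  content: string;
}

interface UserScopedMemory {
  add(content: string): void;
  list(): string[];
  forget(): number;
}

class MemoryStore {
  private records: MemoryRecord[] = [];

  // Returns a handle that can only read, write, and delete one user's memories.
  forUser(userId: string): UserScopedMemory {
    return {
      add: (content: string): void => {
        this.records.push({ userId, content });
      },
      list: (): string[] =>
        this.records.filter(r => r.userId === userId).map(r => r.content),
      // Deletes everything for this user; returns the number of removed records.
      forget: (): number => {
        const before = this.records.length;
        this.records = this.records.filter(r => r.userId !== userId);
        return before - this.records.length;
      },
    };
  }
}
```

Because `records` is private, code outside the class has no unscoped way to query the store.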
Privacy / GDPR
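// The user asks to be forgotten
async function forgetUser(userId: string) {
  await db.memories.deleteMany({ userId });
  await db.episodes.deleteMany({ userId });
  // Also remove the summaries produced by consolidation
  await db.consolidated.deleteMany({ userId });
}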
Observability
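// Wrap with your tracing helper (a hypothetical `trace`); TypeScript decorators apply only to classes and their members, not to standalone functions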
async function recall(query: string) {
const r = await db.memories.search(...);
log({ query, retrieved: r.map(x => x.id), count: r.length });
return r;
}
→ Helps debug "why did the agent forget?".
Cost
Vector embed: about $0.02 / 1M tokens (varies by model and provider).
Storage: $ / GB-month.
Search: $ / 1k queries.
Small systems: Postgres pgvector (cheap).
Large systems: Pinecone / Qdrant.
→ N memories per user × M users grows large quickly.
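A back-of-the-envelope estimate makes that growth concrete. The sketch below assumes the $0.02 per 1M tokens embedding price quoted above, float32 vectors, and a hypothetical workload. Storage and query prices are left out because this note gives no figures for them.

```typescript
interface Workload {
  users: number;
  memoriesPerUser: number;
  tokensPerMemory: number;
  dimensions: number; // embedding vector length
}

const EMBED_PRICE_PER_MILLION_TOKENS = 0.02; // USD, the figure quoted above

// One-time cost of embedding every memory once.
function embeddingCostUsd(w: Workload): number {
  const totalTokens = w.users * w.memoriesPerUser * w.tokensPerMemory;
  return (totalTokens / 1e6) * EMBED_PRICE_PER_MILLION_TOKENS;
}

// Raw vector size in GB: float32 is 4 bytes per dimension, index overhead excluded.
function vectorStorageGb(w: Workload): number {
  const bytes = w.users * w.memoriesPerUser * w.dimensions * 4;
  return bytes / 1e9;
}

// Hypothetical workload: 10k users, 1k memories each, 50 tokens per memory.
const example: Workload = {
  users: 10000,
  memoriesPerUser: 1000,
  tokensPerMemory: 50,
  dimensions: 1536,
};
```

For this workload the embedding pass comes to about $10, while the raw vectors alone occupy about 61 GB before any index overhead.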
Pitfalls
- Too many memories: noise and cost.
- No forgetting: stale information.
- Hallucinations in auto-extracted facts.
- Memory leakage across users.
- Ignoring embedding cost.
- Long-term memory that contradicts the current context (outdated facts).
Real-world
- ChatGPT memory: explicit per-user memory.
- Character.AI: persistent agent.
- Letta: full memory framework.
- mem0: memory-as-a-service.
- Pi (Inflection): long memory companion.
🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Simple chat | Sliding window |
| Long history | Summary cascade |
| Topic recall | Vector memory |
| Personal facts | Semantic / KG |
| Timeline query | Episodic |
| Production ready | mem0 / Letta |
| Multi-user | User-scoped + privacy |
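❌ Anti-patterns
- Sending the full history on every call: cost and context overflow.
- No forgetting: stale information.
- Cross-user leaks: privacy violation.
- Trusting auto-extraction 100%: it hallucinates.
- No deduplication: the same fact stored many times.
- No tracing: impossible to debug.
- Vector search only: poor at exact facts.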
🤖 Hints for LLM use
- Memory = short-term + long-term + episodic + semantic.
- mem0 / Letta are production-ready.
- Forgetting (TTL / consolidation) mirrors human memory.
- Privacy (user-scoped memory) is mandatory.