---
id: wiki-2026-0508-카산드라-cassandra
title: 카산드라(Cassandra)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Apache Cassandra, Cassandra, C*]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [database, nosql, distributed, wide-column, ap-system]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Java/CQL
  framework: Apache Cassandra 5.0
---
# Cassandra
## One-liner
> **"A write-optimized, masterless, AP wide-column store."** Apache Cassandra 5.0 (as of 2026) combines Dynamo-style replication with a BigTable-style data model: linear write scaling (1M+ writes/sec within a single region), multi-DC active-active, and tunable consistency. Partition key design makes or breaks a deployment: a bad model leads to hotspots and oversized partitions.
## Core concepts
### Architecture
- **Masterless**: every node is equal, so there is no single point of failure.
- **Consistent hashing**: token ring plus vnodes (default 16) for even distribution.
- **Replication**: RF=3 is typical; NetworkTopologyStrategy handles multi-DC.
- **Gossip**: peer-to-peer propagation of cluster state.
- **LSM-tree storage**: memtable → SSTable, with compaction (STCS / LCS / TWCS).
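
To make the token-ring bullet concrete, here is a minimal Java sketch of consistent hashing with vnodes. The class `TokenRing`, its method names, and the MD5-based hash are illustrative assumptions: Cassandra's default partitioner is Murmur3, and real token allocation and replica placement are more involved than this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node owns several virtual-node tokens on a sorted ring. */
class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Registers `vnodes` tokens for a node (hypothetical "node#i" naming for the tokens). */
    void addNode(String node, int vnodes) {
        for (int i = 0; i < vnodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    /** Owner of a key: the first token at or after the key's token, wrapping to the start. */
    String ownerOf(String partitionKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token(partitionKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Hashes a string to a 64-bit token. MD5 is only a stand-in for Murmur3 here. */
    static long token(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) {
                t = (t << 8) | (d[i] & 0xffL);
            }
            return t;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With this layout, adding a fourth node only moves keys onto the new node; keys never move between the existing nodes, which is what keeps rebalancing cheap.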
### Consistency
- **Tunable**: ANY, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL (among others), chosen per request.
- **Strong**: R + W > N (e.g., RF=3 with R=QUORUM and W=QUORUM).
- **Eventual**: ONE (or ANY, which applies to writes only) is fast, but reads can be stale.
- **LWT (Paxos)**: conditional writes; expensive but linearizable.
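
The R + W > N rule is plain arithmetic, so a small sketch can check whether a pair of consistency levels overlaps. The class and method names below are made up for illustration; R and W are the number of replicas that must answer a read or acknowledge a write, and N is the replication factor.

```java
/** Quorum arithmetic for a single datacenter (illustrative helper, not a driver API). */
class QuorumMath {
    /** Replicas needed for QUORUM at a given replication factor: floor(RF / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** True when every read set must intersect every acknowledged write set. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }
}
```

For RF=3, `quorum(3)` is 2, so QUORUM reads with QUORUM writes give 2 + 2 > 3 and overlap, while ONE with ONE gives 1 + 1 > 3, which is false and allows stale reads.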
### Applications
1. Time series (IoT, metrics, logs).
2. Messaging / feeds (Discord ran its message store on Cassandra before moving to ScyllaDB as it grew toward trillions of messages).
3. Session / cart stores.
4. GenAI: vector embeddings with the Cassandra 5 SAI vector index.
## 💻 Patterns
### Pattern 1: Schema design (query-first)
```sql
-- BAD: unbounded partition. Every message of a channel lands in one partition,
-- which becomes both a hotspot and a large partition.
CREATE TABLE messages_unbucketed (
    channel_id uuid,
    msg_id timeuuid,
    body text,
    PRIMARY KEY (channel_id, msg_id)
);
-- GOOD: partition bucketed by time
CREATE TABLE messages (
    channel_id uuid,
    bucket text,       -- 'YYYY-MM-DD'
    msg_id timeuuid,
    body text,
    PRIMARY KEY ((channel_id, bucket), msg_id)
) WITH CLUSTERING ORDER BY (msg_id DESC);
```
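
The bucketed table only works if writers and readers derive the same `bucket` string. A minimal sketch of that derivation follows, assuming daily buckets computed in UTC; the class `MessageBuckets` and its methods are hypothetical helpers, not part of any driver.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

/** Derives 'YYYY-MM-DD' bucket values for the (channel_id, bucket) partition key. */
class MessageBuckets {
    /** Bucket for a message timestamp; UTC is assumed so every client agrees on the day. */
    static String bucketFor(Instant ts) {
        return ts.atZone(ZoneOffset.UTC).toLocalDate().toString();
    }

    /** Buckets to query, newest first, covering `days` days ending at `end`. */
    static List<String> recentBuckets(Instant end, int days) {
        LocalDate day = end.atZone(ZoneOffset.UTC).toLocalDate();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            out.add(day.minusDays(i).toString());
        }
        return out;
    }
}
```

A reader paging back through a channel would issue one query per bucket from `recentBuckets`, stopping once it has enough rows. A very busy channel may need a finer bucket (for example hourly), which is a sizing decision this sketch does not make.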
### Pattern 2: Vector search (Cassandra 5 SAI)
```sql
CREATE TABLE products (
    id uuid PRIMARY KEY,
    name text,
    embedding vector<float, 1536>
);
CREATE CUSTOM INDEX ON products(embedding)
    USING 'StorageAttachedIndex'
    WITH OPTIONS = { 'similarity_function' : 'cosine' };
-- ANN search (the query vector must have the same 1536 dimensions as the column)
SELECT id, name FROM products
    ORDER BY embedding ANN OF [0.1, 0.2, ...]
    LIMIT 10;
```
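
For intuition about the `cosine` option, the sketch below computes the textbook cosine similarity between two embeddings. It is the mathematical definition only, not SAI's internal implementation, and the class name `Cosine` is an assumption for illustration.

```java
/** Textbook cosine similarity between two equal-length float vectors. */
class Cosine {
    static double similarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The result depends only on direction: `{1, 2, 3}` and `{2, 4, 6}` score approximately 1, orthogonal vectors score 0, and opposite vectors score -1.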
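### Pattern 3: Async driver usage (Java)
```java
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.util.concurrent.CompletionStage;

// Default builder: contact points and keyspace come from the driver configuration.
CqlSession session = CqlSession.builder().build();
// Prepare once and reuse; bind values per request.
PreparedStatement ps = session.prepare(
    "INSERT INTO messages (channel_id, bucket, msg_id, body) VALUES (?, ?, ?, ?)"
);
CompletionStage<AsyncResultSet> f = session.executeAsync(
    ps.bind(channelId, bucket, msgId, body)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
);
```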
### Pattern 4: Choosing a compaction strategy
```sql
-- Time series → TWCS
ALTER TABLE metrics WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '1',
    'compaction_window_unit': 'DAYS'
};
-- Read-heavy → LCS
ALTER TABLE users WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '160'
};
-- General write-heavy → STCS (default)
```
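
TWCS groups SSTables into time windows and compacts each closed window down toward a single SSTable, so the retention period divided by the window size gives a rough count of windows on disk. The sketch below does only that division; `TwcsWindows` is a hypothetical helper, and the count is an approximation because data can straddle a partially filled current window.

```java
import java.time.Duration;

/** Rough TWCS sizing: how many time windows a retention period spans. */
class TwcsWindows {
    /** Approximate number of windows covered by data kept for `ttl`, rounded up. */
    static long windowCount(Duration ttl, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        long w = window.toMillis();
        return (ttl.toMillis() + w - 1) / w;
    }
}
```

With the 1-day window configured above, a 30-day TTL spans about 30 windows, while a 1-hour window over 7 days would span 168, which is one way to compare candidate window sizes before altering the table.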
### Pattern 5: Multi-DC replication
```sql
CREATE KEYSPACE app
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3,
    'ap-northeast': 2
} AND durable_writes = true;
```
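
The per-DC replication factors determine how many replicas each consistency level waits for. The following sketch works through that arithmetic for a replication map like the one in this keyspace; `DcQuorum` and its methods are illustrative names, not driver or server APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Quorum sizes for a NetworkTopologyStrategy replication map (illustrative helper). */
class DcQuorum {
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** Replicas that may be down in one DC while LOCAL_QUORUM there still succeeds. */
    static int tolerableFailures(int rf) {
        return rf - quorum(rf);
    }

    /** LOCAL_QUORUM size per datacenter, in the map's iteration order. */
    static Map<String, Integer> localQuorums(Map<String, Integer> rfByDc) {
        Map<String, Integer> out = new LinkedHashMap<>();
        rfByDc.forEach((dc, rf) -> out.put(dc, quorum(rf)));
        return out;
    }

    /** Plain QUORUM counts replicas across all DCs: floor(sum(RF) / 2) + 1. */
    static int globalQuorum(Map<String, Integer> rfByDc) {
        int total = rfByDc.values().stream().mapToInt(Integer::intValue).sum();
        return quorum(total);
    }
}
```

For `us-east: 3`, `eu-west: 3`, `ap-northeast: 2`, LOCAL_QUORUM is 2 in every DC. That leaves `ap-northeast` with zero tolerable replica failures at LOCAL_QUORUM, and a plain QUORUM needs 5 of the 8 replicas, which forces cross-DC round trips.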
### Pattern 6: LWT (conditional write)
```sql
-- Uniqueness constraint
INSERT INTO users (email, id) VALUES ('a@b.com', uuid())
    IF NOT EXISTS;
-- Expensive: about 4 Paxos round trips. Keep it off the hot path.
```
### Pattern 7: Diagnosing anti-patterns
```sql
-- Check for large partitions with nodetool tablestats:
-- nodetool tablestats keyspace.table | grep "Compacted partition maximum"
-- A partition of 100MB+ is a signal to redesign
```
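
The same check can be automated by parsing captured `nodetool tablestats` text. This sketch assumes the output contains a line of the form `Compacted partition maximum bytes: <number>` with a raw byte count (that is, without human-readable formatting), and it treats 100 MB as 100 × 1024 × 1024 bytes; the class and method names are hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Flags tables whose largest compacted partition exceeds the 100 MB redesign signal. */
class LargePartitionCheck {
    static final long THRESHOLD_BYTES = 100L * 1024 * 1024;

    // Matches only a plain integer at the end of the line, so formatted sizes are ignored.
    private static final Pattern MAX_BYTES =
        Pattern.compile("(?m)Compacted partition maximum bytes:\\s*(\\d+)\\s*$");

    /** First "Compacted partition maximum bytes" value in the text, if one is present. */
    static OptionalLong maxPartitionBytes(String tablestatsOutput) {
        Matcher m = MAX_BYTES.matcher(tablestatsOutput);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** True when a maximum was found and it is above the threshold. */
    static boolean needsRedesign(String tablestatsOutput) {
        OptionalLong max = maxPartitionBytes(tablestatsOutput);
        return max.isPresent() && max.getAsLong() > THRESHOLD_BYTES;
    }
}
```

Only the first match is inspected, so the text should be the output for a single table; a missing or unparseable line yields an empty result rather than a false alarm.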
## Decision criteria
| Situation | Approach |
|---|---|
| Writes at 1M+/sec | Cassandra (natural fit) |
| Strong consistency required | LWT, or a different DB (CockroachDB, Spanner) |
| Ad-hoc queries / JOINs | Postgres / Trino; Cassandra is a poor fit |
| Time series | Cassandra + TWCS, or ScyllaDB |
| Vectors at scale | Cassandra 5 SAI, or Milvus/Qdrant |
| Small data (< 1TB) | Postgres; Cassandra is overkill |

**Defaults**: query-first schema, LOCAL_QUORUM, RF=3, partitions < 100MB.
## 🔗 Graph
- Parent: [[NoSQL]] · [[Distributed Database]]
- Variants: [[ScyllaDB]] · [[DynamoDB]]
- Applications: [[Time Series Database]] · [[Messaging Platform]]
- Adjacent: [[Vector Database]] · [[CAP Theorem]]
## 🤖 LLM usage
**When**: designing large-scale write workloads, multi-DC active-active requirements, time-series storage, schema review.
**When not**: transactional / OLTP / JOIN-heavy workloads, where an RDBMS fits better; small data, where Cassandra is over-engineering.
## ❌ Anti-patterns
- **Large partitions (>100MB)**: OOM, compaction failures, read latency blow-ups.
- **Hotspot keys**: all messages of a single channel under one key → the partition explodes.
- **ALLOW FILTERING**: full scan; not for production.
- **Secondary index on a high-cardinality column**: every query fans out across nodes; use SAI instead.
- **LWT on the hot path**: roughly 4× latency.
- **SQL mindset (JOIN, GROUP BY)**: denormalization is mandatory.
## 🧪 Verification / duplicates
- Verified (Apache Cassandra 5.0 docs, DataStax docs, Discord engineering blog 2026).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full write-up for Cassandra 5.0 (SAI vector) |