---
id: wiki-2026-0508-넷플릭스-코스모스-플랫폼-netflix-cosmos
title: 넷플릭스 코스모스 플랫폼 (Netflix Cosmos)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Netflix Cosmos, Cosmos Platform, Netflix Media Cloud]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [netflix, distributed-systems, media-processing, workflow-engine, microservices]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Java/Kotlin
  framework: Cosmos/SpringBoot/Kafka
---
# Netflix Cosmos Platform (넷플릭스 코스모스 플랫폼)
## In one line
> **"A media-aware microservice platform: a trinity of workflow, service, and resource."** Netflix designed Cosmos between 2018 and 2020 to replace Reloaded, its platform dedicated to transcoding and encoding. Cosmos standardizes every media operation on the same three-part pattern: a stateless service, a persistent workflow, and a resource manager. As of 2026, this page records all of Netflix's non-streaming (offline) video pipelines as running on Cosmos.
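## Core
### Definition
- Platform-as-a-product: application teams deploy their own services on top of Cosmos.
- Trinity = **Optimus** (API/service) + **Plato** (workflow) + **Stratum** (compute pool).
- Event-driven, with asynchronous messaging between the layers; Java + Spring Boot. This page models the messaging backbone as Kafka, whereas Netflix's public Cosmos write-up names Timestone, an in-house priority queuing system, for communication between the subsystems.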
### Trinity components
- **Optimus**: the external-facing API layer. Stateless. Handles request validation and result aggregation.
- **Plato**: the long-running workflow engine. Rule-based. Expresses retries, sagas, and fork-join.
- **Stratum**: the elastic compute pool that runs ffmpeg, encoders, and ML inference on GPU or CPU.
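
The split between the three layers can be sketched in plain Java. This is a minimal in-memory illustration, not Netflix's API: `TrinitySketch`, `Api`, `Workflow`, `Compute`, and the step names are all hypothetical stand-ins for Optimus, Plato, and Stratum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: an in-memory illustration of the
// API / workflow / compute split, not a real Cosmos client library.
final class TrinitySketch {

    // Compute layer (Stratum's role): runs one stateless function per call.
    interface Compute {
        String run(String function, String input);
    }

    // Workflow layer (Plato's role): owns the step order and the run history.
    static final class Workflow {
        private final Compute compute;
        private final List<String> history = new ArrayList<>();

        Workflow(Compute compute) {
            this.compute = compute;
        }

        String execute(String sourceUri, List<String> steps) {
            String current = sourceUri;
            for (String step : steps) {
                current = compute.run(step, current);
                history.add(step);
            }
            return current;
        }

        List<String> history() {
            return List.copyOf(history);
        }
    }

    // API layer (Optimus's role): validates, then delegates; keeps no per-request state.
    static final class Api {
        private final Workflow workflow;

        Api(Workflow workflow) {
            this.workflow = workflow;
        }

        String submit(String sourceUri) {
            if (sourceUri == null || sourceUri.isBlank()) {
                throw new IllegalArgumentException("sourceUri is required");
            }
            return workflow.execute(sourceUri, List.of("probe", "encode", "mux"));
        }
    }

    // Wires the three layers with a toy compute function that tags its input.
    static String demo(String sourceUri) {
        Compute compute = (function, input) -> function + "(" + input + ")";
        return new Api(new Workflow(compute)).submit(sourceUri);
    }
}
```

Calling `TrinitySketch.demo("a.mov")` returns `mux(encode(probe(a.mov)))`, which shows the request passing through validation, the ordered workflow steps, and the compute function in turn.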
### Applications
1. Encoding pipeline (4K HDR, AV1).
2. Studio post-production (color, VFX).
3. Subtitle/dubbing automation.
4. Trailer generation, content safety scan.
## 💻 Patterns
### Optimus service skeleton (Spring Boot)
```java
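// Sketch only: PlatoClient, EncodeRequest, EncodeResponse, and validate(...) are hypothetical, not a published Netflix API.
import java.util.Map;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/v1/encode")
public class EncodeOptimus {
    private final PlatoClient plato;

    // Constructor injection; without it the final field is never assigned.
    public EncodeOptimus(PlatoClient plato) {
        this.plato = plato;
    }

    @PostMapping
    public EncodeResponse submit(@RequestBody EncodeRequest req) {
        validate(req);  // reject malformed requests before any workflow starts
        var workflowId = plato.start("encode-workflow-v3", Map.of(
            "sourceUri", req.sourceUri(),
            "profiles", req.profiles()
        ));
        return new EncodeResponse(workflowId);  // the API layer returns at once; work continues in the workflow
    }
}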
```
### Plato workflow definition (rule-based DSL)
```yaml
# Illustrative rule schema for this page; not taken from Plato's actual rule language.
workflow:
  id: encode-workflow-v3
  rules:
    - when: workflow.started
      do:
        - probe-source
    - when: probe-source.completed
      do:
        - fanout:
            for: each profile in input.profiles
            run: encode-segment
    - when: all encode-segment.completed
      do:
        - mux-final
        - publish-manifest
    - when: any.failed
      retry:
        max: 3
        backoff: exponential
      onExhausted:
        - notify-oncall
```
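
The "when an event is seen, run these steps" shape of the rules above can be sketched as a very small forward-chaining loop. This is a minimal sketch under stated assumptions, not Plato's engine: `RuleEngineSketch` is a hypothetical name, every step is assumed to succeed instantly and emit `<step>.completed`, and the fan-out over profiles is collapsed into a single `encode-segment` step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of event-triggered rules; not Plato's actual engine.
final class RuleEngineSketch {
    // Event name -> steps to run when that event is observed.
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    RuleEngineSketch when(String event, String... steps) {
        rules.put(event, List.of(steps));
        return this;
    }

    // Processes events in FIFO order. Every step is assumed to succeed and to
    // emit "<step>.completed" immediately. Returns the steps in run order.
    List<String> run(String firstEvent) {
        List<String> executed = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(firstEvent);
        while (!pending.isEmpty()) {
            String event = pending.poll();
            for (String step : rules.getOrDefault(event, List.of())) {
                executed.add(step);
                pending.add(step + ".completed");
            }
        }
        return executed;
    }

    // Mirrors the rule order of the workflow definition, minus fan-out and retry.
    static List<String> demo() {
        return new RuleEngineSketch()
            .when("workflow.started", "probe-source")
            .when("probe-source.completed", "encode-segment")
            .when("encode-segment.completed", "mux-final", "publish-manifest")
            .run("workflow.started");
    }
}
```

`RuleEngineSketch.demo()` returns the steps in the order `probe-source`, `encode-segment`, `mux-final`, `publish-manifest`. The loop has no iteration bound, so a rule set that re-triggers itself would never terminate here.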
### Stratum job submission
```java
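// Sketch: StratumJob, S3Uri, JobResult, the stratum client, and the image name are hypothetical.
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;

StratumJob job = StratumJob.builder()
    .image("netflixoss/ffmpeg-encoder:av1-v12")
    // libaom-av1 is a CPU encoder; the GPU request is kept from the original sketch.
    .gpu(1, "A100")
    .cpu(8)
    .memory("32Gi")
    .input(new S3Uri("s3://prod-mezz/" + sourceKey))
    .output(new S3Uri("s3://prod-encoded/" + outKey))
    .args(List.of("-c:v", "libaom-av1", "-crf", "30"))
    .timeout(Duration.ofMinutes(45))  // fail the job rather than let it run unbounded
    .build();
// Submission is asynchronous; the future completes when the job finishes.
CompletableFuture<JobResult> result = stratum.submit(job);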
```
### Event-driven step coordination
```java
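// Sketch: WorkflowEvent, EventType, PlatoClient, StratumClient, and buildJob(...) are hypothetical.
import java.util.Map;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
@Component  // must be a Spring bean for the listener annotation to be picked up
public class EncodeSegmentHandler {
    private final PlatoClient plato;
    private final StratumClient stratum;
    public EncodeSegmentHandler(PlatoClient plato, StratumClient stratum) {
        this.plato = plato;
        this.stratum = stratum;
    }

    @KafkaListener(topics = "cosmos.workflow.events")
    public void onEvent(WorkflowEvent ev) {
        // Only react to the start of an encode-segment step.
        if (ev.type() != EventType.STEP_STARTED) return;
        if (!"encode-segment".equals(ev.stepName())) return;  // null-safe comparison
        var profile = ev.payload().get("profile");
        // Report the outcome back to the workflow engine when the job finishes.
        stratum.submit(buildJob(profile)).whenComplete((r, err) -> {
            if (err != null) plato.failStep(ev.stepId(), err);
            else plato.completeStep(ev.stepId(), Map.of("output", r.outputUri()));
        });
    }
}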
```
### Fanout-fanin (parallel encode)
```java
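// Sketch: WorkflowContext and startEncodeSegment(...) are hypothetical.
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FanoutFaninCoordinator {
    public void onProbeComplete(WorkflowContext ctx) {
        List<String> profiles = ctx.input("profiles");
        // Fan out: one asynchronous encode per profile.
        List<CompletableFuture<Void>> tasks = profiles.stream()
            .map(p -> startEncodeSegment(ctx, p))
            .toList();
        // Fan in: thenRun fires only if every encode succeeds;
        // a failed segment needs its own handling (not shown here).
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]))
            .thenRun(() -> ctx.signal("all-segments-done"));
    }
}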
```
### Idempotency + dedup
```java
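// Sketch: IdempotencyStore, PlatoClient, WorkflowId, sha256(...), and profilesHash() are hypothetical.
import org.springframework.stereotype.Service;
@Service
public class IdempotentSubmit {
    private final IdempotencyStore idempotencyStore;
    private final PlatoClient plato;
    public IdempotentSubmit(IdempotencyStore idempotencyStore, PlatoClient plato) {
        this.idempotencyStore = idempotencyStore;
        this.plato = plato;
    }
    public WorkflowId submitOnce(EncodeRequest req) {
        // Delimit the parts so ("ab", "c") and ("a", "bc") cannot produce the same key.
        String key = sha256(req.sourceUri() + "|" + req.profilesHash());
        // Start the workflow only if this key has not been seen before.
        return idempotencyStore.computeIfAbsent(key, () ->
            plato.start("encode-workflow-v3", req.toMap()));
    }
}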
```
## Decision criteria
| Situation | Approach |
|---|---|
| Simple stateless API | Optimus only |
| Long-running, multi-step work | Optimus + Plato |
| GPU-heavy work (encoding/ML) | Optimus + Plato + Stratum |
| Synchronous sub-second response | Optimus only (no Plato) |
| Triggered by an external system | Optimus webhook + Plato saga |
**Default**: use the full trinity for multi-step media workflows; use Optimus alone for simple CRUD.
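
The table can be read as a small decision function. This is a minimal sketch, assuming three boolean inputs and assuming that a synchronous sub-second requirement overrides the other rows; `LayerChooser` and its parameter names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper that encodes the decision table; not part of Cosmos.
final class LayerChooser {
    enum Layer { OPTIMUS, PLATO, STRATUM }

    static Set<Layer> choose(boolean multiStep, boolean heavyCompute, boolean needsSubSecondSync) {
        // Every service exposes an API, so Optimus is always present.
        EnumSet<Layer> layers = EnumSet.of(Layer.OPTIMUS);
        // Assumption: a synchronous sub-second requirement rules out the workflow layer.
        if (needsSubSecondSync) {
            return layers;
        }
        // Multi-step or compute-heavy work is coordinated by the workflow engine.
        if (multiStep || heavyCompute) {
            layers.add(Layer.PLATO);
        }
        // GPU or encoding work additionally needs the compute pool.
        if (heavyCompute) {
            layers.add(Layer.STRATUM);
        }
        return layers;
    }
}
```

For example, `LayerChooser.choose(true, false, false)` yields Optimus plus Plato, matching the long-running multi-step row.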
## 🔗 Graph
- Variant: [[Temporal]]
- Adjacent: [[Kafka]] · [[Event-Driven Architecture]]
## 🤖 LLM usage
**When**: designing a large-scale media platform, combining multi-step workflows with GPU compute, or building an internal PaaS.
**When not**: small teams, simple CRUD apps, or systems with fewer than 10 services.
## ❌ Anti-patterns
- **Cosmos for everything**: forcing the full trinity onto a small CRUD service is over-engineering.
- **Stateful Optimus**: long-lived state held in the API layer belongs in Plato.
- **Scheduling GPUs directly, bypassing Stratum**: fragments the cluster.
- **Unbounded workflow-rule loops**: with no max-iteration guard, costs can run away.
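
The last anti-pattern is the easiest to guard against in code. A minimal sketch, assuming rules are modeled as a map from an event to the events it emits: `LoopGuardSketch` and `maxEvents` are hypothetical names, and a real engine would likely track the budget per workflow run.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical guard against self-triggering rule sets; not Plato's mechanism.
final class LoopGuardSketch {
    // Processes events until none remain, and aborts once maxEvents have
    // been handled. Returns the number of events processed.
    static int run(Map<String, List<String>> rules, String firstEvent, int maxEvents) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add(firstEvent);
        int processed = 0;
        while (!pending.isEmpty()) {
            if (processed >= maxEvents) {
                throw new IllegalStateException("event budget of " + maxEvents + " exhausted");
            }
            String event = pending.poll();
            processed++;
            // Each handled event may emit follow-up events.
            pending.addAll(rules.getOrDefault(event, List.of()));
        }
        return processed;
    }
}
```

An acyclic rule set such as `a -> b -> c` finishes after three events, while a cycle such as `a -> b -> a` throws once the budget is spent instead of running indefinitely.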
## 🧪 Verification / duplicates
- Verified against: Netflix Tech Blog ("Cosmos Trinity", 2020; "Reloaded → Cosmos Migration", 2022) and QCon talks (2023-2024).
- Trust level A: official Netflix engineering material.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — trinity architecture + 6 patterns |