---
id: wiki-2026-0508-넷플릭스-코스모스-플랫폼-netflix-cosmos
title: Netflix Cosmos Platform
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Netflix Cosmos, Cosmos Platform, Netflix Media Cloud]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [netflix, distributed-systems, media-processing, workflow-engine, microservices]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Java/Kotlin
  framework: Cosmos/SpringBoot/Kafka
---

# Netflix Cosmos Platform

## One-liner
> **"A media-aware microservice platform: the trinity of workflow, service, and resource."** Netflix designed it between 2018 and 2020 to replace Reloaded, its earlier platform dedicated to transcoding and encoding. Cosmos standardizes every media operation on a trinity pattern of stateless services, persistent workflows, and a resource manager. As of 2026, Netflix's entire non-streaming video pipeline runs on Cosmos.

## Core

### Definition
- Platform-as-a-product: application teams deploy their services on top of Cosmos.
- The trinity is **Optimus** (API/service) + **Plato** (workflow) + **Stratum** (compute pool).
- Event-driven, with a Kafka backbone; built on Java + Spring Boot.

### Trinity components
- **Optimus**: the external-facing API layer. It is stateless and handles request validation and result aggregation.
- **Plato**: the workflow engine for long-running work. It is rule-based and can express retries, sagas, and fork-join.
- **Stratum**: the elastic compute pool that runs ffmpeg, encoders, and ML inference on GPU or CPU.
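
The division of labor above can be sketched in a few lines. This is a minimal in-process sketch under stated assumptions. `OptimusApi`, `PlatoEngine`, and `StratumPool` are hypothetical stand-ins, not Cosmos APIs. The calls are also synchronous here, whereas the real layers run as separate services that hand work to each other asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical Stratum stand-in: runs registered stateless functions by name.
class StratumPool {
    private final Map<String, Function<String, String>> functions = new ConcurrentHashMap<>();

    void register(String name, Function<String, String> fn) {
        functions.put(name, fn);
    }

    String run(String name, String input) {
        Function<String, String> fn = functions.get(name);
        if (fn == null) throw new IllegalArgumentException("unknown function: " + name);
        return fn.apply(input); // output depends only on the input
    }
}

// Hypothetical Plato stand-in: owns workflow state and the order of steps.
class PlatoEngine {
    private final StratumPool stratum;
    private final Map<String, List<String>> outputsByWorkflow = new ConcurrentHashMap<>();

    PlatoEngine(StratumPool stratum) {
        this.stratum = stratum;
    }

    String start(List<String> steps, String input) {
        String workflowId = UUID.randomUUID().toString();
        List<String> outputs = new ArrayList<>();
        String current = input;
        for (String step : steps) {
            current = stratum.run(step, current); // each step's output feeds the next step
            outputs.add(current);
        }
        outputsByWorkflow.put(workflowId, List.copyOf(outputs)); // workflow state lives here
        return workflowId;
    }

    List<String> outputs(String workflowId) {
        return outputsByWorkflow.getOrDefault(workflowId, List.of());
    }
}

// Hypothetical Optimus stand-in: validates, hands off, and keeps no state of its own.
class OptimusApi {
    private final PlatoEngine plato;

    OptimusApi(PlatoEngine plato) {
        this.plato = plato;
    }

    String submitEncode(String sourceUri) {
        if (sourceUri == null || !sourceUri.startsWith("s3://")) {
            throw new IllegalArgumentException("sourceUri must be an s3:// URI");
        }
        return plato.start(List.of("probe-source", "encode-segment"), sourceUri);
    }
}
```

The value of the split shows in where state lives. `OptimusApi` keeps nothing between calls, `PlatoEngine` owns the per-workflow record, and `StratumPool` functions see only their input. Functions are registered before submission, for example `stratum.register("probe-source", uri -> uri + "#probed")`.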

### Applications
1. Encoding pipeline (4K HDR, AV1).
2. Studio post-production (color, VFX).
3. Subtitle/dubbing automation.
4. Trailer generation and content-safety scanning.

## 💻 Patterns
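
### Optimus service skeleton (Spring Boot)
```java
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;

// PlatoClient, EncodeRequest and EncodeResponse are illustrative types, not a published Cosmos API.
@RestController
@RequestMapping("/v1/encode")
public class EncodeOptimus {
    private final PlatoClient plato;

    public EncodeOptimus(PlatoClient plato) { // constructor injection by Spring
        this.plato = plato;
    }

    @PostMapping
    public EncodeResponse submit(@RequestBody EncodeRequest req) {
        validate(req); // stateless check; no workflow state lives in Optimus
        var workflowId = plato.start("encode-workflow-v3", Map.of(
                "sourceUri", req.sourceUri(),
                "profiles", req.profiles()));
        return new EncodeResponse(workflowId); // returns at once; the workflow runs asynchronously
    }

    private static void validate(EncodeRequest req) {
        if (req.sourceUri() == null || req.profiles() == null || req.profiles().isEmpty())
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "sourceUri and at least one profile are required");
    }
}
```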

### Plato workflow definition (rule-based DSL)
```yaml
# Illustrative rule DSL; not Plato's published syntax.
workflow:
  id: encode-workflow-v3
  rules:
    - when: workflow.started
      do:
        - probe-source

    - when: probe-source.completed
      do:
        - fanout:
            for: each profile in input.profiles
            run: encode-segment

    - when: all encode-segment.completed
      do:
        - mux-final
        - publish-manifest

    - when: any.failed
      retry:
        max: 3
        backoff: exponential
      onExhausted:
        - notify-oncall
```
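
Because the workflow above is declared as rules rather than as a script, executing it is a forward-chaining loop. Each event is matched against the `when` clauses, and firing a rule emits new events. The sketch below covers the three happy-path rules only. It assumes every step succeeds immediately and emits `<step>.completed`. `MiniRuleEngine` is a hypothetical name, not Plato's engine, and the `any.failed` retry rule is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical forward-chaining interpreter for the happy-path rules of encode-workflow-v3.
class MiniRuleEngine {
    private final List<String> profiles;
    private final List<String> executed = new ArrayList<>(); // steps in the order they ran
    private final Deque<String> events = new ArrayDeque<>();  // events waiting to be matched
    private int segmentsDone = 0;

    MiniRuleEngine(List<String> profiles) {
        this.profiles = List.copyOf(profiles);
    }

    List<String> run() {
        events.add("workflow.started");
        while (!events.isEmpty()) {
            String event = events.poll();
            // Match the event against each rule's `when`; firing a rule can emit new events.
            switch (event) {
                case "workflow.started" -> runStep("probe-source", "probe-source");
                case "probe-source.completed" -> {
                    // fanout: one encode-segment instance per profile
                    for (String profile : profiles) {
                        runStep("encode-segment[" + profile + "]", "encode-segment");
                    }
                }
                case "encode-segment.completed" -> {
                    // fan-in: "all encode-segment.completed" holds only after the last segment
                    segmentsDone++;
                    if (segmentsDone == profiles.size()) {
                        runStep("mux-final", "mux-final");
                        runStep("publish-manifest", "publish-manifest");
                    }
                }
                default -> { } // no rule matches other completion events
            }
        }
        return List.copyOf(executed);
    }

    // Records the step and, assuming it succeeds at once, emits "<step>.completed".
    private void runStep(String label, String stepName) {
        executed.add(label);
        events.add(stepName + ".completed");
    }
}
```

The `all encode-segment.completed` rule becomes a counter here. The fan-in fires only when the number of completed segments equals the number of profiles that were fanned out.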
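
### Stratum job submission
```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// StratumClient, StratumJob, S3Uri and JobResult are illustrative types, not a published API.
CompletableFuture<JobResult> submitAv1Encode(StratumClient stratum, String sourceKey, String outKey) {
    StratumJob job = StratumJob.builder()
            .image("netflixoss/ffmpeg-encoder:av1-v12")
            .cpu(8)          // libaom-av1 encodes on the CPU, so no GPU is requested
            .memory("32Gi")
            .input(new S3Uri("s3://prod-mezz/" + sourceKey))
            .output(new S3Uri("s3://prod-encoded/" + outKey))
            // -b:v 0 selects pure constant-quality mode with -crf (needed on older FFmpeg builds)
            .args(List.of("-c:v", "libaom-av1", "-crf", "30", "-b:v", "0"))
            .timeout(Duration.ofMinutes(45))
            .build();
    return stratum.submit(job); // completes when the job finishes or fails
}
```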
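
### Event-driven step coordination
```java
import java.util.Map;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// WorkflowEvent, EventType, StratumClient, PlatoClient and buildJob are illustrative names.
@Component // @KafkaListener methods are only registered on Spring-managed beans
public class EncodeSegmentHandler {
    private final StratumClient stratum;
    private final PlatoClient plato;

    public EncodeSegmentHandler(StratumClient stratum, PlatoClient plato) {
        this.stratum = stratum;
        this.plato = plato;
    }

    @KafkaListener(topics = "cosmos.workflow.events", groupId = "encode-segment-workers") // illustrative group id
    public void onEvent(WorkflowEvent ev) {
        if (ev.type() != EventType.STEP_STARTED) return; // ignore all other event types
        if (!"encode-segment".equals(ev.stepName())) return;

        var profile = ev.payload().get("profile");
        stratum.submit(buildJob(profile)).whenComplete((r, err) -> {
            // Report the outcome to Plato, which owns retries and the next steps.
            if (err != null) plato.failStep(ev.stepId(), err);
            else plato.completeStep(ev.stepId(), Map.of("output", r.outputUri()));
        });
    }
}
```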
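
When a handler calls `plato.failStep`, the workflow's `any.failed` rule decides what happens next: `max: 3`, `backoff: exponential`, then `notify-oncall`. The sketch below implements that policy under three assumptions. `RetryPolicy` is a hypothetical name. The base delay is also assumed, because the YAML does not give one. Finally, `max: 3` is read here as three retries after the first attempt.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of the YAML rule: retry { max: 3, backoff: exponential }, then onExhausted.
class RetryPolicy {
    private final int maxRetries;
    private final Duration baseDelay; // assumed value; the YAML does not specify one

    RetryPolicy(int maxRetries, Duration baseDelay) {
        this.maxRetries = maxRetries;
        this.baseDelay = baseDelay;
    }

    /** Delay before retry number {@code retry} (1-based), or empty once retries are exhausted. */
    Optional<Duration> nextDelay(int retry) {
        if (retry < 1) throw new IllegalArgumentException("retry numbers start at 1");
        if (retry > maxRetries) return Optional.empty();
        return Optional.of(baseDelay.multipliedBy(1L << (retry - 1))); // base, 2x base, 4x base, ...
    }

    /**
     * Runs the action, retrying failures with exponential backoff. Calls onExhausted
     * (e.g. notify-oncall) and returns empty after the last retry fails.
     * A null result is also reported as empty, so actions should return non-null values.
     */
    <T> Optional<T> execute(Supplier<T> action, Consumer<Duration> sleeper, Runnable onExhausted) {
        for (int attempt = 1; ; attempt++) {
            try {
                return Optional.ofNullable(action.get());
            } catch (RuntimeException failure) {
                // After attempt n fails, wait for retry number n if any retries are left.
                Optional<Duration> delay = nextDelay(attempt);
                if (delay.isEmpty()) {
                    onExhausted.run();
                    return Optional.empty();
                }
                sleeper.accept(delay.get()); // a workflow engine would schedule, not block
            }
        }
    }
}
```

The sleeper is injected so the policy can be tested without waiting. Inside a workflow engine, the delay would schedule a later retry instead of blocking a thread.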
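
### Fanout-fanin (parallel encode)
```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// WorkflowContext and startEncodeSegment are illustrative. This fan-in lives in memory and is
// lost if the process restarts; durable fan-in belongs in Plato rules ("all encode-segment.completed").
public class FanoutFaninCoordinator {
    public void onProbeComplete(WorkflowContext ctx) {
        List<String> profiles = ctx.input("profiles");
        // Fan out: start one encode-segment per profile in parallel.
        List<CompletableFuture<Void>> tasks = profiles.stream()
                .map(p -> startEncodeSegment(ctx, p))
                .toList();

        // Fan in: allOf completes once every segment has finished, exceptionally if any failed.
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]))
                .whenComplete((ignored, err) -> {
                    if (err != null) ctx.signal("segments-failed"); // illustrative signal name
                    else ctx.signal("all-segments-done");
                });
    }
}
```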
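
### Idempotency + dedup
```java
import org.springframework.stereotype.Service;

// IdempotencyStore, PlatoClient, WorkflowId, EncodeRequest and sha256 are illustrative names.
@Service
public class IdempotentSubmit {
    private final IdempotencyStore idempotencyStore; // must be shared across instances (e.g. a database)
    private final PlatoClient plato;

    public IdempotentSubmit(IdempotencyStore idempotencyStore, PlatoClient plato) {
        this.idempotencyStore = idempotencyStore;
        this.plato = plato;
    }

    public WorkflowId submitOnce(EncodeRequest req) {
        // A separator keeps ("ab", "c") and ("a", "bc") from hashing alike; assumes neither field contains '\n'.
        String key = sha256(req.sourceUri() + "\n" + req.profilesHash());
        // Map-style computeIfAbsent: the function receives the key and runs only when the key is absent.
        return idempotencyStore.computeIfAbsent(key, k -> plato.start("encode-workflow-v3", req.toMap()));
    }
}
```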
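
Two details decide whether deduplication actually holds. First, the fields must be encoded unambiguously before hashing, because plain concatenation makes `"ab" + "c"` and `"a" + "bc"` identical. Second, the start call must run at most once per key. The self-contained sketch below covers both. The class names are hypothetical, and an in-memory `ConcurrentHashMap` stands in for the idempotency store. It therefore deduplicates only within one JVM, whereas a real store must be shared across instances.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: SHA-256 over an unambiguous, length-prefixed encoding of the fields.
class IdempotencyKeys {
    static String key(String... fields) {
        StringBuilder canonical = new StringBuilder();
        for (String field : fields) {
            // "2:ab1:c" vs "1:a2:bc": the length prefix keeps field boundaries explicit.
            canonical.append(field.length()).append(':').append(field);
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest); // 64 lowercase hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}

// Hypothetical in-memory store; it deduplicates only within one JVM.
class InMemoryIdempotentStarter {
    private final ConcurrentMap<String, String> workflowIdByKey = new ConcurrentHashMap<>();

    String submitOnce(String sourceUri, String profilesHash, Function<String, String> startWorkflow) {
        String key = IdempotencyKeys.key(sourceUri, profilesHash);
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key,
        // so concurrent duplicates in this JVM get the same workflow id.
        return workflowIdByKey.computeIfAbsent(key, startWorkflow);
    }
}
```

Length-prefixing is used instead of a delimiter so that no character has to be forbidden inside the fields.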

## Decision criteria
| Situation | Approach |
|---|---|
| Simple stateless API | Optimus only |
| Long-running, multi-step | Optimus + Plato |
| GPU-heavy (encoding/ML) | Optimus + Plato + Stratum |
| Synchronous, sub-second response | Optimus only (no Plato) |
| Triggered by an external system | Optimus webhook + Plato saga |

**Default**: use the full trinity for multi-step media workflows, and Optimus alone for simple CRUD.
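
The decision table reads naturally as a function. In this minimal sketch, `WorkloadTraits`, `Layer`, and `LayerChooser` are hypothetical names. The precedence when rows overlap is an assumption that the table does not state: a sub-second synchronous requirement wins first, and compute-heavy work then gets the full trinity.

```java
import java.util.EnumSet;
import java.util.Set;

enum Layer { OPTIMUS, PLATO, STRATUM }

// Hypothetical workload description: one flag per row of the decision table.
record WorkloadTraits(boolean multiStep, boolean computeHeavy,
                      boolean syncSubSecond, boolean externalTrigger) { }

class LayerChooser {
    static Set<Layer> choose(WorkloadTraits w) {
        if (w.syncSubSecond()) {
            // A sub-second synchronous response cannot wait on a workflow: Optimus only.
            return EnumSet.of(Layer.OPTIMUS);
        }
        if (w.computeHeavy()) {
            return EnumSet.allOf(Layer.class); // encoding/ML: the full trinity
        }
        if (w.multiStep() || w.externalTrigger()) {
            // Long-running flows and external triggers (webhook + saga) need Plato's persistent state.
            return EnumSet.of(Layer.OPTIMUS, Layer.PLATO);
        }
        return EnumSet.of(Layer.OPTIMUS); // simple stateless API
    }
}
```

Writing the table as code forces the overlap question into the open, which the table itself leaves to the reader.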

## 🔗 Graph
- Variant: [[Temporal]]
- Adjacent: [[Kafka]] · [[Event-Driven Architecture]]

## 🤖 LLM usage
**When**: designing a large-scale media platform, multi-step workflows that need GPU compute, or an internal PaaS.

**When not**: a small team, a simple CRUD app, or a system with fewer than 10 services.

## ❌ Anti-patterns
- **Cosmos for everything**: forcing the full trinity onto a small CRUD service is over-engineering.
- **Stateful Optimus**: long-lived state kept in the API layer belongs in Plato.
- **Scheduling GPUs directly, bypassing Stratum**: this fragments the cluster.
- **Infinite workflow rule loops**: without a max-iteration guard, cost runs away.
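
The last anti-pattern has a mechanical fix. The engine counts rule firings per workflow and fails the workflow once a budget is exceeded, instead of letting a self-triggering rule keep consuming compute. In this minimal sketch, `GuardedRuleLoop`, `RuleBudgetExceeded`, and the budget value are illustrative assumptions, not Plato settings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Thrown when a workflow exceeds its rule-firing budget (hypothetical name).
class RuleBudgetExceeded extends RuntimeException {
    RuleBudgetExceeded(int budget) {
        super("workflow exceeded " + budget + " rule firings; failing instead of looping");
    }
}

// Hypothetical rule loop with a max-iteration guard.
class GuardedRuleLoop {
    private final int maxFirings;

    GuardedRuleLoop(int maxFirings) {
        this.maxFirings = maxFirings;
    }

    /** Fires rules until no events remain; each rule maps its event to follow-up events. */
    int run(String initialEvent, Map<String, Function<String, List<String>>> rules) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add(initialEvent);
        int firings = 0;
        while (!pending.isEmpty()) {
            String event = pending.poll();
            Function<String, List<String>> rule = rules.get(event);
            if (rule == null) continue; // no rule matches this event
            if (++firings > maxFirings) throw new RuleBudgetExceeded(maxFirings);
            pending.addAll(rule.apply(event));
        }
        return firings; // number of rule firings, e.g. for metrics
    }
}
```

The budget must leave room for legitimate fan-outs, which cost one firing per profile, while still putting a hard ceiling on cost.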

## 🧪 Verification / duplicates
- Verified (Netflix Tech Blog "Cosmos Trinity" 2020, "Reloaded → Cosmos Migration" 2022, QCon talks 2023-2024).
- Trust level A: official Netflix engineering material.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: trinity architecture + 6 patterns |