---
id: wiki-2026-0508-넷플릭스-코스모스-플랫폼-netflix-cosmos
title: Netflix Cosmos Platform (Netflix Cosmos)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Netflix Cosmos, Cosmos Platform, Netflix Media Cloud]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [netflix, distributed-systems, media-processing, workflow-engine, microservices]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Java/Kotlin
  framework: Cosmos/SpringBoot/Kafka
---

# Netflix Cosmos Platform (Netflix Cosmos)

## In one line

> **"A media-aware microservice platform: the trinity of workflow + service + resource"**

Netflix designed Cosmos between 2018 and 2020 to replace Reloaded, its platform dedicated to transcoding and encoding. Cosmos standardizes every media operation on a trinity pattern: stateless service + persistent workflow + resource manager. As of 2026, Netflix's entire non-streaming video pipeline runs on Cosmos.

## Core

### Definition

- Platform-as-a-product: application teams deploy their own services on top of Cosmos.
- Trinity = **Optimus** (API/service) + **Plato** (workflow) + **Stratum** (compute pool).
- Event-driven, with a Kafka backbone, built on Java + Spring Boot.

### Trinity components

- **Optimus**: the external-facing API. Stateless. Handles request validation and result aggregation.
- **Plato**: the workflow engine for long-running work. Rule-based. Expresses retries, sagas, and fork-join.
- **Stratum**: an elastic compute pool that runs ffmpeg, encoders, and ML inference on GPU/CPU.

### Applications

1. Encoding pipeline (4K HDR, AV1).
2. Studio post-production (color, VFX).
3. Subtitle and dubbing automation.
4. Trailer generation and content safety scanning.
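Plato's rule-based model can be illustrated with a small in-memory sketch. Rules fire when earlier steps complete, and the same mechanism covers both fan-out and fan-in. Everything here is hypothetical and is not Plato's API: `RuleWorkflowSketch`, `Rule`, `encodeWorkflow`, and the `<step>.completed` fact naming are assumptions made for illustration. The sketch also leaves out persistence, retries, and failure rules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory sketch of a rule-based workflow in the spirit of Plato.
// Not Netflix's API: no persistence, retries, or failure handling.
public class RuleWorkflowSketch {

    // A rule starts its steps once, the first time its condition holds over the recorded facts.
    record Rule(String name, Predicate<Set<String>> condition, List<String> steps) {}

    private final List<Rule> rules = new ArrayList<>();
    private final Set<String> facts = new LinkedHashSet<>();
    private final Set<String> fired = new HashSet<>();
    private final List<String> startedSteps = new ArrayList<>();

    void addRule(Rule rule) {
        rules.add(rule);
    }

    // Record an event such as "probe-source.completed" and fire every rule it newly satisfies.
    void assertFact(String fact) {
        facts.add(fact);
        for (Rule rule : rules) {
            if (!fired.contains(rule.name()) && rule.condition().test(facts)) {
                fired.add(rule.name());
                startedSteps.addAll(rule.steps());
            }
        }
    }

    List<String> startedSteps() {
        return List.copyOf(startedSteps);
    }

    // Encode workflow: probe the source, fan out one encode per profile, fan in to mux + publish.
    static RuleWorkflowSketch encodeWorkflow(List<String> profiles) {
        RuleWorkflowSketch wf = new RuleWorkflowSketch();
        wf.addRule(new Rule("start", f -> f.contains("workflow.started"), List.of("probe-source")));
        wf.addRule(new Rule("fanout", f -> f.contains("probe-source.completed"),
                profiles.stream().map(p -> "encode-segment:" + p).toList()));
        wf.addRule(new Rule("fanin",
                f -> profiles.stream().allMatch(p -> f.contains("encode-segment:" + p + ".completed")),
                List.of("mux-final", "publish-manifest")));
        return wf;
    }

    public static void main(String[] args) {
        RuleWorkflowSketch wf = encodeWorkflow(List.of("1080p", "4k"));
        wf.assertFact("workflow.started");
        wf.assertFact("probe-source.completed");
        wf.assertFact("encode-segment:4k.completed");    // completions may arrive in any order
        wf.assertFact("encode-segment:1080p.completed");
        // prints [probe-source, encode-segment:1080p, encode-segment:4k, mux-final, publish-manifest]
        System.out.println(wf.startedSteps());
    }
}
```

Each rule fires at most once and reads only the set of recorded facts. As a result, the fan-in rule waits until every profile's completion fact has arrived, in whatever order those facts show up.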
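## 💻 Patterns

### Optimus service skeleton (Spring Boot)

```java
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;

// Optimus layer: stateless; validates input and hands long-running work to Plato.
// PlatoClient, EncodeRequest and EncodeResponse are illustrative types.
@RestController
@RequestMapping("/v1/encode")
public class EncodeOptimus {
    private final PlatoClient plato;

    public EncodeOptimus(PlatoClient plato) { this.plato = plato; } // constructor injection

    @PostMapping
    public EncodeResponse submit(@RequestBody EncodeRequest req) {
        validate(req); // reject bad input before any workflow state exists
        var workflowId = plato.start("encode-workflow-v3", Map.of(
            "sourceUri", req.sourceUri(),
            "profiles", req.profiles()
        ));
        return new EncodeResponse(workflowId); // the encode itself continues asynchronously in Plato
    }

    private static void validate(EncodeRequest req) {
        if (req.sourceUri() == null || req.profiles() == null || req.profiles().isEmpty())
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "sourceUri and profiles are required");
    }
}
```

### Plato workflow definition (rule-based DSL)

```yaml
# Illustrative DSL; not Plato's published syntax
workflow:
  id: encode-workflow-v3
  rules:
    - when: workflow.started
      do:
        - probe-source
    - when: probe-source.completed
      do:
        - fanout:
            for: each profile in input.profiles
            run: encode-segment
    - when: all encode-segment.completed
      do:
        - mux-final
        - publish-manifest
    - when: any.failed
      retry:
        max: 3
        backoff: exponential
      onExhausted:
        - notify-oncall
```

### Stratum job submission

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// StratumJob, S3Uri, JobResult and the stratum client are illustrative, not a public API.
StratumJob job = StratumJob.builder()
    .image("netflixoss/ffmpeg-encoder:av1-v12")
    .gpu(1, "A100")   // libaom-av1 itself encodes on CPU; the GPU request only shows the builder shape
    .cpu(8)
    .memory("32Gi")
    .input(new S3Uri("s3://prod-mezz/" + sourceKey))    // mezzanine source
    .output(new S3Uri("s3://prod-encoded/" + outKey))
    .args(List.of("-c:v", "libaom-av1", "-crf", "30"))  // ffmpeg AV1 encode at CRF 30
    .timeout(Duration.ofMinutes(45))
    .build();

CompletableFuture<JobResult> result = stratum.submit(job);
```

### Event-driven step coordination

```java
import java.util.Map;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// WorkflowEvent, EventType, StratumClient and PlatoClient are illustrative types;
// buildJob (not shown) builds a StratumJob like the one above.
@Component
public class EncodeSegmentHandler {
    private final StratumClient stratum;
    private final PlatoClient plato;

    public EncodeSegmentHandler(StratumClient stratum, PlatoClient plato) {
        this.stratum = stratum;
        this.plato = plato;
    }

    @KafkaListener(topics = "cosmos.workflow.events")
    public void onEvent(WorkflowEvent ev) {
        // React only to the start of encode-segment steps; ignore every other event
        if (ev.type() != EventType.STEP_STARTED) return;
        if (!"encode-segment".equals(ev.stepName())) return;

        var profile = (String) ev.payload().get("profile");
        // Submit asynchronously so the consumer thread is not blocked for the whole encode
        stratum.submit(buildJob(profile)).whenComplete((r, err) -> {
            // Report the outcome back to Plato, which decides which rule fires next
            if (err != null) plato.failStep(ev.stepId(), err);
            else plato.completeStep(ev.stepId(), Map.of("output", r.outputUri()));
        });
    }
}
```

### Fanout-fanin (parallel encode)

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// WorkflowContext is an illustrative type; startEncodeSegment is left abstract.
public abstract class FanoutFaninCoordinator {
    public void onProbeComplete(WorkflowContext ctx) {
        List<String> profiles = ctx.input("profiles");
        // Fan out: one asynchronous encode per profile
        List<CompletableFuture<String>> tasks = profiles.stream()
            .map(p -> startEncodeSegment(ctx, p))
            .toList();
        // Fan in: signal only after every segment succeeds; if any fails, allOf completes
        // exceptionally and the signal is skipped, leaving recovery to the workflow's retry rule
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]))
            .thenRun(() -> ctx.signal("all-segments-done"));
    }

    // Returns a future of the segment's output URI
    protected abstract CompletableFuture<String> startEncodeSegment(WorkflowContext ctx, String profile);
}
```

### Idempotency + dedup

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.springframework.stereotype.Service;

@Service
public class IdempotentSubmit {
    // In-memory map for illustration; production needs a durable store shared across instances
    private final ConcurrentMap<String, WorkflowId> idempotencyStore = new ConcurrentHashMap<>();
    private final PlatoClient plato; // illustrative Plato client, injected

    public IdempotentSubmit(PlatoClient plato) { this.plato = plato; }

    public WorkflowId submitOnce(EncodeRequest req) {
        // "|" separator keeps ("ab","c") and ("a","bc") distinct; sha256() is an assumed hex-digest helper
        String key = sha256(req.sourceUri() + "|" + req.profilesHash());
        // The mapping function runs at most once per key, so duplicate requests reuse the first workflow id
        return idempotencyStore.computeIfAbsent(key,
            k -> plato.start("encode-workflow-v3", req.toMap()));
    }
}
```

## Decision criteria

| Situation | Approach |
|---|---|
| Simple stateless API | Optimus only |
| Long-running, multi-step work | Optimus + Plato |
| GPU-heavy work (encoding/ML) | Optimus + Plato + Stratum |
| Synchronous sub-second response | Optimus only (no Plato) |
| Triggered by an external system | Optimus webhook + Plato saga |

**Default**: use the full trinity for multi-step media workflows and Optimus alone for simple CRUD.

## 🔗 Graph

- Variant: [[Temporal]]
- Adjacent: [[Kafka]] · [[Event-Driven Architecture]]

## 🤖 LLM usage

**When**: designing a large-scale media platform, multi-step workflows combined with GPU compute, or an internal PaaS.

**When not**: small teams, simple CRUD apps, or a scale of fewer than about 10 services.

## ❌ Anti-patterns

- **Cosmos for everything**: forcing the trinity onto a small CRUD service is over-engineering.
- **Stateful Optimus**: long-lived state in the API layer belongs in Plato.
- **Scheduling GPUs directly, outside Stratum**: fragments the cluster.
- **Infinite workflow rule loops**: without a max-iteration guard, costs run away.

## 🧪 Verification / duplicates

- Verified (Netflix Tech Blog "Cosmos Trinity" 2020, "Reloaded → Cosmos Migration" 2022, QCon talks 2023-2024).
- Trust level A: official Netflix engineering material.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: trinity architecture + 6 patterns |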