---
id: devops-k8s-operators
title: Kubernetes Operators - CRD + Controller
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, kubernetes, operator, vibe-coding]
tech_stack: { language: "Go / YAML", applicable_to: ["DevOps"] }
applied_in: []
aliases: [K8s operator, CRD, custom resource, controller, kubebuilder, operator-sdk, reconcile]
---

# Kubernetes Operators

> "The application's lifecycle becomes K8s-native." **CRD (Custom Resource) + Controller (reconcile loop)**. Postgres, Kafka, and Redis each have their own operators. Tooling: kubebuilder / operator-sdk.

## 📖 Core Concepts

- CRD: defines a new K8s resource type.
- Controller: drives the actual state toward the desired state.
- Reconcile loop: watches for changes and reacts to each one.
- Level-triggered (not edge-triggered): each pass compares the full current state with the desired state, so a missed event is corrected on the next pass.

## 💻 Code Patterns

### Built-in K8s resources

```yaml
# Deployment, Service, Pod, ConfigMap, ...
apiVersion: apps/v1
kind: Deployment
metadata: { name: app }
spec:
  replicas: 3
  template: ...
```

### CRD (Custom Resource Definition)

```yaml
# Illustrative schema. The real Zalando operator registers kind `postgresql`, not `PostgresCluster`.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.acid.zalan.do   # must be <plural>.<group>
spec:
  group: acid.zalan.do
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                version: { type: string }
                replicas: { type: integer }
                volumeSize: { type: string }
  scope: Namespaced
  names:
    plural: postgresclusters
    singular: postgrescluster
    kind: PostgresCluster
```

### Custom Resource (CR)

```yaml
apiVersion: acid.zalan.do/v1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  version: "16"
  replicas: 3
  volumeSize: 50Gi
```

→ Used like any other K8s resource. The operator reconciles it.
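
To make the CRD-to-CR relationship concrete, here is a minimal sketch of Go types that mirror the schema above and decode the `my-db` resource. It is an illustration under stated assumptions, not what kubebuilder generates: real API types embed `metav1.TypeMeta` and `metav1.ObjectMeta`, the `decodeCluster` helper and `myDB` constant are hypothetical names, and the resource is written as JSON only because the Go standard library has no YAML parser.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PostgresClusterSpec mirrors the spec properties declared in the CRD schema.
type PostgresClusterSpec struct {
	Version    string `json:"version"`
	Replicas   int    `json:"replicas"`
	VolumeSize string `json:"volumeSize"`
}

// PostgresCluster is a simplified stand-in for a generated API type.
type PostgresCluster struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec PostgresClusterSpec `json:"spec"`
}

// myDB is the JSON form of the my-db custom resource (hypothetical constant).
const myDB = `{
  "apiVersion": "acid.zalan.do/v1",
  "kind": "PostgresCluster",
  "metadata": {"name": "my-db"},
  "spec": {"version": "16", "replicas": 3, "volumeSize": "50Gi"}
}`

// decodeCluster parses a custom resource and rejects any other kind.
func decodeCluster(raw []byte) (*PostgresCluster, error) {
	var pc PostgresCluster
	if err := json.Unmarshal(raw, &pc); err != nil {
		return nil, fmt.Errorf("decode: %w", err)
	}
	if pc.Kind != "PostgresCluster" {
		return nil, fmt.Errorf("unexpected kind %q", pc.Kind)
	}
	return &pc, nil
}

func main() {
	pc, err := decodeCluster([]byte(myDB))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: version=%s replicas=%d volume=%s\n",
		pc.Metadata.Name, pc.Spec.Version, pc.Spec.Replicas, pc.Spec.VolumeSize)
}
```

The controller works against typed objects like this one, while the CRD schema lets the API server reject malformed resources before the controller ever sees them.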
### Operator (Go)

```go
// Scaffolded by kubebuilder. acidv1 (this project's API package) and
// buildStatefulSet are placeholders you provide.
import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type PostgresClusterReconciler struct {
	client.Client
}

func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cluster acidv1.PostgresCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		// The CR is gone: nothing left to reconcile.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Desired state
	desired := buildStatefulSet(&cluster)

	// Actual state
	var actual appsv1.StatefulSet
	err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, &actual)
	if apierrors.IsNotFound(err) {
		// Create
		return ctrl.Result{}, r.Create(ctx, desired)
	}
	if err != nil {
		// Any other error: return it so the request is retried.
		return ctrl.Result{}, err
	}

	// Update on drift. In practice compare only the fields you set:
	// server-side defaults make a full Spec comparison report drift every time.
	if !equality.Semantic.DeepEqual(actual.Spec, desired.Spec) {
		actual.Spec = desired.Spec
		return ctrl.Result{}, r.Update(ctx, &actual)
	}
	return ctrl.Result{RequeueAfter: time.Minute}, nil
}
```

→ On every change, the actual state is brought back in line with the desired state.

### kubebuilder

```bash
kubebuilder init --domain example.com
kubebuilder create api --group apps --version v1alpha1 --kind App
# → generates the CRD + controller

# Edit, build, deploy
make manifests
make install
make run
```

### operator-sdk

```bash
operator-sdk init --domain example.com
operator-sdk create api --group apps --version v1alpha1 --kind App
```

→ Similar to kubebuilder.

### Reconcile pattern

```go
// Sketch: AppReconciler, finalizerName, handleDeletion and the
// reconcile* helpers are placeholders you implement.
func (r *AppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var obj appsv1alpha1.App
	if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Finalizer (deletion handling)
	if !obj.DeletionTimestamp.IsZero() {
		return r.handleDeletion(ctx, &obj)
	}

	// Add finalizer
	if !controllerutil.ContainsFinalizer(&obj, finalizerName) {
		controllerutil.AddFinalizer(&obj, finalizerName)
		return ctrl.Result{}, r.Update(ctx, &obj)
	}

	// Reconcile children
	if err := r.reconcileService(ctx, &obj); err != nil {
		return ctrl.Result{}, err
	}
	if err := r.reconcileDeployment(ctx, &obj); err != nil {
		return ctrl.Result{}, err
	}
	if err := r.reconcileConfigMap(ctx, &obj); err != nil {
		return ctrl.Result{}, err
	}

	// Update status
	obj.Status.Phase = "Running"
	return ctrl.Result{}, r.Status().Update(ctx, &obj)
}
```

### Status subresource

```go
type AppStatus struct {
	Phase      string             `json:"phase"`
	Replicas   int32              `json:"replicas"`
	Conditions []metav1.Condition `json:"conditions"` // k8s.io/apimachinery/pkg/apis/meta/v1
}

// Spec changes   = the user's desired state.
// Status changes = written by the controller.
```

### Owner reference (cleanup)

```go
// controllerutil.SetControllerReference(app, deployment, scheme) builds the same reference for you.
deployment.OwnerReferences = []metav1.OwnerReference{
	{
		APIVersion: "apps.example.com/v1alpha1", // matches the group/version scaffolded above
		Kind:       "App",
		Name:       app.Name,
		UID:        app.UID,
		Controller: ptr.To(true), // k8s.io/utils/ptr
	},
}
```

→ Deleting the App automatically deletes the Deployment (cascading garbage collection).

### Webhook (validation)

```go
// Signature from older controller-runtime releases; newer releases
// also return admission.Warnings, so check the version you build against.
func (r *App) ValidateCreate() error {
	if r.Spec.Replicas < 1 {
		return fmt.Errorf("replicas must be >= 1")
	}
	return nil
}
```

→ Validated by the API server when the object is created (for example via `kubectl create`).

### Mutating webhook

```go
func (r *App) Default() {
	if r.Spec.Image == "" {
		r.Spec.Image = "default:latest"
	}
}
```

→ Fills in default values automatically.

### Watch

```go
return ctrl.NewControllerManagedBy(mgr).
	For(&appsv1alpha1.App{}).
	Owns(&appsv1.Deployment{}).
	Owns(&corev1.Service{}).
	Complete(r)
```

→ Reconcile runs whenever the App or one of its child Deployments / Services changes.

### Real-world operators

```
- prometheus-operator: automated Prometheus deployment + config
- cert-manager: automated TLS certificates (Let's Encrypt)
- postgres-operator (Zalando, Crunchy)
- strimzi-kafka-operator
- istio-operator
- argocd-operator
- velero (backup)
- external-secrets-operator
```

### Operator vs Helm

```
Helm:
- Templating (deploy once, then static)
- Complex changes = manual

Operator:
- Continuous reconcile
- Self-healing
- Domain-specific logic

→ Stateful (DB, message queue) = operator. Stateless app = Helm.
```

### Helm + Operator together

```
The operator itself is installed as a Helm chart.
- helm install postgres-operator ...
- After that, users only write PostgresCluster CRs.
```

### Levels (capability)

```
Level 1: Basic install
Level 2: Seamless upgrade
Level 3: Full lifecycle (backup, restore)
Level 4: Deep insights (metric, alert)
Level 5: Auto pilot (auto-heal, auto-scale, auto-tune)
```

→ Mature operators reach Level 4-5.
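
The difference between Helm's one-shot templating and an operator's continuous, self-healing reconcile can be shown without any Kubernetes dependency. The sketch below is a toy model, not controller-runtime code: the `cluster` map is a hypothetical in-memory stand-in for the API server, and `reconcile` is an illustrative function that converges it toward a desired list of child objects.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster is a hypothetical stand-in for the API server:
// it records which child objects currently exist.
type cluster map[string]bool

// reconcile converges actual toward desired and returns the actions it took.
// It looks at the full state on every call (level-triggered), so running it
// again without any drift produces no actions (idempotent).
func reconcile(desired []string, actual cluster) []string {
	var actions []string
	want := make(map[string]bool, len(desired))

	// Create whatever is desired but missing.
	for _, name := range desired {
		want[name] = true
		if !actual[name] {
			actual[name] = true
			actions = append(actions, "create "+name)
		}
	}

	// Delete whatever exists but is no longer desired.
	for name := range actual {
		if !want[name] {
			delete(actual, name)
			actions = append(actions, "delete "+name)
		}
	}

	sort.Strings(actions) // stable order for printing and comparison
	return actions
}

func main() {
	desired := []string{"configmap/app", "deployment/app", "service/app"}
	actual := cluster{}

	fmt.Println(reconcile(desired, actual)) // first pass creates the children
	fmt.Println(reconcile(desired, actual)) // second pass finds nothing to do

	delete(actual, "service/app")           // simulate someone deleting a child
	fmt.Println(reconcile(desired, actual)) // the next pass repairs the drift
}
```

A Helm release corresponds to running the first pass only; an operator keeps calling `reconcile`, which is why manual deletions or drift get repaired without user action.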
### OperatorHub

```
operatorhub.io
- Catalog of operators
- 1-click install
- Managed by OLM (Operator Lifecycle Manager)
```

### Crossplane (operator-style IaC)

```yaml
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: my-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.micro
    engine: postgres
    masterUsername: admin
```

→ AWS resources become K8s CRs. An alternative to Terraform.

### KEDA (event-driven autoscaling)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: consumer-deployment
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092   # hypothetical address; the kafka scaler requires this field
        topic: my-topic
        consumerGroup: my-group
        lagThreshold: "10"
```

→ Kafka consumer lag drives the consumer replica count.

### When to write your own operator?

```
✓ Domain-specific needs (your own product should be K8s-native).
✓ Complex lifecycle (DB, message queue).
✓ Auto-heal / auto-scale.
✗ Simple deploys (Helm is enough).
✗ A pre-built operator already covers it (cert-manager, etc.).
```

### Test

```go
// envtest (from controller-runtime, used by kubebuilder) starts a local API server + etcd
func TestReconcile(t *testing.T) {
	env := &envtest.Environment{ /* ... */ }
	cfg, err := env.Start()
	if err != nil {
		t.Fatal(err)
	}
	defer env.Stop()

	_ = cfg // build a client from cfg, create a CR, then check the children
}
```

### Production tips

```
- Idempotent reconcile (safe to retry).
- Keep Status.Conditions user-friendly.
- Use finalizers for cleanup.
- Use owner references for cascading deletion.
- Set resource limits on the operator itself.
- Enable leader election (HA).
- Expose metrics and logs.
```

## 🤔 Decision Criteria

| Situation | Recommendation |
|---|---|
| Stateful, complex (DB) | Operator |
| Simple deploy | Helm |
| Cloud resource | Crossplane |
| Event-driven scaling | KEDA |
| Off-the-shelf | OperatorHub |
| Custom domain | Build your own (kubebuilder) |
| Backup / restore | Velero (operator) |

## ❌ Anti-patterns

- **Non-idempotent reconcile**: retries corrupt state.
- **No finalizer**: external cleanup never runs.
- **No owner reference**: orphaned resources.
- **Status never updated**: users cannot tell what is happening.
- **Webhook failure blocks creates**: an unavailable webhook with `failurePolicy: Fail` rejects every request, so run it highly available.
- **No leader election**: multiple replicas race each other.
- **Everything as an operator**: simple workloads belong in Helm.

## 🤖 LLM Usage Hints

- Operator = CRD + controller (reconcile loop).
- kubebuilder / operator-sdk generate the boilerplate.
- Stateful workloads (DB) are the sweet spot.
- Crossplane / KEDA are the modern operator-based options.

## 🔗 Related Documents

- [[DevOps_Kubernetes_Basics]]
- [[DevOps_Helm_Deep]]
- [[DevOps_ArgoCD_GitOps]]