2nd/10_Wiki/Topics/Coding/DevOps_K8s_Operators.md
2026-05-10 22:08:15 +09:00


---
id: devops-k8s-operators
title: Kubernetes Operators — CRD + Controller
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, kubernetes, operator, vibe-coding]
tech_stack:
  language: Go / YAML
  applicable_to: DevOps
applied_in: []
aliases: [K8s operator, CRD, custom resource, controller, kubebuilder, operator-sdk, reconcile]
---

Kubernetes Operators

"Make the application lifecycle Kubernetes-native." An operator is a CRD (Custom Resource Definition) plus a controller that runs a reconcile loop. Postgres, Kafka, and Redis each have their own operators. Common frameworks: kubebuilder and operator-sdk.

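📖 Key concepts

  • CRD: defines a new K8s resource type.
  • Controller: drives the actual state toward the desired state.
  • Reconcile loop: watches for changes and reacts to each one.
  • Level-triggered (not edge-triggered): each pass acts on the current state, not on the specific event that triggered it.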
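The stdlib-only Go sketch below makes "level-triggered" concrete. It is not controller-runtime code. `Cluster` and `reconcile` are hypothetical stand-ins that model only replica counts. The point is that the function reads the current state instead of the event. Coalesced events, missed events, and drift therefore all go through the same code path.

```go
package main

import "fmt"

// Cluster is a toy stand-in for the API server (hypothetical type):
// Desired holds the replica count from each object's spec, Actual what is running.
type Cluster struct {
	Desired map[string]int
	Actual  map[string]int
}

// reconcile is level-triggered: it ignores which event woke it up and only
// compares the current desired and actual state, then closes the gap.
// Calling it again with nothing changed does nothing (idempotent).
func reconcile(c *Cluster, name string) (created, deleted int) {
	want := c.Desired[name] // missing object => want 0 (nothing should run)
	have := c.Actual[name]
	if have < want {
		created = want - have
	} else {
		deleted = have - want
	}
	if want == 0 {
		delete(c.Actual, name)
	} else {
		c.Actual[name] = want
	}
	return created, deleted
}

func main() {
	c := &Cluster{Desired: map[string]int{}, Actual: map[string]int{}}

	// Three spec edits land before the controller runs once; only the latest matters.
	for _, n := range []int{1, 5, 3} {
		c.Desired["my-db"] = n
	}
	created, deleted := reconcile(c, "my-db")
	fmt.Println(created, deleted, c.Actual["my-db"]) // 3 0 3

	// Drift without any spec change (a replica died): the same logic repairs it.
	c.Actual["my-db"] = 2
	created, deleted = reconcile(c, "my-db")
	fmt.Println(created, deleted, c.Actual["my-db"]) // 1 0 3

	// Nothing changed: a second pass is a no-op.
	created, deleted = reconcile(c, "my-db")
	fmt.Println(created, deleted, c.Actual["my-db"]) // 0 0 3
}
```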

💻 Code patterns

Built-in K8s resources

# Deployment, Service, Pod, ConfigMap, ...
apiVersion: apps/v1
kind: Deployment
metadata: { name: app }
spec:
  replicas: 3
  template: ...

CRD (Custom Resource Definition)

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.acid.zalan.do
spec:
  group: acid.zalan.do
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                version: { type: string }
                replicas: { type: integer }
                volumeSize: { type: string }
  scope: Namespaced
  names:
    plural: postgresclusters
    singular: postgrescluster
    kind: PostgresCluster

Custom Resource (CR)

apiVersion: acid.zalan.do/v1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  version: "16"
  replicas: 3
  volumeSize: 50Gi

→ Used like any built-in K8s resource (kubectl apply / get / delete); the operator reconciles it.

Operator (Go)

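// Scaffolded by kubebuilder (struct + Reconcile signature); the body is hand-written.
// acidv1 and buildStatefulSet are placeholders for this operator's own API package and helper.
import (
    "context"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/api/equality"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    acidv1 "example.com/postgres-operator/api/v1" // hypothetical module path
)

type PostgresClusterReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var cluster acidv1.PostgresCluster
    if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
        // CR already deleted: nothing to do (its children are garbage-collected)
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Desired state, derived from the CR spec
    desired := buildStatefulSet(&cluster)
    // Make the CR the controller owner so the StatefulSet is deleted with it
    if err := ctrl.SetControllerReference(&cluster, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // Actual state
    var actual appsv1.StatefulSet
    err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, &actual)
    if apierrors.IsNotFound(err) {
        // Missing: create it
        return ctrl.Result{}, r.Create(ctx, desired)
    }
    if err != nil {
        // Any other error: return it so the request is retried with backoff
        return ctrl.Result{}, err
    }

    // Drift: update only if a field we set differs (DeepDerivative ignores fields
    // left empty in desired, e.g. ones the API server fills with defaults)
    if !equality.Semantic.DeepDerivative(desired.Spec, actual.Spec) {
        actual.Spec = desired.Spec
        return ctrl.Result{}, r.Update(ctx, &actual)
    }

    // In sync: re-check periodically as a safety net
    return ctrl.Result{RequeueAfter: time.Minute}, nil
}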

→ On every change, the controller brings the actual state back in line with the desired state.

kubebuilder

kubebuilder init --domain example.com
kubebuilder create api --group apps --version v1alpha1 --kind App
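# → scaffolds the CRD Go types + controller

# After editing the types / controller:
make manifests   # generate CRD / RBAC YAML from the Go markers
make install     # install the CRDs into the current cluster
make run         # run the controller locally against the current kubeconfig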

operator-sdk

operator-sdk init --domain example.com
operator-sdk create api --group apps --version v1alpha1 --kind App

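→ Similar to kubebuilder: operator-sdk's Go projects are built on kubebuilder's scaffolding. It additionally supports Helm- and Ansible-based operators and OLM integration.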

Reconcile pattern

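// Sketch: AppReconciler, appsv1alpha1.App, handleDeletion and the reconcile* helpers are hypothetical;
// controllerutil is sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.
const appFinalizer = "apps.example.com/finalizer"

func (r *AppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var app appsv1alpha1.App
    if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Being deleted: clean up external resources, then remove the finalizer
    if !app.DeletionTimestamp.IsZero() {
        return r.handleDeletion(ctx, &app)
    }

    // Add the finalizer first so a later delete is always intercepted
    if !controllerutil.ContainsFinalizer(&app, appFinalizer) {
        controllerutil.AddFinalizer(&app, appFinalizer)
        return ctrl.Result{}, r.Update(ctx, &app)
    }

    // Reconcile children; any error is returned so the request is retried
    if err := r.reconcileService(ctx, &app); err != nil {
        return ctrl.Result{}, err
    }
    if err := r.reconcileDeployment(ctx, &app); err != nil {
        return ctrl.Result{}, err
    }
    if err := r.reconcileConfigMap(ctx, &app); err != nil {
        return ctrl.Result{}, err
    }

    // Status goes through the status subresource, not a regular Update
    app.Status.Phase = "Running"
    return ctrl.Result{}, r.Status().Update(ctx, &app)
}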
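The finalizer half of this pattern relies on an API-server rule. Deleting an object that still has finalizers only sets its `deletionTimestamp`, and the object disappears once its finalizer list is empty. The stdlib-only Go sketch below models that rule with hypothetical `Object` / `Store` types (not the real client), so the two-phase delete can be run end to end.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

const finalizer = "apps.example.com/cleanup" // hypothetical finalizer name

// Object keeps only the metadata fields the finalizer protocol uses.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store models the API server's rule (hypothetical type): deleting an object
// that still has finalizers only marks it; it disappears once the list is empty.
type Store map[string]*Object

func (s Store) Delete(name string) {
	o, ok := s[name]
	if !ok {
		return
	}
	if len(o.Finalizers) == 0 {
		delete(s, name)
		return
	}
	now := time.Now()
	o.DeletionTimestamp = &now
}

// Update completes a pending deletion once the last finalizer is removed.
func (s Store) Update(o *Object) {
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s, o.Name)
	}
}

// reconcile follows the pattern above. cleanup stands in for releasing
// resources outside the cluster, e.g. cloud disks or DNS records.
func reconcile(s Store, name string, cleanup func(string)) {
	o, ok := s[name]
	if !ok {
		return // already gone
	}
	if o.DeletionTimestamp != nil { // being deleted
		if slices.Contains(o.Finalizers, finalizer) {
			cleanup(o.Name)
			o.Finalizers = slices.DeleteFunc(o.Finalizers, func(f string) bool { return f == finalizer })
			s.Update(o)
		}
		return
	}
	if !slices.Contains(o.Finalizers, finalizer) { // add the finalizer first
		o.Finalizers = append(o.Finalizers, finalizer)
		s.Update(o)
	}
}

func main() {
	s := Store{"my-app": {Name: "my-app"}}
	var cleaned []string
	cleanup := func(name string) { cleaned = append(cleaned, name) }

	reconcile(s, "my-app", cleanup) // adds the finalizer
	s.Delete("my-app")              // only marks it: the finalizer is still there
	fmt.Println(len(s), cleaned)    // 1 []
	reconcile(s, "my-app", cleanup) // cleanup runs, finalizer removed, object gone
	fmt.Println(len(s), cleaned)    // 0 [my-app]
}
```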

Status subresource

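type AppStatus struct {
    Phase      string             `json:"phase,omitempty"`
    Replicas   int32              `json:"replicas,omitempty"`
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type App struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   AppSpec   `json:"spec,omitempty"`
    Status AppStatus `json:"status,omitempty"`
}
// Spec = desired state (written by the user); Status = observed state (written only by the controller, via /status).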
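`Conditions` are most useful when a condition changes `LastTransitionTime` only on a real status flip. The stdlib-only Go sketch below shows that upsert rule. Its `Condition` struct is a hypothetical mirror of the fields of `metav1.Condition`. As far as we know, `meta.SetStatusCondition` in k8s.io/apimachinery applies the same rule, so a real operator would call that instead.

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the fields of metav1.Condition in plain Go (stdlib only).
type Condition struct {
	Type, Status, Reason, Message string
	ObservedGeneration            int64
	LastTransitionTime            time.Time
}

// epoch is a fixed start time so the printed output is deterministic.
var epoch = time.Date(2026, 5, 9, 0, 0, 0, 0, time.UTC)

// setCondition upserts a condition by Type. LastTransitionTime moves only when
// Status flips, so reconciling the same result repeatedly does not churn status.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Reason, conds[i].Message = c.Reason, c.Message
		conds[i].ObservedGeneration = c.ObservedGeneration
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Provisioning", ObservedGeneration: 1}, epoch)
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Provisioning", Message: "2/3 replicas ready", ObservedGeneration: 1}, epoch.Add(time.Minute))
	fmt.Println(conds[0].LastTransitionTime.Equal(epoch)) // true: status did not flip

	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "AllReplicasReady", ObservedGeneration: 1}, epoch.Add(2*time.Minute))
	fmt.Println(len(conds), conds[0].Status, conds[0].LastTransitionTime.Sub(epoch)) // 1 True 2m0s
}
```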

Owner reference (cleanup)

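// ptr is k8s.io/utils/ptr; ctrl.SetControllerReference(app, deployment, r.Scheme) sets the same fields from the scheme.
deployment.OwnerReferences = []metav1.OwnerReference{
    {APIVersion: "apps.example.com/v1alpha1", Kind: "App", Name: app.Name, UID: app.UID,
        Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true)},
}

→ Deleting the App deletes the Deployment automatically (cascading deletion by the garbage collector).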
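The cascade is performed by the Kubernetes garbage collector, not by the operator. The stdlib-only Go sketch below is a rough model of the rule it applies. An object whose owners are all gone is deleted, and the pass repeats until nothing changes. `Obj` and `collectGarbage` are hypothetical. Foreground vs. background propagation and `blockOwnerDeletion` are not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// Obj is a toy object (hypothetical type): a name plus the UIDs listed in
// its ownerReferences.
type Obj struct {
	Name   string
	Owners []string
}

// collectGarbage deletes every object whose owners are all gone, and repeats
// until nothing changes so deletion cascades down chains such as
// App -> Deployment -> ReplicaSet. It returns the deleted names, sorted.
func collectGarbage(objs map[string]Obj) []string {
	var deleted []string
	for changed := true; changed; {
		changed = false
		for uid, o := range objs {
			if len(o.Owners) == 0 {
				continue // no owners: never garbage-collected
			}
			ownerAlive := false
			for _, owner := range o.Owners {
				if _, ok := objs[owner]; ok {
					ownerAlive = true
					break
				}
			}
			if !ownerAlive {
				delete(objs, uid)
				deleted = append(deleted, o.Name)
				changed = true
			}
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	objs := map[string]Obj{ // keyed by UID
		"uid-1": {Name: "App/my-app"},
		"uid-2": {Name: "Deployment/my-app", Owners: []string{"uid-1"}},
		"uid-3": {Name: "ReplicaSet/my-app-abc", Owners: []string{"uid-2"}},
		"uid-4": {Name: "Service/my-app", Owners: []string{"uid-1"}},
		"uid-5": {Name: "ConfigMap/shared"},
	}
	delete(objs, "uid-1")             // the user deletes the App
	fmt.Println(collectGarbage(objs)) // [Deployment/my-app ReplicaSet/my-app-abc Service/my-app]
	fmt.Println(len(objs))            // 1
}
```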

Webhook (validation)

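// controller-runtime CustomValidator (ValidateUpdate / ValidateDelete omitted for brevity)
type AppCustomValidator struct{}
func (v *AppCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    app, ok := obj.(*App)
    if !ok {
        return nil, fmt.Errorf("expected an App, got %T", obj)
    }
    if app.Spec.Replicas < 1 {
        return nil, fmt.Errorf("replicas must be >= 1")
    }
    return nil, nil
}

→ The API server calls this on create (kubectl create / apply); a returned error rejects the request.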

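Mutating webhook (defaulting)

// controller-runtime CustomDefaulter; AppCustomDefaulter is a hypothetical struct{}
func (d *AppCustomDefaulter) Default(ctx context.Context, obj runtime.Object) error {
    if app, ok := obj.(*App); ok && app.Spec.Image == "" {
        app.Spec.Image = "default:latest"
    }
    return nil
}

→ Fills in default values automatically, before validation runs.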
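In the API server, mutating admission runs before validating admission, so validation always sees the defaulted object. The stdlib-only Go sketch below models just that ordering. It uses a hypothetical `AppSpec` and plain functions in place of the webhook types. It also uses `errors.Join` to report every validation problem instead of stopping at the first.

```go
package main

import (
	"errors"
	"fmt"
)

// AppSpec is a hypothetical spec with the two fields the webhooks above touch.
type AppSpec struct {
	Image    string
	Replicas int32
}

// defaultSpec plays the mutating (defaulting) webhook.
func defaultSpec(s *AppSpec) {
	if s.Image == "" {
		s.Image = "default:latest"
	}
}

// validateSpec plays the validating webhook and reports every problem at once.
func validateSpec(s AppSpec) error {
	var errs []error
	if s.Replicas < 1 {
		errs = append(errs, fmt.Errorf("spec.replicas: must be >= 1, got %d", s.Replicas))
	}
	if s.Image == "" {
		errs = append(errs, errors.New("spec.image: required"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// admit applies the phases in the API server's order: mutate, then validate.
func admit(s AppSpec) (AppSpec, error) {
	defaultSpec(&s)
	if err := validateSpec(s); err != nil {
		return AppSpec{}, err
	}
	return s, nil
}

func main() {
	s, err := admit(AppSpec{Replicas: 2})
	fmt.Println(s.Image, err) // default:latest <nil>

	_, err = admit(AppSpec{Replicas: 0})
	fmt.Println(err) // spec.replicas: must be >= 1, got 0
}
```

Because defaulting runs first, an empty image never reaches the validator here. Validation alone would have rejected it.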

Watch

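func (r *AppReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&appsv1alpha1.App{}).   // the primary resource
        Owns(&appsv1.Deployment{}). // children whose controller owner is an App
        Owns(&corev1.Service{}).
        Complete(r)
}

→ Reconciles when the App changes, or when a Deployment / Service it owns changes (the child event is mapped to its owning App).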
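`Owns()` turns an event on a child into a request for its owner by reading the child's controller owner reference. Requests then pass through a work queue that drops a key already waiting. The stdlib-only Go sketch below models both steps. `Event`, `OwnerRef`, `Queue` and `enqueue` are hypothetical simplifications, not controller-runtime types. The real work queue also rate-limits and tracks in-flight keys.

```go
package main

import "fmt"

// OwnerRef and Event keep only the metadata the mapping needs (hypothetical types).
type OwnerRef struct {
	Kind, Name string
	Controller bool
}

type Event struct {
	Kind, Namespace, Name string
	Owners                []OwnerRef
}

// Queue drops a key that is already waiting, so a burst of events for one
// App becomes a single reconcile request.
type Queue struct {
	items []string
	set   map[string]bool
}

func (q *Queue) Add(key string) {
	if q.set == nil {
		q.set = map[string]bool{}
	}
	if q.set[key] {
		return
	}
	q.set[key] = true
	q.items = append(q.items, key)
}

// ownedKinds mirrors Owns(&appsv1.Deployment{}) and Owns(&corev1.Service{}).
var ownedKinds = map[string]bool{"Deployment": true, "Service": true}

// enqueue maps an event to the namespace/name of the App to reconcile.
func enqueue(q *Queue, e Event) {
	if e.Kind == "App" { // For(&appsv1alpha1.App{}): the object itself
		q.Add(e.Namespace + "/" + e.Name)
		return
	}
	if !ownedKinds[e.Kind] {
		return // kind not watched
	}
	for _, o := range e.Owners {
		if o.Controller && o.Kind == "App" { // only the controlling owner counts
			q.Add(e.Namespace + "/" + o.Name)
			return
		}
	}
	// A Deployment / Service without a controlling App is ignored.
}

func main() {
	q := &Queue{}
	owner := []OwnerRef{{Kind: "App", Name: "my-app", Controller: true}}
	enqueue(q, Event{Kind: "App", Namespace: "default", Name: "my-app"})
	enqueue(q, Event{Kind: "Deployment", Namespace: "default", Name: "my-app", Owners: owner})
	enqueue(q, Event{Kind: "Service", Namespace: "default", Name: "my-app-svc", Owners: owner})
	enqueue(q, Event{Kind: "Deployment", Namespace: "default", Name: "unrelated"})
	fmt.Println(q.items) // [default/my-app]
}
```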

Real-world operators

- prometheus-operator: deploys and configures Prometheus automatically
- cert-manager: issues and renews TLS certificates automatically (e.g. Let's Encrypt)
- postgres-operator (Zalando, Crunchy)
- strimzi-kafka-operator
- istio-operator
- argocd-operator
- velero (backup)
- external-secrets-operator

Operator vs Helm

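Helm:
- Templating: manifests are rendered at install / upgrade time, then stay static
- Complex operational changes (failover, backup, version upgrades) are manual

Operator:
- Continuous reconcile
- Self-healing
- Domain-specific logic

→ Stateful workloads (DBs, message queues): operator.
Stateless apps: Helm.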

Using Helm and an operator together

The operator itself is often installed from a Helm chart.
- helm install postgres-operator ...
- After that, users only write PostgresCluster CRs.

Capability levels (Operator Framework capability model)

Level 1: Basic install
Level 2: Seamless upgrade
Level 3: Full lifecycle (backup, restore)
Level 4: Deep insights (metric, alert)
Level 5: Auto pilot (auto-heal, auto-scale, auto-tune)

→ Mature operators reach Level 4-5.

OperatorHub

operatorhub.io
- Catalog of operators
- 1-click install
- Installs and upgrades managed by OLM (Operator Lifecycle Manager)

Crossplane (operator-style IaC)

apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: my-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.micro
    engine: postgres
    masterUsername: admin

→ AWS resources become K8s CRs that Crossplane keeps reconciled; an alternative to Terraform.

KEDA (event-driven autoscaling)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: consumer-deployment
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
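    - type: kafka
      metadata:
        bootstrapServers: kafka.example.svc:9092  # placeholder broker address (required by the kafka scaler)
        topic: my-topic
        consumerGroup: my-group
        lagThreshold: "10"

→ Scales the consumer Deployment on Kafka consumer-group lag, down to zero replicas when there is no lag (minReplicaCount: 0).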
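The step from lag to replicas can be estimated with simple arithmetic. The stdlib-only Go sketch below is an approximation under stated assumptions, not KEDA's implementation:

- `lagThreshold` is treated as the per-replica target that the generated HPA aims for.
- Replicas are capped at an assumed partition count of 12.
- HPA tolerance and stabilization windows are ignored.

```go
package main

import "fmt"

// desiredReplicas roughly estimates the consumer count for the ScaledObject
// above. Modeling assumptions (not KEDA source code):
//   - no lag => the trigger is inactive and KEDA scales to minReplicaCount;
//   - otherwise the target is lagThreshold per replica: ceil(totalLag / lagThreshold);
//   - replicas are capped at the partition count (extra consumers in one
//     group would have no partition to read);
//   - the result is clamped to [max(minR, 1), maxR].
func desiredReplicas(totalLag, lagThreshold, partitions, minR, maxR int64) int64 {
	if totalLag == 0 {
		return minR
	}
	n := (totalLag + lagThreshold - 1) / lagThreshold // integer ceiling
	if n > partitions {
		n = partitions
	}
	if n < minR {
		n = minR
	}
	if n < 1 {
		n = 1 // an active workload keeps at least one replica
	}
	if n > maxR {
		n = maxR
	}
	return n
}

func main() {
	// lagThreshold "10", min 0, max 100 as in the YAML; 12 partitions is an assumption.
	for _, lag := range []int64{0, 35, 5000} {
		fmt.Println(lag, desiredReplicas(lag, 10, 12, 0, 100))
	}
	// 0 0
	// 35 4
	// 5000 12
}
```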

When to write your own operator?

✓ Domain-specific: your own product should be K8s-native.
✓ Complex lifecycle (DBs, message queues).
✓ Auto-heal / auto-scale.

✗ Simple deployments (Helm is enough).
✗ An existing operator already covers it (cert-manager, etc.).

Test

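// envtest (controller-runtime, used by kubebuilder projects): a real kube-apiserver + etcd,
// no kube-controller-manager or kubelet. Needs the binaries from setup-envtest (KUBEBUILDER_ASSETS).
func TestReconcile(t *testing.T) {
    env := &envtest.Environment{
        // Where `make manifests` writes CRDs; the relative path depends on the test's package.
        CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
    }
    cfg, err := env.Start()
    if err != nil {
        t.Fatalf("starting envtest: %v", err)
    }
    defer func() { _ = env.Stop() }()

    // Build a client from cfg, start the reconciler, create a CR, check its children
    _ = cfg
}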

Production tips

- Idempotent reconcile (safe to retry).
- User-friendly Status.Conditions.
- Finalizers for cleanup.
- Owner references for cascading deletion.
- Resource requests / limits on the operator's own Pod.
- Leader election when running more than one replica (HA).
- Metrics / logs.
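Leader election stops several operator replicas from reconciling the same objects at once: only the replica holding a Lease runs the controllers. In controller-runtime it is enabled through the manager options (`LeaderElection`, `LeaderElectionID`). The stdlib-only Go sketch below models just the lease rule. The `Lease` type is hypothetical and the 15s duration is an assumed value.

```go
package main

import (
	"fmt"
	"time"
)

// Lease is a toy version of the coordination.k8s.io Lease used for leader
// election (hypothetical type): one holder plus the last renew time.
type Lease struct {
	Holder    string
	RenewTime time.Time
	Duration  time.Duration
}

// epoch is a fixed start time so the printed output is deterministic.
var epoch = time.Date(2026, 5, 9, 0, 0, 0, 0, time.UTC)

// tryAcquireOrRenew: a replica leads if the lease is free, already its own,
// or the holder has not renewed within Duration. The real client does this as
// a conditional update on the Lease object so two replicas cannot both win;
// that concurrency control is not modeled here.
func tryAcquireOrRenew(l *Lease, id string, now time.Time) bool {
	if l.Holder == "" || l.Holder == id || now.Sub(l.RenewTime) > l.Duration {
		l.Holder, l.RenewTime = id, now
		return true
	}
	return false
}

func main() {
	l := &Lease{Duration: 15 * time.Second}
	fmt.Println(tryAcquireOrRenew(l, "pod-a", epoch))                    // true: lease was free
	fmt.Println(tryAcquireOrRenew(l, "pod-b", epoch.Add(5*time.Second))) // false: pod-a holds it
	// pod-a crashes and stops renewing.
	fmt.Println(tryAcquireOrRenew(l, "pod-b", epoch.Add(21*time.Second))) // true: lease expired
	fmt.Println(l.Holder)                                                 // pod-b
}
```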

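🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Complex stateful workload (DB) | Operator |
| Simple deployment | Helm |
| Cloud resources | Crossplane |
| Event-driven scaling | KEDA |
| Off-the-shelf operator exists | OperatorHub |
| Custom domain | Build your own (kubebuilder) |
| Backup / restore | Velero (operator) |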

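Anti-patterns

  • Non-idempotent reconcile: retries corrupt state.
  • No finalizer: external resources never get cleaned up.
  • No owner reference: orphaned resources.
  • Never updating status: users can't tell what is happening.
  • Webhook failure blocks creates: with failurePolicy: Fail, an unavailable webhook rejects requests, so run it HA.
  • No leader election: replicas race on the same objects.
  • Making everything an operator: simple apps belong in Helm.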

🤖 Hints for LLM use

  • Operator = CRD + controller (reconcile loop).
  • kubebuilder / operator-sdk generate the boilerplate.
  • Stateful workloads (DBs) are the sweet spot.
  • Crossplane / KEDA are the modern choices for cloud resources and event-driven scaling.

🔗 Related documents