id: mlops-model-registry
title: MLOps - Model registry / MLflow / W&B / artifact
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: mlops, ml, vibe-coding
tech_stack: language: Python; applicable_to: AI, Backend
applied_in:
aliases: MLOps, MLflow, W&B, Weights and Biases, model registry, model versioning, artifact

MLOps Model Registry

ML models need versioning and deployment too. Tools: MLflow / W&B / DVC / Vertex AI. Lifecycle: train → register → stage → deploy → monitor.

📖 Core concepts

  • Model = code + data + hyperparameters + weights.
  • Registry: tracks model versions.
  • Stage: dev / staging / prod.
  • Lineage: which dataset a model was trained on.
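
The four concepts above can be sketched as a toy in-memory registry. This is an illustration only, not MLflow's or W&B's API: `ModelVersion`, `ToyRegistry`, the field names, and the extra `archived` stage are all hypothetical.

```python
from dataclasses import dataclass, field

STAGES = ("dev", "staging", "prod", "archived")  # "archived" is an added assumption


@dataclass
class ModelVersion:
    name: str
    version: int
    weights_uri: str      # where the trained weights are stored
    dataset_uri: str      # lineage: dataset this version was trained on
    params: dict          # hyperparameters used for training
    stage: str = "dev"


@dataclass
class ToyRegistry:
    versions: dict = field(default_factory=dict)  # name -> list of ModelVersion

    def register(self, name, weights_uri, dataset_uri, params):
        """Append a new version; version numbers start at 1."""
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(name, len(history) + 1, weights_uri, dataset_uri, dict(params))
        history.append(mv)
        return mv

    def promote(self, name, version, stage):
        """Move one version to `stage`, archiving whichever version held it."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        for mv in self.versions[name]:
            if mv.stage == stage and stage in ("staging", "prod"):
                mv.stage = "archived"
        target = self.versions[name][version - 1]
        target.stage = stage
        return target

    def get(self, name, stage):
        """Return the version currently in `stage`, or None."""
        for mv in self.versions.get(name, []):
            if mv.stage == stage:
                return mv
        return None
```

Keeping `dataset_uri` on every version means "which data produced the prod model?" can be answered with one lookup: `registry.get('ChurnModel', 'prod').dataset_uri`.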

💻 Code patterns

MLflow

import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri('http://mlflow:5000')
mlflow.set_experiment('user-churn')

with mlflow.start_run() as run:
    # Hyperparameters
    mlflow.log_params({'lr': 0.001, 'batch_size': 32})

    model = train(...)  # placeholder for your own training function

    # Validation metrics
    mlflow.log_metrics({'val_loss': 0.12, 'val_acc': 0.87})

    # Log the model and register it as a new version of 'ChurnModel'
    mlflow.sklearn.log_model(model, 'model', registered_model_name='ChurnModel')

Model registry (MLflow)

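import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = run.info.run_id  # `run` comes from mlflow.start_run() above

# Register (only needed if log_model() was called without registered_model_name)
mv = client.create_model_version(
    name='ChurnModel',
    source=f'runs:/{run_id}/model',
    run_id=run_id,
)

# Promote
# Note: stages are deprecated since MLflow 2.9. Newer code uses aliases, e.g.
# client.set_registered_model_alias('ChurnModel', 'champion', mv.version)
# and loads with 'models:/ChurnModel@champion'.
client.transition_model_version_stage(
    name='ChurnModel',
    version=mv.version,
    stage='Production',
)

# Load
model = mlflow.sklearn.load_model('models:/ChurnModel/Production')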

W&B

import wandb

run = wandb.init(project='churn', config={'lr': 0.001})
for epoch in range(100):
    loss = train_step()  # placeholder for one training epoch
    wandb.log({'loss': loss, 'epoch': epoch})

# Save artifact
art = wandb.Artifact('model', type='model')
art.add_file('model.pkl')
run.log_artifact(art)
run.finish()

→ Strong at hyperparameter sweeps and charts.

DVC (Data Version Control)

# Code in git, data in DVC
dvc init
dvc remote add -d s3 s3://bucket/dvc

dvc add data/train.csv
git add data/train.csv.dvc .gitignore
git commit -m 'add dataset'

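# Pipeline (`dvc run` was removed in DVC 3.0; use `dvc stage add` + `dvc repro`)
dvc stage add -n train \
    -d data/train.csv \
    -d train.py \
    -o model.pkl \
    python train.py
dvc repro

# Upload tracked data and outputs to the remote
dvc push

→ Large files live in S3; Git tracks only small pointer (.dvc) files, so the repo stays light.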
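
Why the repo stays light can be shown with a rough sketch of the pointer-file idea: hash the data file and commit only the hash. This is a simplification, not DVC's implementation. Real `.dvc` files are YAML and record an MD5 by default, while this sketch uses SHA-256 and JSON; `write_pointer` and the `.pointer.json` suffix are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path):
    """Write a small JSON pointer next to the data file and return its path.

    Only the pointer would be committed to Git; the data file itself
    would be uploaded to remote storage under its hash.
    """
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": file_sha256(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path
```

Because the pointer changes whenever the data changes, a Git commit pins the exact dataset version without storing the dataset.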

Reproducibility

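# Seed
import random
import numpy as np
import torch

random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)  # silently ignored when CUDA is unavailable

# Lock: pin exact versions in requirements.txt
torch==2.4.0
transformers==4.45.0

# Docker for env (Dockerfile)
FROM pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime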
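
The seeding lines above are often wrapped in one helper called at the top of every training script. A minimal sketch, assuming NumPy and PyTorch may or may not be installed; `set_seed` is a hypothetical name, not a library function.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed every random number generator that is available."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
    except ImportError:
        pass  # PyTorch not installed
```

Seeding alone does not guarantee bit-identical GPU results; it removes the most common source of run-to-run variation.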

Experiment compare

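import mlflow
import pandas as pd
import wandb

# MLflow: top 10 runs by validation accuracy (returns a pandas DataFrame)
runs = mlflow.search_runs(experiment_ids=['1'], max_results=10, order_by=['metrics.val_acc DESC'])

# W&B: path is '<entity>/<project>'
api = wandb.Api()
wb_runs = api.runs('user/churn')
# .get() avoids a KeyError for runs that never logged the value
df = pd.DataFrame([{'lr': r.config.get('lr'), 'acc': r.summary.get('val_acc')} for r in wb_runs])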

Model serving (MLflow)

mlflow models serve -m models:/ChurnModel/Production --port 5001

# REST
curl http://localhost:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1,2,3]]}'

BentoML (production serving)

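import bentoml

@bentoml.service
class ChurnPredictor:
    def __init__(self):
        # Load the stored model once per worker
        # (assumes it was saved with bentoml.sklearn.save_model)
        self.model = bentoml.sklearn.load_model('churn:latest')

    @bentoml.api
    def predict(self, features: list[float]) -> dict:
        # .item() turns the NumPy scalar into a plain Python value for JSON
        return {'pred': self.model.predict([features])[0].item()}

# Shell: build a Bento, then a Docker image from it
bentoml build
bentoml containerize churn:latest  # use the Bento tag printed by `bentoml build`

→ Docker image and REST API are generated automatically; gRPC availability depends on the BentoML version.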

Triton (NVIDIA inference)

- Multiple models
- Multiple frameworks (TF, PyTorch, ONNX)
- Dynamic batching
- GPU-friendly
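
Dynamic batching means grouping requests that arrive close together into one model call. The sketch below illustrates the grouping rule only; it is not Triton's scheduler. `dynamic_batches`, `max_batch_size`, and `max_delay` are hypothetical names, and arrival times are passed in explicitly so the behaviour is deterministic.

```python
def dynamic_batches(requests, max_batch_size=4, max_delay=0.010):
    """Group (arrival_time_seconds, payload) pairs into batches.

    A batch is closed when it is full, or when the next request arrives
    more than `max_delay` seconds after the first request in the batch.
    """
    batches = []
    current = []
    batch_start = None
    for arrival, payload in sorted(requests, key=lambda r: r[0]):
        batch_full = len(current) >= max_batch_size
        too_late = bool(current) and arrival - batch_start > max_delay
        if batch_full or too_late:
            batches.append(current)
            current = []
        if not current:
            batch_start = arrival  # this request opens a new batch
        current.append(payload)
    if current:
        batches.append(current)
    return batches
```

Larger `max_batch_size` improves GPU utilisation, while `max_delay` caps the extra latency any single request pays while waiting for the batch to fill.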

TorchServe

# model.mar must be inside the --model-store directory
torchserve --start --model-store model_store --models my_model=model.mar
curl http://localhost:8080/predictions/my_model -d @input.json

Vertex AI / SageMaker

# Vertex AI
from google.cloud import aiplatform

aiplatform.init(project='my-project')
model = aiplatform.Model.upload(
    display_name='churn',
    artifact_uri='gs://bucket/model',
    serving_container_image_uri='gcr.io/.../tf-serving',
)
endpoint = model.deploy(machine_type='n1-standard-4', min_replica_count=1)

→ Managed. Auto-scale + monitoring.

Feature store

# Feast
from feast import FeatureStore
store = FeatureStore(repo_path='.')

# Online (low latency)
features = store.get_online_features(
    features=['user:age', 'user:total_spent'],
    entity_rows=[{'user_id': 123}],
).to_dict()

# Offline (training)
df = store.get_historical_features(
    entity_df=entity_df,
    features=[...],
).to_df()

→ Train / serve consistency.

Data validation (Great Expectations / Deequ)

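# Legacy pandas API from older Great Expectations releases;
# 1.x uses a context-based API instead.
import great_expectations as ge

gdf = ge.from_pandas(train_df)
gdf.expect_column_values_to_be_between('age', 0, 120)
gdf.expect_column_to_exist('user_id')
result = gdf.validate()
assert result.success, 'data validation failed'

→ Schema check before training and before inference.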

Schema (Pydantic / Feast)

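from fastapi import FastAPI  # the @app.post decorator below implies FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    age: int
    income: float
    region: str

# API input → validate
@app.post('/predict')
def predict(features: Features):
    # `model` is assumed to be a loaded pipeline that encodes `region` itself
    row = list(features.model_dump().values())  # Pydantic v2; use .dict() on v1
    return {'pred': model.predict([row])[0]}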

CI / CD for ML

# .github/workflows/train.yml
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt  # must include dvc
      - run: dvc pull
      - run: python train.py
      - run: dvc push  # save artifacts
      - run: |
          if python compare.py; then
            python promote.py  # hypothetical script that promotes the new version in the registry
          fi

→ Continuous training.
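
The workflow relies on `compare.py` exiting with 0 only when the new model should be promoted. The note does not show that script; one possible shape is sketched below. The metric name `val_acc`, the `min_delta` margin, and the two JSON file names are assumptions.

```python
import json


def should_promote(candidate: dict, baseline: dict,
                   metric: str = "val_acc", min_delta: float = 0.005) -> bool:
    """True when the candidate beats the baseline by at least `min_delta`."""
    if metric not in candidate:
        return False  # nothing to compare: never promote blindly
    if metric not in baseline:
        return True   # no production model yet
    return candidate[metric] - baseline[metric] >= min_delta


def main(candidate_path="metrics.json", baseline_path="baseline_metrics.json") -> int:
    """Return a shell exit code: 0 = promote, 1 = keep the current model."""
    with open(candidate_path) as f:
        candidate = json.load(f)
    try:
        with open(baseline_path) as f:
            baseline = json.load(f)
    except FileNotFoundError:
        baseline = {}
    return 0 if should_promote(candidate, baseline) else 1

# In compare.py, finish with: raise SystemExit(main())
```

Requiring a margin (`min_delta`) rather than any improvement keeps noise between training runs from triggering a promotion.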

Model card (documentation)

# Model Card: Churn Predictor v3.1

## Intended use
Predict user churn for SaaS billing dashboard.

## Training data
- Source: 2025-01-01 - 2026-04-30
- Size: 1.2M users
- Features: 23

## Performance
- Val accuracy: 0.87
- Val AUC: 0.91
- F1: 0.83

## Limitations
- Trained on US-only data
- Lower accuracy for cold-start users (< 30 days)
- 30%+ class imbalance

## Bias
- ...

→ Trust + governance.

Prompt versioning (LLM as model)

# Promptfoo / LangSmith / Helicone
prompts = {
  'v1': 'Summarize: {text}',
  'v2': 'Provide a 3-sentence summary: {text}',
}

# A/B test in prod
prompt = prompts[user.bucket]
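
`user.bucket` above has to come from somewhere. A common approach is to hash the user ID so the same user always sees the same prompt version. A minimal sketch: `assign_bucket`, `render_prompt`, and the `salt` value are hypothetical, and the prompt texts are copied from the dictionary above.

```python
import hashlib

PROMPTS = {
    "v1": "Summarize: {text}",
    "v2": "Provide a 3-sentence summary: {text}",
}


def assign_bucket(user_id: str, versions=("v1", "v2"),
                  salt: str = "summary-prompt-exp1") -> str:
    """Deterministically map a user to one prompt version.

    The salt is per experiment, so a new experiment reshuffles users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return versions[int(digest, 16) % len(versions)]


def render_prompt(user_id: str, text: str):
    """Return (version, prompt) so the version can be logged with the result."""
    version = assign_bucket(user_id)
    return version, PROMPTS[version].format(text=text)
```

Logging the returned version next to each response is what makes the prompt comparable later, the same way a model version is logged with each prediction.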

Golden dataset

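import pandas as pd

# The golden test set is frozen: never modify it
test_df = pd.read_parquet('s3://bucket/golden_test.parquet')  # reading from S3 needs s3fs
acc = evaluate(model, test_df)  # evaluate() and model are placeholders
assert acc > 0.85, 'regression'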

→ Regression check.

Online + offline metrics

Offline (train): accuracy, AUC, F1
Online (prod): user-clicked, dwell time, conversion

→ Offline metrics almost never match online metrics.
The A/B test is the source of truth.
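
Reading an A/B test usually comes down to asking whether the difference in conversion between two model versions is larger than chance. A minimal sketch using a two-proportion z-test with only the standard library; the function name and the example counts are illustrative, and real experiments also need a pre-planned sample size.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for conversion rates of variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Example: model A converts 100 of 1000 users, model B converts 150 of 1000
z, p = two_proportion_z_test(100, 1000, 150, 1000)
```

A positive `z` with a small `p` suggests variant B really converts better, whatever the offline metrics said.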

🤔 Decision criteria

| Situation | Recommendation |
| --- | --- |
| Single team / experiment | MLflow |
| Hyperparam sweep | W&B |
| Data versioning | DVC |
| Production serving | BentoML / Triton |
| Cloud managed | Vertex / SageMaker |
| Feature store | Feast / Tecton |
| Validation | Great Expectations |
| Docs | Model card |

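Anti-patterns

  • No versioning: nobody knows which model is in prod.
  • Train / serve drift: predictions break when features differ between training and serving.
  • No monitoring: silent regression.
  • Hyperparameters hard-coded in the script: not tracked.
  • Big artifacts in Git: clone size explodes.
  • No reproducibility: no seed set.
  • Direct prod deploy: no staging step.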
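
The train / serve drift item can be caught early with a check that compares each serving request against the schema used in training. A minimal sketch: `find_feature_drift` is a hypothetical helper, and the example schema reuses the field names from the Pydantic section. The type check is deliberately strict, so an `int` passed where a `float` is expected is reported.

```python
TRAIN_SCHEMA = {"age": int, "income": float, "region": str}


def find_feature_drift(train_schema: dict, serving_row: dict) -> list:
    """Return human-readable problems; an empty list means the row matches."""
    problems = []
    for name, expected_type in train_schema.items():
        if name not in serving_row:
            problems.append(f"missing feature: {name}")
        elif not isinstance(serving_row[name], expected_type):
            actual = type(serving_row[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    for name in serving_row:
        if name not in train_schema:
            problems.append(f"unexpected feature: {name}")
    return problems
```

Counting these problems per hour and alerting on them also addresses the "no monitoring" item, since schema mismatches surface before accuracy drops.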

🤖 Hints for LLM use

  • MLflow / W&B are the baseline.
  • A feature store provides train/serve consistency.
  • BentoML / Triton handle production serving.
  • Model card = governance + trust.

🔗 Related documents