---
id: mlops-model-registry
title: MLOps — Model registry / MLflow / W&B / artifact
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [mlops, ml, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI", "Backend"] }
applied_in: []
aliases: [MLOps, MLflow, W&B, Weights and Biases, model registry, model versioning, artifact]
---

# MLOps Model Registry

> ML models need versioning and deployment too. **MLflow / W&B / DVC / Vertex AI**. Train → register → stage → deploy → monitor.

## 📖 Core concepts

- Model = code + data + hyperparameters + weights.
- Registry: version management.
- Stage: dev / staging / prod.
- Lineage: which dataset a model was trained on.

## 💻 Code patterns

### MLflow

```python
import mlflow

mlflow.set_tracking_uri('http://mlflow:5000')
mlflow.set_experiment('user-churn')

with mlflow.start_run() as run:
    mlflow.log_param('lr', 0.001)
    mlflow.log_param('batch_size', 32)

    model = train(...)  # placeholder for your training function

    mlflow.log_metric('val_loss', 0.12)
    mlflow.log_metric('val_acc', 0.87)

    # Logs the model and registers a new version of 'ChurnModel'
    mlflow.sklearn.log_model(model, 'model', registered_model_name='ChurnModel')
```

### Model registry (MLflow)

```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = run.info.run_id  # `run` comes from the tracking block above

# Register (only needed if log_model was called without registered_model_name)
mv = client.create_model_version(
    name='ChurnModel',
    source=f'runs:/{run_id}/model',
    run_id=run_id,
)

# Promote (stages are deprecated since MLflow 2.9; newer code uses aliases
# via client.set_registered_model_alias)
client.transition_model_version_stage(
    name='ChurnModel', version=mv.version, stage='Production',
)

# Load
model = mlflow.sklearn.load_model('models:/ChurnModel/Production')
```

### W&B

```python
import wandb

wandb.init(project='churn', config={'lr': 0.001})

for epoch in range(100):
    loss = train_step()  # placeholder for one training epoch
    wandb.log({'loss': loss, 'epoch': epoch})

# Save artifact
art = wandb.Artifact('model', type='model')
art.add_file('model.pkl')
wandb.log_artifact(art)
wandb.finish()
```

→ Strong at hyperparameter sweeps and charts.

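
To make the idea of a hyperparameter sweep concrete, here is a minimal stdlib-only sketch of what a grid sweep does: enumerate configs, run each one, record the metric, and rank the runs. It is not the W&B sweep API. `fake_val_loss` and `grid_sweep` are hypothetical names, and the loss formula is made up so the example runs without training anything.

```python
import itertools


def fake_val_loss(lr: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training run; returns a made-up validation loss."""
    return abs(lr - 0.01) * 10 + abs(batch_size - 64) / 1000


def grid_sweep(grid: dict) -> list[dict]:
    """Run every combination in the grid; return runs sorted by val_loss (best first)."""
    keys = list(grid)
    runs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        runs.append({'config': config, 'val_loss': fake_val_loss(**config)})
    return sorted(runs, key=lambda r: r['val_loss'])


runs = grid_sweep({'lr': [0.001, 0.01, 0.1], 'batch_size': [32, 64]})
best = runs[0]
print(best['config'], best['val_loss'])
```

A tracking tool adds value on top of this loop by storing each run's config and metrics centrally, so the ranking survives beyond a single process.
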
### DVC (Data Version Control)

```bash
# Code in git, data in DVC
dvc init
dvc remote add -d s3 s3://bucket/dvc
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore
git commit -m 'add dataset'

# Pipeline (`dvc run` was removed in DVC 3.0; use `dvc stage add` + `dvc repro`)
dvc stage add -n train \
  -d data/train.csv \
  -d train.py \
  -o model.pkl \
  python train.py
dvc repro
```

→ Large files live in S3, so Git is not affected by their size.

### Reproducibility

```python
# Seed
import random

import numpy as np
import torch

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Lock: pin exact versions in requirements.txt
#   torch==2.4.0
#   transformers==4.45.0

# Docker for the environment (Dockerfile)
#   FROM pytorch/pytorch:2.4.0-cuda12-runtime
```

### Experiment compare

```python
import mlflow
import pandas as pd
import wandb

# MLflow: top 10 runs by validation accuracy (returns a pandas DataFrame)
runs = mlflow.search_runs(
    experiment_ids=['1'],
    max_results=10,
    order_by=['metrics.val_acc DESC'],
)

# W&B
api = wandb.Api()
runs = api.runs('user/churn')
df = pd.DataFrame(
    [{'lr': r.config.get('lr'), 'acc': r.summary.get('val_acc')} for r in runs]
)
```

### Model serving (MLflow)

```bash
mlflow models serve -m models:/ChurnModel/Production --port 5001

# REST
curl http://localhost:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1,2,3]]}'
```

### BentoML (production serving)

```python
import bentoml


@bentoml.service
class ChurnPredictor:
    # Reference to the stored model; assumes it was saved with bentoml.sklearn.save_model
    bento_model = bentoml.models.get('churn:latest')

    def __init__(self) -> None:
        # models.get() returns a reference, so load the actual estimator here
        self.model = bentoml.sklearn.load_model(self.bento_model)

    @bentoml.api
    def predict(self, features: list[float]) -> dict:
        pred = self.model.predict([features])[0]
        return {'pred': pred.item()}  # NumPy scalar → plain Python value
```

```bash
bentoml build
bentoml containerize churn:latest
```

→ Docker + REST + gRPC generated automatically.

### Triton (NVIDIA inference)

```
- Multiple models
- Multiple frameworks (TF, PyTorch, ONNX)
- Dynamic batching
- GPU-friendly
```

### TorchServe

```bash
# --model-store points at the directory holding model.mar (directory name is an assumption)
torchserve --start --model-store model_store --models my_model=model.mar
curl http://localhost:8080/predictions/my_model -d @input.json
```

### Vertex AI / SageMaker

```python
# Vertex AI
from google.cloud import aiplatform

aiplatform.init(project='my-project')

model = aiplatform.Model.upload(
    display_name='churn',
    artifact_uri='gs://bucket/model',
    serving_container_image_uri='gcr.io/.../tf-serving',
)
endpoint = model.deploy(machine_type='n1-standard-4', min_replica_count=1)
```

→ Managed. Auto-scale + monitoring.

### Feature store

```python
# Feast
from feast import FeatureStore

store = FeatureStore(repo_path='.')

# Online (low latency)
features = store.get_online_features(
    features=['user:age', 'user:total_spent'],
    entity_rows=[{'user_id': 123}],
).to_dict()

# Offline (training)
df = store.get_historical_features(
    entity_df=entity_df,
    features=[...],
).to_df()
```

→ Train / serve consistency.

### Data validation (Great Expectations / Deequ)

```python
# Legacy pandas-dataset API; recent Great Expectations releases use a
# context/validator workflow instead of ge.from_pandas
import great_expectations as ge

df = ge.from_pandas(train_df)
df.expect_column_values_to_be_between('age', 0, 120)
df.expect_column_to_exist('user_id')
result = df.validate()
```

→ Schema check before training and before inference.

### Schema (Pydantic / Feast)

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Features(BaseModel):
    age: int
    income: float
    region: str


# API input → validate
@app.post('/predict')
def predict(features: Features):
    # `model` is assumed to be loaded at startup and to accept raw feature rows
    row = list(features.model_dump().values())  # Pydantic v2; use .dict() on v1
    return {'pred': model.predict([row])[0].item()}
```

### CI / CD for ML

```yaml
# .github/workflows/train.yml
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install first: requirements.txt is assumed to include dvc and mlflow
      - run: pip install -r requirements.txt
      - run: dvc pull
      - run: python train.py
      - run: dvc push  # save artifacts
      - run: |
          # compare.py exits 0 when the new model beats the current one
          if python compare.py; then
            mlflow promote ...
          fi
```

→ Continuous training.

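
The workflow above gates promotion on `compare.py`, whose contents the note does not show. A minimal sketch of one possible gate follows, assuming training writes its metrics to a JSON file and the current production metrics are available in another. The file names, the `val_acc` key, the `min_gain` threshold, and the function names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_metrics(path) -> dict:
    """Read a metrics JSON file such as {"val_acc": 0.89}."""
    return json.loads(Path(path).read_text())


def should_promote(candidate: dict, baseline: dict,
                   metric: str = 'val_acc', min_gain: float = 0.005) -> bool:
    """True only if the candidate beats the baseline by at least min_gain."""
    return candidate[metric] - baseline[metric] >= min_gain


def exit_code(candidate_path, baseline_path) -> int:
    """0 (promote) or 1 (keep the current model), matching shell `if` semantics."""
    ok = should_promote(load_metrics(candidate_path), load_metrics(baseline_path))
    return 0 if ok else 1


# Demo with throwaway files standing in for real training output
with tempfile.TemporaryDirectory() as tmp:
    cand = Path(tmp) / 'candidate.json'
    base = Path(tmp) / 'baseline.json'
    cand.write_text(json.dumps({'val_acc': 0.89}))
    base.write_text(json.dumps({'val_acc': 0.87}))
    demo_code = exit_code(cand, base)

print(demo_code)
```

In an actual `compare.py`, the script would end with `sys.exit(exit_code(...))` so the shell `if` in the workflow can branch on the result. Requiring a minimum gain rather than any improvement keeps metric noise from triggering a promotion.
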
### Model card (documentation)

```markdown
# Model Card: Churn Predictor v3.1

## Intended use
Predict user churn for SaaS billing dashboard.

## Training data
- Source: 2025-01-01 to 2026-04-30
- Size: 1.2M users
- Features: 23

## Performance
- Val accuracy: 0.87
- Val AUC: 0.91
- F1: 0.83

## Limitations
- Trained on US-only data
- Lower accuracy for cold-start users (< 30 days)
- 30%+ class imbalance

## Bias
- ...
```

→ Trust + governance.

### Prompt versioning (LLM as model)

```python
# Promptfoo / LangSmith / Helicone
prompts = {
    'v1': 'Summarize: {text}',
    'v2': 'Provide a 3-sentence summary: {text}',
}

# A/B test in prod (`user.bucket` is assumed to be 'v1' or 'v2')
prompt = prompts[user.bucket]
```

### Golden dataset

```python
import pandas as pd

# The golden test set never changes
test_df = pd.read_parquet('s3://bucket/golden_test.parquet')
acc = evaluate(model, test_df)  # placeholder for your evaluation function
assert acc > 0.85, 'regression'
```

→ Regression check.

### Online + offline metrics

```
Offline (train): accuracy, AUC, F1
Online (prod):   user-clicked, dwell time, conversion

→ Offline metrics almost never match online ones. The A/B test is the source of truth.
```

## 🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Single team / experiment | MLflow |
| Hyperparam sweep | W&B |
| Data versioning | DVC |
| Production serving | BentoML / Triton |
| Cloud managed | Vertex / SageMaker |
| Feature store | Feast / Tecton |
| Validation | Great Expectations |
| Docs | Model card |

## ❌ Anti-patterns

- **No version**: nobody knows which model is in prod.
- **Train / serve drift**: predictions break when features differ between training and serving.
- **No monitoring**: regressions go unnoticed.
- **Hyperparam in script**: values are not tracked.
- **Big artifact in git**: clone size explodes.
- **No reproducibility**: no seed set.
- **Direct prod deploy**: no staging step.

## 🤖 LLM usage hints

- MLflow / W&B as the baseline.
- Feature store for train/serve consistency.
- BentoML / Triton for production serving.
- Model card = governance + trust.

## 🔗 Related documents

- [[AI_Local_LLM_Inference]]
- [[Data_Eng_dbt]]
- [[DevOps_CI_CD_Pipeline_Patterns]]