MLOps Workshop

🔄

CI/CD for ML Pipelines

Automate your entire ML pipeline from code to production with self-updating, self-healing workflows

โฑ๏ธ 8 minutes ๐Ÿ“š Phase 6 of 7 ๐Ÿ› ๏ธ Hands-on

🎯 Objective

Set up a CI/CD workflow that automatically tests, retrains, deploys, and rolls back your ML models.

  • 🧪 Test: data and code quality
  • 🔄 Retrain: version-controlled models
  • 🚀 Deploy: to Kubeflow & KServe
  • ↩️ Rollback: on performance drop

โš™๏ธ What is CI/CD for ML?

CI/CD for ML extends DevOps automation to data, models, and experiments. It ensures every change is tested, validated, and safely deployed.

🚀 Continuous Integration (CI)

Automatically runs on every git push or pull request:

  • ✅ Code quality & lint checks
  • ✅ Unit + integration + data validation tests
  • ✅ Performance benchmarking (see the latency sketch below)

Example:

When a new feature lands or the data schema changes, CI ensures the pipeline still works and model accuracy stays above thresholds.
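
The performance-benchmarking step can start as a plain timing test with no extra tooling. A minimal sketch; HousePriceModel (the model wrapper used in Step 2) and the 100 ms budget are illustrative assumptions:

tests/test_latency.py

# tests/test_latency.py - simple latency guard for the CI benchmark step
import time
import numpy as np
from model import HousePriceModel  # assumed to load the trained model

def test_prediction_latency():
    model = HousePriceModel()
    X = np.array([[2000, 3, 5]])
    model.predict(X)  # warm-up call

    start = time.perf_counter()
    for _ in range(100):
        model.predict(X)
    mean_latency = (time.perf_counter() - start) / 100

    # 100 ms per prediction is an illustrative budget; tune it to your SLO
    assert mean_latency < 0.1, f"Mean latency {mean_latency * 1000:.1f} ms exceeds 100 ms"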

🔁 Continuous Deployment (CD)

Once all tests pass:

  • ✅ Retrain the model
  • ✅ Store new model in MinIO (see the sketch below)
  • ✅ Trigger Kubeflow deployment
  • ✅ Rollback automatically on failure
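
MinIO speaks the S3 API, so the model-storage step can also be scripted. A minimal sketch using boto3, assuming the endpoint, credentials, and bucket layout used throughout this workshop:

upload_model.py

# upload_model.py - push a trained model to MinIO (S3-compatible storage)
import boto3

# Endpoint and credentials match this workshop's MinIO setup; adjust for your cluster
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.kubeflow.svc.cluster.local:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

# Bucket/key layout mirrors the storageUri KServe reads from later in this phase
s3.upload_file("model.pkl", "mlpipeline", "models/house-price/latest/model.pkl")
print("Uploaded model.pkl to s3://mlpipeline/models/house-price/latest/")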

โš ๏ธ ML-Specific CI/CD Challenges

ML systems have unique challenges that traditional CI/CD doesn't address.

📊 Data Validation

Why it matters: Data can drift silently and break models

Solution: Add schema and drift tests in CI
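
A drift check can be as simple as comparing the latest batch of data against a training-time reference distribution. A minimal sketch using scipy's two-sample Kolmogorov-Smirnov test; the incoming-data path and the 0.05 significance level are illustrative:

tests/test_drift.py

# tests/test_drift.py - fail CI when feature distributions shift
import pandas as pd
from scipy.stats import ks_2samp

def test_no_feature_drift():
    reference = pd.read_csv("data/house_prices.csv")     # training-time snapshot
    incoming = pd.read_csv("data/house_prices_new.csv")  # latest batch (illustrative path)
    for col in ["size", "price"]:
        stat, p_value = ks_2samp(reference[col], incoming[col])
        # A low p-value means the distributions differ; fail so a human investigates
        assert p_value > 0.05, f"Drift detected in '{col}' (p={p_value:.4f})"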

🧪 Model Testing

Why it matters: Models need accuracy, fairness, and robustness checks

Solution: Automated pytest-based ML tests

🔄 Model Retraining

Why it matters: Models degrade over time

Solution: Automate retraining via Kubeflow Pipelines

โ†ฉ๏ธ Rollback

Why it matters: Bad models harm users fast

Solution: Use KServe versioning + traffic split rollback

🛠️

Step 1: Create GitHub Actions Workflow

Build a real workflow using GitHub Actions + Kubeflow Pipelines + KServe.

📄 GitHub Actions Workflow

Create .github/workflows/mlops-ci-cd.yml:

.github/workflows/mlops-ci-cd.yml

name: MLOps CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  # 1️⃣ Run validation & unit tests
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: "3.9"

    - name: Install dependencies
      run: pip install -r requirements.txt pytest

    - name: Run data validation tests
      run: pytest tests/test_data_validation.py -v

    - name: Run model unit tests
      run: pytest tests/test_model.py -v

  # 2️⃣ Train & package model
  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: "3.9"
    - name: Install dependencies
      run: pip install -r requirements.txt joblib scikit-learn

    - name: Train model
      run: python train_model.py

    - name: Upload model artifact
      uses: actions/upload-artifact@v4
      with:
        name: house-price-model
        path: model.pkl

  # 3️⃣ Deploy to Kubeflow via KServe
  deploy:
    needs: [train]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
    - uses: actions/checkout@v4
    - uses: actions/download-artifact@v4
      with:
        name: house-price-model
        path: ./model

    - name: Configure kubectl
      env:
        KUBECONFIG_CONTENT: ${{ secrets.KUBECONFIG_CONTENT }}  # store your kubeconfig as a repo secret
      run: |
        mkdir -p ~/.kube
        echo "$KUBECONFIG_CONTENT" > ~/.kube/config
        kubectl config use-context kubeflow-context

    # Assumes the runner can reach the in-cluster MinIO service
    # (e.g., a self-hosted runner inside the cluster)
    - name: Upload model to MinIO
      run: |
        curl -fsSL https://dl.min.io/client/mc/release/linux-amd64/mc -o mc && chmod +x mc
        ./mc alias set minio http://minio.kubeflow.svc.cluster.local:9000 minio minio123
        ./mc cp ./model/model.pkl minio/mlpipeline/models/house-price/latest/

    - name: Deploy model to KServe
      run: kubectl apply -f k8s/model-inference.yaml -n kubeflow

🧪

Step 2: Add Automated Tests

Create comprehensive tests for data validation and model performance.

📊 Data Validation Tests

Create tests/test_data_validation.py:

tests/test_data_validation.py

import pandas as pd

def test_no_missing_values():
    df = pd.read_csv("data/house_prices.csv")
    assert not df.isnull().any().any(), "Missing values found!"

def test_valid_ranges():
    df = pd.read_csv("data/house_prices.csv")
    # Sanity checks on feature and target ranges
    assert (df['size'] > 0).all(), "Non-positive house sizes found!"
    assert df['price'].between(10000, 10000000).all(), "Prices outside plausible range!"

🤖 Model Performance Tests

Create tests/test_model.py:

tests/test_model.py

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from model import HousePriceModel

def test_model_performance():
    model = HousePriceModel()  # assumed to load the trained artifact
    X = np.array([[2000, 3, 5], [1500, 2, 10]])  # two sample feature rows
    y = np.array([250000, 180000])
    preds = model.predict(X)
    # Compare RMSE (in dollars) rather than raw MSE so the threshold is interpretable
    rmse = np.sqrt(mean_squared_error(y, preds))
    assert rmse < 10000, f"RMSE too high: {rmse:.0f}"
    assert r2_score(y, preds) > 0.85, "R^2 below threshold"

🚀

Step 3: Create KServe Deployment Config

Configure KServe to automatically deploy your models from MinIO storage.

📄 KServe Deployment Configuration

Create k8s/model-inference.yaml:

k8s/model-inference.yaml

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: house-price-model
  namespace: kubeflow
spec:
  predictor:
    # minio-sa should reference a secret holding the MinIO credentials,
    # annotated with serving.kserve.io/s3-endpoint: minio.kubeflow.svc.cluster.local:9000
    # and serving.kserve.io/s3-usehttps: "0" so the storage initializer can pull the model
    serviceAccountName: minio-sa
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://mlpipeline/models/house-price/latest/
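
Once the InferenceService reports Ready, a quick smoke test confirms the endpoint answers. A minimal sketch against KServe's v1 REST protocol; the hostname depends on your ingress setup and is illustrative:

smoke_test.py

# smoke_test.py - send one prediction request to the deployed model
import requests

# KServe v1 protocol endpoint; replace the host with your ingress address
url = "http://house-price-model.kubeflow.example.com/v1/models/house-price-model:predict"
payload = {"instances": [[2000, 3, 5]]}  # same feature shape as the unit tests

resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print("Prediction:", resp.json()["predictions"])
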
🔄

Step 4: Auto-Retrain on Drift

Add drift detection to automatically trigger model retraining when performance degrades.

🤖 Drift Detection Component

In your Kubeflow Pipeline, add a component that monitors model accuracy and retriggers training:

Drift Detection Component

from kfp import dsl

@dsl.component(packages_to_install=["requests"])
def monitor_drift(threshold: float = 0.85):
    # Imports inside the body so they run in the component's container
    import requests

    # "http://monitor/api/accuracy" is a placeholder for your metrics endpoint
    current_accuracy = requests.get("http://monitor/api/accuracy").json()["value"]
    if current_accuracy < threshold:  # threshold = minimum acceptable accuracy
        print("🔄 Drift detected, triggering retraining...")
        # trigger the training pipeline here (e.g., via the KFP client)

Implementation: Schedule this component to run daily as a recurring run (a ScheduledWorkflow under the hood) so it continuously monitors model performance and triggers retraining when needed; a sketch using the KFP client follows.
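
A minimal scheduling sketch with the KFP v2 SDK, assuming a compiled pipeline file drift_pipeline.yaml and an in-cluster pipeline endpoint (both illustrative):

schedule_drift_monitor.py

# schedule_drift_monitor.py - run the drift-monitoring pipeline daily
from kfp.client import Client

client = Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")
experiment = client.create_experiment(name="drift-monitoring")

client.create_recurring_run(
    experiment_id=experiment.experiment_id,
    job_name="daily-drift-check",
    pipeline_package_path="drift_pipeline.yaml",  # compiled pipeline containing monitor_drift
    cron_expression="0 0 2 * * *",                # 6-field cron: every day at 02:00
)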

↩️

Step 5: Auto-Rollback (Safe Deployments)

Implement safe deployment strategies with automatic rollback capabilities.

๐Ÿ›ก๏ธ Canary + A/B Deployments

KServe supports Canary + A/B Deployments for safe model updates:

Canary Deployment Configuration

spec:
  predictor:
    # Send 10% of traffic to the new model; the previous revision keeps the rest
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://mlpipeline/models/house-price/v2/   # updated model version

How it works: 10% of requests hit the v2 model while the last good revision keeps serving the other 90%. If canary metrics degrade, setting canaryTrafficPercent back to 0 routes all traffic to the stable version, so users keep getting the best-performing model; the rollback itself can be automated from your monitoring pipeline, as sketched below.
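
Promoting or rolling back the canary is then a one-field patch on the InferenceService. A sketch using the official kubernetes Python client; cluster access and RBAC are assumed:

rollback_canary.py

# rollback_canary.py - route all traffic back to the stable revision
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
api = client.CustomObjectsApi()

# canaryTrafficPercent: 0 sends 100% of traffic to the previous (stable) revision
patch = {"spec": {"predictor": {"canaryTrafficPercent": 0}}}
api.patch_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="kubeflow",
    plural="inferenceservices",
    name="house-price-model",
    body=patch,
)
print("Canary rolled back; stable revision now serves all traffic.")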

🧩 CI/CD Best Practices for ML

Follow these guidelines to build robust, automated ML pipelines.

Testing & Quality

  • 🧪 Unit, integration, and data validation tests
  • 🧬 Tag every code + model + data change
  • 📊 Track latency, drift, and accuracy post-deployment

Deployment & Safety

  • โ†ฉ๏ธ Keep N-1 version ready and auto-rollback
  • ๐Ÿ’ก Use GitHub Actions + Kubeflow to trigger retraining
  • ๐Ÿ”„ Implement gradual traffic shifting

🎉 What You've Built

You now have a complete, automated ML CI/CD pipeline that handles everything from code changes to production deployment.

✅ End-to-end CI/CD automation
✅ Auto-test → retrain → deploy → monitor loop
✅ Model versioning in MinIO
✅ GitOps-based KServe deployment
✅ Automatic rollback + drift retraining
✅ Production-ready safety measures