CI/CD for ML Pipelines
Automate your entire ML pipeline from code to production with self-updating, self-healing workflows
Objective
Set up a CI/CD workflow that automatically tests, retrains, deploys, and rolls back your ML models.
- Test: data and code quality
- Retrain: version-control models
- Deploy: to Kubeflow & KServe
- Rollback: on performance drop
What is CI/CD for ML?
CI/CD for ML extends DevOps automation to data, models, and experiments. It ensures every change is tested, validated, and safely deployed.
Continuous Integration (CI)
Automatically runs on every git push or pull request:
- Code quality & lint checks
- Unit, integration, and data validation tests
- Performance benchmarking
Example: when a new feature lands or the data schema changes, CI ensures that the pipeline still works and that model accuracy stays above thresholds.
Continuous Deployment (CD)
Once all tests pass:
- Retrain the model
- Store the new model in MinIO
- Trigger the Kubeflow deployment
- Roll back automatically on failure
ML-Specific CI/CD Challenges
ML systems have unique challenges that traditional CI/CD doesn't address.
Data Validation
Why it matters: data can drift silently and break models.
Solution: add schema and drift tests in CI.
Model Testing
Why it matters: you need accuracy, fairness, and robustness checks.
Solution: automated pytest-based ML tests.
Model Retraining
Why it matters: models degrade over time.
Solution: automate retraining via Kubeflow Pipelines.
Rollback
Why it matters: bad models harm users fast.
Solution: use KServe versioning + traffic-split rollback.
Step 1: Create GitHub Actions Workflow
Build a real workflow using GitHub Actions + Kubeflow Pipelines + KServe.
GitHub Actions Workflow
Create .github/workflows/mlops-ci-cd.yml:
name: MLOps CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  # 1. Run validation & unit tests
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: pip install -r requirements.txt pytest
      - name: Run data validation tests
        run: pytest tests/test_data_validation.py -v
      - name: Run model unit tests
        run: pytest tests/test_model.py -v

  # 2. Train & package model
  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: pip install -r requirements.txt joblib scikit-learn
      - name: Train model
        run: python train_model.py
      - name: Upload model artifact
        uses: actions/upload-artifact@v3
        with:
          name: house-price-model
          path: model.pkl

  # 3. Deploy to Kubeflow via KServe
  # Note: the MinIO and Kubernetes endpoints below are cluster-internal, so this
  # job assumes the runner can reach the cluster (e.g. a self-hosted runner).
  deploy:
    needs: [train]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - uses: actions/download-artifact@v3
        with:
          name: house-price-model
          path: ./model
      - name: Configure kubectl
        env:
          KUBECONFIG_CONTENT: ${{ secrets.KUBECONFIG_CONTENT }}
        run: |
          mkdir -p ~/.kube
          echo "$KUBECONFIG_CONTENT" > ~/.kube/config
          kubectl config use-context kubeflow-context
      - name: Upload model to MinIO
        run: |
          mc alias set minio http://minio.kubeflow.svc.cluster.local:9000 minio minio123
          mc cp ./model/model.pkl minio/mlpipeline/models/house-price/latest/
      - name: Deploy model to KServe
        run: kubectl apply -f k8s/model-inference.yaml -n kubeflow
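The "roll back automatically on failure" item from the CD list can be made concrete with one more job that re-applies a known-good manifest whenever deploy fails. A sketch, where k8s/model-inference-stable.yaml is a hypothetical manifest pinned to the previous model version:

  rollback:
    needs: [deploy]
    runs-on: ubuntu-latest
    if: failure() && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Configure kubectl
        env:
          KUBECONFIG_CONTENT: ${{ secrets.KUBECONFIG_CONTENT }}
        run: |
          mkdir -p ~/.kube
          echo "$KUBECONFIG_CONTENT" > ~/.kube/config
          kubectl config use-context kubeflow-context
      - name: Re-apply last known-good model config
        run: kubectl apply -f k8s/model-inference-stable.yaml -n kubeflow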
Step 2: Add Automated Tests
Create comprehensive tests for data validation and model performance.
Data Validation Tests
Create tests/test_data_validation.py:
import pandas as pd

def test_no_missing_values():
    # Fail CI if any nulls slipped into the training data
    df = pd.read_csv("data/house_prices.csv")
    assert not df.isnull().any().any(), "Missing values found!"

def test_valid_ranges():
    # Sanity-check value ranges before training
    df = pd.read_csv("data/house_prices.csv")
    assert (df['size'] > 0).all()
    assert df['price'].between(10000, 10000000).all()
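The schema tests mentioned under the CI challenges can live in the same file. A minimal sketch that pins expected columns and dtypes (both are assumptions about data/house_prices.csv):

def test_schema():
    df = pd.read_csv("data/house_prices.csv")
    expected = {"size": "int64", "bedrooms": "int64", "price": "int64"}
    # Fail fast if columns disappear or dtypes change upstream
    for col, dtype in expected.items():
        assert col in df.columns, f"missing column: {col}"
        assert str(df[col].dtype) == dtype, f"unexpected dtype for {col}"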
Model Performance Tests
Create tests/test_model.py:
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from model import HousePriceModel

def test_model_performance():
    model = HousePriceModel()
    # Tiny smoke-test set (feature layout assumed: size, bedrooms, age)
    X = np.array([[2000, 3, 5], [1500, 2, 10]])
    y = np.array([250000, 180000])
    preds = model.predict(X)
    # Keep the error budget realistic for six-figure prices: RMSE under $10k
    assert mean_squared_error(y, preds) < 10000 ** 2
    assert r2_score(y, preds) > 0.85
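The robustness checks from the challenges above can start small, for example by asserting prediction stability under tiny input perturbations (the 1% nudge and 10% tolerance are assumptions):

def test_prediction_stability():
    model = HousePriceModel()
    X = np.array([[2000, 3, 5]])
    base = model.predict(X)[0]
    # A ~1% nudge to the inputs should not swing the prediction wildly
    shifted = model.predict(X * 1.01)[0]
    assert abs(shifted - base) / base < 0.10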
Step 3: Create KServe Deployment Config
Configure KServe to automatically deploy your models from MinIO storage.
KServe Deployment Configuration
Create k8s/model-inference.yaml:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: house-price-model
  namespace: kubeflow
spec:
  predictor:
    # minio-sa supplies the MinIO/S3 credentials; see the sketch below
    serviceAccountName: minio-sa
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://mlpipeline/models/house-price/latest/
In v1beta1 a predictor uses either the built-in model spec or a custom container, not both. With the sklearn runtime, the MinIO endpoint and credentials come from the minio-sa service account rather than inline environment variables.
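The spec above assumes a minio-sa service account that carries the MinIO credentials used earlier in the workflow. A minimal sketch of that wiring, using KServe's S3 secret annotations (resource names are assumptions):

apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-secret
  namespace: kubeflow
  annotations:
    serving.kserve.io/s3-endpoint: minio.kubeflow.svc.cluster.local:9000
    serving.kserve.io/s3-usehttps: "0"  # MinIO here is plain HTTP
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-sa
  namespace: kubeflow
secrets:
  - name: minio-s3-secret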
Step 4: Auto-Retrain on Drift
Add drift detection to automatically trigger model retraining when performance degrades.
Drift Detection Component
In your Kubeflow Pipeline, add a component that monitors model accuracy and triggers retraining when it drops:
from kfp import dsl

@dsl.component(packages_to_install=["requests"])
def monitor_drift(min_accuracy: float = 0.85):
    import requests  # imports go inside the component body so they exist at runtime
    # Placeholder endpoint: point this at your own metrics/monitoring service
    current_accuracy = requests.get("http://monitor/api/accuracy").json()["value"]
    if current_accuracy < min_accuracy:
        print("Drift detected → triggering retraining...")
        # trigger pipeline
Implementation: Schedule this component to run daily with ScheduledWorkflow to continuously monitor model performance and trigger retraining when needed.
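The exact SDK call varies between kfp versions; with the v2 client it looks roughly like this (host, experiment name, and the compiled pipeline file are assumptions):

import kfp

client = kfp.Client(host="http://ml-pipeline.kubeflow:8888")  # assumed in-cluster endpoint
experiment = client.create_experiment("drift-monitoring")

# KFP recurring runs use a 6-field (Argo-style) cron expression: run daily at midnight
client.create_recurring_run(
    experiment_id=experiment.experiment_id,
    job_name="daily-drift-check",
    cron_expression="0 0 0 * * *",
    pipeline_package_path="monitor_drift_pipeline.yaml",  # compiled pipeline (assumed)
)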
Step 5: Auto-Rollback (Safe Deployments)
Implement safe deployment strategies with automatic rollback capabilities.
Canary + A/B Deployments
KServe supports canary and A/B deployments for safe model updates:
spec:
  predictor:
    # 10% of traffic goes to the newly applied revision (v2);
    # the previous revision keeps serving the other 90%
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://mlpipeline/models/house-price/v2/
How it works: KServe keeps the previous revision serving alongside the canary. If the canary's metrics degrade, set canaryTrafficPercent back to 0 to route all traffic to the stable revision; once the canary looks healthy, remove the field to promote it to 100%.
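In practice, the rollback can be a one-line patch, for example fired from a monitoring job (a sketch; adjust resource names to your setup):

kubectl patch inferenceservice house-price-model -n kubeflow \
  --type merge -p '{"spec": {"predictor": {"canaryTrafficPercent": 0}}}'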
CI/CD Best Practices for ML
Follow these guidelines to build robust, automated ML pipelines.
Testing & Quality
- Unit, integration, and data validation tests
- Tag every code + model + data change (see the sketch after this list)
- Track latency, drift, and accuracy post-deployment
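For the tagging bullet, one lightweight pattern is to cut a git tag per release and mirror the model artifact to a matching versioned path (tag name and paths are assumptions):

# Tag the exact code state that produced the model
git tag -a model-v1.2.0 -m "house-price model v1.2.0"
git push origin model-v1.2.0

# Mirror the artifact under the same version so code and model stay linked
mc cp ./model/model.pkl minio/mlpipeline/models/house-price/v1.2.0/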
Deployment & Safety
- Keep the N-1 version ready and auto-rollback
- Use GitHub Actions + Kubeflow to trigger retraining
- Implement gradual traffic shifting
What You've Built
You now have a complete, automated ML CI/CD pipeline that handles everything from code changes to production deployment.