MLOps Workshop

📊

Model Monitoring & Observability

Monitor your ML models in production for performance, accuracy, and health with Prometheus + Grafana

âąī¸ 10 minutes 📚 Phase 5 of 7 đŸ› ī¸ Hands-on

🎯 Objective

Set up end-to-end monitoring for your deployed ML model to track performance, drift, and system health.

⚡

Performance

Latency & accuracy tracking

🔄

Drift Detection

Data & concept drift monitoring

💚

System Health

Resource usage & alerts

🔍 Why Monitor ML Models?

Unlike traditional code, ML models degrade silently when real-world data drifts. Monitoring catches it early — before customers do.

🧩 Data Drift

Input data distribution changes

Example:

House prices rise after 2020 → old model underpredicts (a drift check is sketched after these examples)

🔄 Concept Drift

Feature → output relationship shifts

Example:

People prefer home offices after COVID → new weight for "rooms"

⚡ Performance Drop

Accuracy/latency degrades

Example:

Model accuracy falls from 95% to 80% in 6 months
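
The data-drift case can be caught before accuracy drops by comparing recent production inputs against a reference sample from training time. Below is a minimal sketch using a two-sample Kolmogorov-Smirnov test from SciPy; the feature, sample sizes, and threshold are illustrative, not part of the workshop code.

# drift_check.py -- sketch of a per-feature data-drift check (two-sample KS test)
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, recent: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Return True if the recent sample's distribution differs significantly
    from the training-time reference distribution."""
    statistic, p_value = ks_2samp(reference, recent)
    return p_value < p_threshold

# Illustrative data: training-era house sizes vs. a shifted production window
rng = np.random.default_rng(42)
reference_sizes = rng.normal(1500, 300, size=5000)   # sq ft at training time
recent_sizes = rng.normal(1800, 300, size=500)       # production inputs drifting upward

print('size feature drifted:', drifted(reference_sizes, recent_sizes))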

📈 What to Monitor

Comprehensive monitoring covers model performance, data quality, system health, and business metrics.

🎯 Model Performance

  • Accuracy / RMSE / MAE
  • Prediction distribution
  • Response latency

📊 Data Quality

  • Missing values or schema mismatch
  • Value ranges outside training bounds (see the validation sketch after these lists)
  • Data drift detection

âš™ī¸ System Health

  • â€ĸ CPU, memory, and pod restarts
  • â€ĸ Request throughput & error rate
  • â€ĸ Resource utilization

💰 Business Metrics

  • Prediction volume per day
  • User satisfaction / feedback loop
  • Cost per prediction
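
For the data-quality items above, a common pattern is to validate each request against the ranges seen during training and expose violations as Prometheus counters, so schema and range problems show up on the same dashboards as everything else. The sketch below uses illustrative metric names and bounds that are not part of the workshop code.

# data_quality.py -- sketch: count schema/range violations as Prometheus metrics
from typing import Any, Dict

from prometheus_client import Counter

MISSING_FEATURES = Counter('model_input_missing_features_total', 'Requests with missing features')
OUT_OF_RANGE = Counter('model_input_out_of_range_total', 'Requests with out-of-range feature values')

# Bounds observed in the training data (illustrative values)
TRAINING_BOUNDS = {'size': (300, 10_000), 'bedrooms': (1, 10), 'age': (0, 150)}

def validate_request(request: Dict[str, Any]) -> bool:
    """Return True if the request looks like the training data, recording violations."""
    ok = True
    for feature, (low, high) in TRAINING_BOUNDS.items():
        if feature not in request:
            MISSING_FEATURES.inc()
            ok = False
        elif not (low <= request[feature] <= high):
            OUT_OF_RANGE.inc()
            ok = False
    return ok
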
🛠️

Step 1: Install Prometheus + Grafana

Use Helm to set up a complete monitoring stack with Prometheus, Grafana, and ServiceMonitors.

📦 Install Monitoring Stack

Install Prometheus + Grafana
    
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install monitoring stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123

# Verify pods
kubectl get pods -n monitoring
    
  

This deploys:

  • Prometheus (server, Alertmanager, node exporter)
  • Grafana (for dashboards)
  • The ServiceMonitor/PodMonitor CRDs, which you use to scrape your own services (e.g. KServe pods)
📊

Step 2: Add Custom Metrics to Your Model

Update your model server to emit Prometheus metrics for comprehensive monitoring.

🐍 Monitored Model Server

Update your model server (saved here as monitored_model_server.py) with Prometheus metrics:

monitored_model_server.py
    
import time
from typing import Dict, Any

import joblib
import numpy as np
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Custom metrics
PREDICTION_COUNTER = Counter('model_predictions_total', 'Total predictions')
PREDICTION_ERRORS  = Counter('model_prediction_errors_total', 'Total errors')
PREDICTION_LATENCY = Histogram('model_prediction_latency_seconds', 'Prediction latency')
MODEL_ACCURACY     = Gauge('model_accuracy', 'Model accuracy')

class MonitoredHousePriceModel:
    def __init__(self):
        self.model = joblib.load('/mnt/models/model.pkl')
        start_http_server(8000)  # Expose metrics endpoint
        MODEL_ACCURACY.set(0.93) # Current model accuracy baseline

    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:
        start = time.time()
        try:
            size = request['size']
            bedrooms = request['bedrooms']
            age = request['age']
            X = np.array([[size, bedrooms, age]])
            pred = float(self.model.predict(X)[0])
            PREDICTION_COUNTER.inc()
            PREDICTION_LATENCY.observe(time.time() - start)
            return {'predicted_price': pred}
        except Exception:
            PREDICTION_ERRORS.inc()
            raise

model = MonitoredHousePriceModel()
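
Before rebuilding the image, you can exercise the class locally and confirm the /metrics endpoint responds. The script below is a hypothetical smoke test: it assumes your trained model.pkl is available at the path the class loads (adjust it for a local run) and that monitored_model_server.py is importable from the current directory.

# local_metrics_check.py -- hypothetical smoke test, not part of the workshop repo.
# Assumes model.pkl exists at /mnt/models/model.pkl (copy your trained model
# there, or temporarily change the path in MonitoredHousePriceModel).
import urllib.request

from monitored_model_server import model  # importing starts the metrics server on :8000

# Run one prediction so the counters and histogram have data
print(model.predict({'size': 1500, 'bedrooms': 3, 'age': 10}))

# Fetch the Prometheus exposition output and show the custom metrics
metrics = urllib.request.urlopen('http://localhost:8000/metrics').read().decode()
for line in metrics.splitlines():
    if line.startswith('model_'):
        print(line)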
    
  
â„šī¸ Next Steps

After updating your code:

  • Re-build your Docker image
  • Re-deploy your InferenceService
  • Point Prometheus at the metrics endpoint on port 8000 (e.g. with a PodMonitor, as sketched below)
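
Note that the kube-prometheus-stack does not scrape arbitrary pods out of the box; the Prometheus Operator needs a ServiceMonitor or PodMonitor targeting the metrics port. The manifest below is a sketch, assuming the InferenceService runs in the default namespace, its pods carry the standard serving.kserve.io/inferenceservice label, and container port 8000 is declared with the name metrics; adapt these to your deployment.

# podmonitor.yaml -- sketch; adjust labels, namespace, and port name to your setup
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: house-price-model
  namespace: monitoring
  labels:
    release: prometheus        # must match the Helm release so the operator picks it up
spec:
  namespaceSelector:
    matchNames:
      - default                # assumption: the InferenceService runs in "default"
  selector:
    matchLabels:
      serving.kserve.io/inferenceservice: house-price-model   # assumption: your service name
  podMetricsEndpoints:
    - port: metrics            # name of the container port exposing :8000
      path: /metrics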
📈

Step 3: Access Grafana

Set up port-forwarding and access the Grafana dashboard to create monitoring panels.

🔗 Access Grafana Dashboard

Port-forward to Grafana
    
# Port-forward to Grafana
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80

# Then open:
# 👉 http://localhost:3000

# Login credentials:
# Username: admin
# Password: admin123
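
Before building panels, it is worth confirming that Prometheus is actually scraping the custom metrics. One way (a sketch, assuming the default kube-prometheus-stack service name for a release called prometheus) is to port-forward Prometheus with kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090 and query its HTTP API:

# check_scrape.py -- sketch: confirm the custom metrics are reaching Prometheus.
# Assumes Prometheus has been port-forwarded to localhost:9090 (command above).
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'query': 'model_predictions_total'})
url = f'http://localhost:9090/api/v1/query?{params}'

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

results = data['data']['result']
if not results:
    print('No samples yet: check that your PodMonitor / scrape config targets port 8000')
for series in results:
    print(series['metric'].get('pod', 'unknown pod'), '->', series['value'][1])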
    
  
📊

Step 4: Create Dashboard Panels

Add monitoring panels to track key metrics and visualize model performance.

📈 Dashboard Queries

| Metric          | PromQL                                                                       | Panel Type     |
|-----------------|------------------------------------------------------------------------------|----------------|
| Prediction Rate | rate(model_predictions_total[1m])                                            | Time series    |
| p95 Latency     | histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) | Gauge          |
| Error Rate      | rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m])  | Bar graph      |
| Model Accuracy  | model_accuracy                                                               | Single stat    |
| CPU / Memory    | container_cpu_usage_seconds_total, container_memory_usage_bytes              | Resource panel |
🚨

Step 5: Add Alert Rules

Configure Prometheus alerts to notify you when model performance degrades or issues occur.

📄 Alert Rules Configuration

Save this as alerts.yaml:

alerts.yaml
    
# Wrapped in a PrometheusRule resource so it can be applied with kubectl and
# loaded by the Prometheus Operator installed by kube-prometheus-stack
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: model-monitoring
  namespace: monitoring
  labels:
    release: prometheus   # must match the Helm release name so the operator loads it
spec:
  groups:
  - name: model-monitoring
    rules:
    - alert: HighErrorRate
      expr: rate(model_prediction_errors_total[5m]) > 0.1
      for: 2m
      labels: { severity: warning }
      annotations:
        summary: "High error rate detected"
        description: "Model error rate = {{ $value }} errors/sec"

    - alert: HighLatency
      expr: histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) > 1
      for: 2m
      labels: { severity: warning }
      annotations:
        summary: "High latency"
        description: "95th percentile > 1s"

    - alert: LowPredictionVolume
      expr: increase(model_predictions_total[1h]) < 5
      for: 10m
      labels: { severity: info }
      annotations:
        summary: "Low prediction volume"
        description: "Less than 5 predictions/hour"
    
  

⚡ Apply Alert Rules

Apply Alert Rules
    
# Apply alert rules
kubectl apply -f alerts.yaml -n monitoring

# The Prometheus Operator picks up the PrometheusRule and reloads Prometheus automatically
    
  
🔄

Step 6: Advanced Data Drift & Accuracy Tracking

Implement advanced monitoring for data drift detection and automated retraining triggers.

🤖 Model Monitor Component

Use a separate "model monitor" component to periodically:

  • Compare real vs. predicted values
  • Log MAE / RMSE to Prometheus (MODEL_ACCURACY.set(value))
  • Trigger retraining via a Kubeflow Pipeline when a threshold is crossed

Implementation: This can be done using a ScheduledWorkflow in Kubeflow that runs periodically to check model performance and trigger retraining pipelines when drift is detected.
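
A minimal sketch of such a monitor is shown below. It assumes you have a way to join ground-truth values back to predictions (the fetch_recent_predictions placeholder), runs as a long-lived job exposing its own metrics, and uses the KFP client to start a retraining run; the KFP host, pipeline package, and threshold are placeholders to adapt.

# model_monitor.py -- sketch of a periodic accuracy monitor with a retraining trigger.
# Assumptions: fetch_recent_predictions() is your own function returning (y_true, y_pred)
# from a feedback store; KFP host, pipeline package, and threshold are placeholders.
import time

import numpy as np
from prometheus_client import Gauge, start_http_server

import kfp

MODEL_ACCURACY = Gauge('model_accuracy', 'Rolling accuracy (1 - normalized MAE)')
MODEL_MAE = Gauge('model_mae', 'Rolling mean absolute error')

MAE_RETRAIN_THRESHOLD = 25_000          # dollars; placeholder
KFP_HOST = 'http://ml-pipeline.kubeflow.svc.cluster.local:8888'  # assumption

def fetch_recent_predictions():
    """Placeholder: return (y_true, y_pred) arrays for the last evaluation window."""
    raise NotImplementedError

def trigger_retraining():
    client = kfp.Client(host=KFP_HOST)
    client.create_run_from_pipeline_package('retrain_pipeline.yaml', arguments={})

if __name__ == '__main__':
    start_http_server(8001)             # separate port from the model server
    while True:
        y_true, y_pred = fetch_recent_predictions()
        mae = float(np.mean(np.abs(np.array(y_true) - np.array(y_pred))))
        MODEL_MAE.set(mae)
        MODEL_ACCURACY.set(max(0.0, 1 - mae / float(np.mean(y_true))))
        if mae > MAE_RETRAIN_THRESHOLD:
            trigger_retraining()
        time.sleep(3600)                # evaluate hourly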

💡 Monitoring Best Practices

Follow these guidelines to build effective ML model monitoring systems.

Getting Started

  • ✅ Start simple (latency, error rate, accuracy)
  • ✅ Set SLOs (95th percentile < 1s latency, < 5% errors)
  • ✅ Enable data validation & drift checks

Advanced Features

  • ✅ Integrate alerts with Slack or email
  • ✅ Automate model retraining when drift detected
  • ✅ Monitor business metrics and user feedback

🎉 Monitoring Setup Complete!

After this phase, you'll have real-time Grafana dashboards, Prometheus alerts for anomalies, and complete visibility into accuracy, drift, and system health.

📊

Real-time Dashboards

Grafana panels for all key metrics

🚨

Smart Alerts

Prometheus alerts for anomalies

đŸ‘ī¸

Complete Visibility

Accuracy, drift, and system health