Model Monitoring & Observability
Monitor your ML models in production for performance, accuracy, and health with Prometheus + Grafana
🎯 Objective
Set up end-to-end monitoring for your deployed ML model to track performance, drift, and system health.
- Performance: latency & accuracy tracking
- Drift Detection: data & concept drift monitoring
- System Health: resource usage & alerts
Why Monitor ML Models?
Unlike traditional code, ML models degrade silently as real-world data drifts. Monitoring catches the degradation early, before customers notice.
🧩 Data Drift
The input data distribution changes.
Example: house prices rise after 2020 → the old model underpredicts (a drift-check sketch follows these cards).
Concept Drift
The feature → output relationship shifts.
Example: people prefer home offices after COVID → the "rooms" feature deserves a new weight.
⚡ Performance Drop
Accuracy or latency degrades.
Example: model accuracy falls from 95% to 80% in six months.
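Data drift in particular can be quantified rather than eyeballed. The sketch below compares a recent sample of one input feature against its training distribution with a two-sample Kolmogorov-Smirnov test; the feature, sample sizes, and threshold are illustrative assumptions.
# drift_check.py -- minimal data-drift check using a two-sample KS test (sketch;
# the feature, sample sizes, and threshold below are illustrative assumptions)
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_values, live_values, p_threshold=0.05) -> bool:
    """Small p-value => the two samples likely come from different distributions."""
    statistic, p_value = ks_2samp(train_values, live_values)
    print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}")
    return p_value < p_threshold

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    train_sizes = rng.normal(1500, 300, 5000)  # 'size' feature at training time
    live_sizes = rng.normal(1800, 350, 500)    # recent production inputs
    if detect_drift(train_sizes, live_sizes):
        print("Drift detected on 'size' -- consider retraining")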
What to Monitor
Comprehensive monitoring covers model performance, data quality, system health, and business metrics.
🎯 Model Performance
- Accuracy / RMSE / MAE
- Prediction distribution
- Response latency
Data Quality
- Missing values or schema mismatches
- Value ranges outside training bounds (see the validation sketch after these lists)
- Data drift detection
⚙️ System Health
- CPU, memory, and pod restarts
- Request throughput & error rate
- Resource utilization
💰 Business Metrics
- Prediction volume per day
- User satisfaction / feedback loop
- Cost per prediction
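Schema and range checks like those above can feed Prometheus directly. Below is a minimal sketch assuming the house-price features from this guide; the metric name, the bounds, and the helper itself are illustrative assumptions rather than part of the deployed server.
# input_validation.py -- data-quality checks that emit Prometheus metrics (sketch;
# the metric name, field bounds, and label values are illustrative assumptions)
from typing import Any, Dict
from prometheus_client import Counter

VALIDATION_FAILURES = Counter(
    'model_input_validation_failures_total',
    'Inputs that failed schema or range checks',
    ['reason'],
)

# Approximate ranges seen in the training data (hypothetical values)
TRAINING_BOUNDS = {'size': (300, 10000), 'bedrooms': (1, 10), 'age': (0, 150)}

def validate_request(request: Dict[str, Any]) -> bool:
    """Return True if the request matches the training schema and value ranges."""
    for field, (low, high) in TRAINING_BOUNDS.items():
        if field not in request:
            VALIDATION_FAILURES.labels(reason=f'missing_{field}').inc()
            return False
        if not (low <= request[field] <= high):
            VALIDATION_FAILURES.labels(reason=f'out_of_range_{field}').inc()
            return False
    return True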
Step 1: Install Prometheus + Grafana
Use Helm to set up a complete monitoring stack with Prometheus, Grafana, and ServiceMonitors.
📦 Install Monitoring Stack
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install monitoring stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin123
# Verify pods
kubectl get pods -n monitoring
This deploys:
- Prometheus (server, Alertmanager, and node exporter)
- Grafana (for dashboards)
- ServiceMonitor resources so Prometheus can scrape workloads such as your KServe pods
Step 2: Add Custom Metrics to Your Model
Update your model server to emit Prometheus metrics for comprehensive monitoring.
Monitored Model Server
Update your model_server.py with Prometheus metrics:
import time
from typing import Dict, Any

import joblib
import numpy as np
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Custom metrics
PREDICTION_COUNTER = Counter('model_predictions_total', 'Total predictions')
PREDICTION_ERRORS = Counter('model_prediction_errors_total', 'Total errors')
PREDICTION_LATENCY = Histogram('model_prediction_latency_seconds', 'Prediction latency')
MODEL_ACCURACY = Gauge('model_accuracy', 'Model accuracy')

class MonitoredHousePriceModel:
    def __init__(self):
        self.model = joblib.load('/mnt/models/model.pkl')
        start_http_server(8000)   # Expose the /metrics endpoint
        MODEL_ACCURACY.set(0.93)  # Current model accuracy baseline

    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:
        start = time.time()
        try:
            size, bedrooms, age = request['size'], request['bedrooms'], request['age']
            X = np.array([[size, bedrooms, age]])
            pred = float(self.model.predict(X)[0])
            PREDICTION_COUNTER.inc()
            PREDICTION_LATENCY.observe(time.time() - start)
            return {'predicted_price': pred}
        except Exception:
            PREDICTION_ERRORS.inc()
            raise

model = MonitoredHousePriceModel()
After updating your code:
- Rebuild and push your Docker image
- Redeploy your InferenceService
- Verify that Prometheus scrapes the metrics endpoint on port 8000 (the ServiceMonitor or scrape annotations must target that port; a quick local check is shown below)
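Before relying on Prometheus to scrape the pod, you can smoke-test the metrics endpoint directly. A minimal sketch, assuming port 8000 of the serving container is reachable on localhost (for example via kubectl port-forward):
# check_metrics.py -- quick smoke test of the model's /metrics endpoint (sketch;
# assumes port 8000 of the serving container is reachable on localhost)
import requests

resp = requests.get("http://localhost:8000/metrics", timeout=5)
resp.raise_for_status()

# Print only the custom metrics defined in model_server.py
for line in resp.text.splitlines():
    if line.startswith("model_"):
        print(line)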
Step 3: Access Grafana
Set up port-forwarding and access the Grafana dashboard to create monitoring panels.
Access Grafana Dashboard
# Port-forward to Grafana
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
# Then open:
# http://localhost:3000
# Login credentials:
# Username: admin
# Password: admin123
Step 4: Create Dashboard Panels
Add monitoring panels to track key metrics and visualize model performance; a sketch after the table shows how to test these queries against the Prometheus API first.
Dashboard Queries
| Metric | PromQL | Panel Type |
|---|---|---|
| Prediction Rate | rate(model_predictions_total[1m]) | Time series |
| P95 Latency | histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) | Gauge |
| Error Rate | rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m]) | Bar graph |
| Model Accuracy | model_accuracy | Single stat |
| CPU Usage | rate(container_cpu_usage_seconds_total[5m]) | Resource panel |
| Memory Usage | container_memory_usage_bytes | Resource panel |
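Before building panels, it helps to sanity-check these queries against the Prometheus HTTP API. A minimal sketch, assuming you have port-forwarded the Prometheus service to localhost:9090 (with kube-prometheus-stack the service is typically prometheus-kube-prometheus-prometheus, but check kubectl get svc -n monitoring):
# promql_check.py -- run the dashboard queries against the Prometheus API (sketch;
# assumes Prometheus is port-forwarded to localhost:9090)
import requests

PROMETHEUS_URL = "http://localhost:9090"

QUERIES = {
    "prediction_rate": "rate(model_predictions_total[1m])",
    "p95_latency": "histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m]))",
    "error_rate": "rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m])",
    "model_accuracy": "model_accuracy",
}

for name, query in QUERIES.items():
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
                        params={"query": query}, timeout=10)
    result = resp.json()["data"]["result"]
    value = result[0]["value"][1] if result else "no data yet"
    print(f"{name}: {value}")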
Step 5: Add Alert Rules
Configure Prometheus alerts to notify you when model performance degrades or issues occur.
Alert Rules Configuration
Save this as alerts.yaml. It is written as a PrometheusRule resource so the Prometheus Operator installed by kube-prometheus-stack can load it:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: model-monitoring
  labels:
    release: prometheus   # matches the Helm release name so the default rule selector discovers it
spec:
  groups:
    - name: model-monitoring
      rules:
        - alert: HighErrorRate
          expr: rate(model_prediction_errors_total[5m]) > 0.1
          for: 2m
          labels: { severity: warning }
          annotations:
            summary: "High error rate detected"
            description: "Model error rate = {{ $value }} errors/sec"
        - alert: HighLatency
          expr: histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) > 1
          for: 2m
          labels: { severity: warning }
          annotations:
            summary: "High latency"
            description: "95th percentile latency > 1s"
        - alert: LowPredictionVolume
          expr: increase(model_predictions_total[1h]) < 5
          for: 10m
          labels: { severity: info }
          annotations:
            summary: "Low prediction volume"
            description: "Fewer than 5 predictions in the last hour"
⚡ Apply Alert Rules
# Apply alert rules
kubectl apply -f alerts.yaml -n monitoring
# The Prometheus Operator will load them automatically
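To confirm the rules were actually picked up, you can list them through the Prometheus rules API (a small sketch, again assuming Prometheus is port-forwarded to localhost:9090):
# rules_check.py -- confirm the alert rules were loaded by Prometheus (sketch;
# assumes Prometheus is port-forwarded to localhost:9090)
import requests

resp = requests.get("http://localhost:9090/api/v1/rules", timeout=10)
resp.raise_for_status()

for group in resp.json()["data"]["groups"]:
    for rule in group["rules"]:
        if rule.get("type") == "alerting":
            print(f"{group['name']}: {rule['name']} (state={rule.get('state')})")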
Step 6: Advanced Data Drift & Accuracy Tracking
Implement advanced monitoring for data drift detection and automated retraining triggers.
🤖 Model Monitor Component
Use a separate "model monitor" component that periodically:
- Compares real vs. predicted values
- Logs MAE / RMSE to Prometheus (MODEL_ACCURACY.set(value))
- Triggers retraining via a Kubeflow Pipeline when a threshold is crossed (sketched below)
Implementation: This can be done using a ScheduledWorkflow in Kubeflow that runs periodically to check model performance and trigger retraining pipelines when drift is detected.
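Below is a minimal sketch of such a monitor. load_recent_feedback() and trigger_retraining_pipeline() are hypothetical hooks you would back with your feedback store and the kfp client, and the Pushgateway address assumes one is running in the cluster (kube-prometheus-stack does not install one by default):
# model_monitor.py -- periodic accuracy check with a retraining trigger (sketch;
# load_recent_feedback() and trigger_retraining_pipeline() are hypothetical hooks,
# and the Pushgateway address is an assumption about your cluster setup)
import numpy as np
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

MAE_THRESHOLD = 50_000  # assumed acceptable error for the house-price model

def load_recent_feedback():
    """Placeholder: return (y_true, y_pred) arrays collected since the last run."""
    raise NotImplementedError

def trigger_retraining_pipeline():
    """Placeholder: e.g. submit a Kubeflow Pipelines run via the kfp client."""
    raise NotImplementedError

def run_check():
    y_true, y_pred = load_recent_feedback()
    mae = float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    # Push the latest MAE so dashboards and alert rules can see it
    registry = CollectorRegistry()
    Gauge('model_mae', 'Mean absolute error on recent feedback',
          registry=registry).set(mae)
    push_to_gateway('pushgateway.monitoring.svc:9091',
                    job='model-monitor', registry=registry)

    if mae > MAE_THRESHOLD:
        trigger_retraining_pipeline()

if __name__ == "__main__":
    run_check()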
💡 Monitoring Best Practices
Follow these guidelines to build effective ML model monitoring systems.
Getting Started
- ✅ Start simple (latency, error rate, accuracy)
- ✅ Set SLOs (95th-percentile latency < 1s, error rate < 5%)
- ✅ Enable data validation & drift checks
Advanced Features
- ✅ Integrate alerts with Slack or email
- ✅ Automate model retraining when drift is detected
- ✅ Monitor business metrics and user feedback
🎉 Monitoring Setup Complete!
After this phase, you'll have real-time Grafana dashboards, Prometheus alerts for anomalies, and complete visibility into accuracy, drift, and system health.
- Real-time Dashboards: Grafana panels for all key metrics
- Smart Alerts: Prometheus alerts for anomalies
- Complete Visibility: accuracy, drift, and system health