Model Monitoring & Observability
Monitor your ML models in production for performance, accuracy, and health with Prometheus + Grafana
🎯 Objective
Set up end-to-end monitoring for your deployed ML model to track performance, drift, and system health.
- Performance: latency & accuracy tracking
- Drift Detection: data & concept drift monitoring
- System Health: resource usage & alerts
Why Monitor ML Models?
Unlike traditional code, ML models degrade silently as real-world data drifts. Monitoring catches the degradation early, before customers notice.
🧩 Data Drift
The input data distribution changes.
Example: house prices rise after 2020 → the old model underpredicts (a drift-check sketch follows these cards).
Concept Drift
The feature → output relationship shifts.
Example: people prefer home offices after COVID → the "rooms" feature deserves a new weight.
⚡ Performance Drop
Accuracy or latency degrades.
Example: model accuracy falls from 95% to 80% in six months.
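Data drift in particular can be quantified rather than eyeballed. The sketch below compares a recent sample of one input feature against its training distribution with a two-sample Kolmogorov-Smirnov test; the feature, sample sizes, and threshold are illustrative assumptions.
# drift_check.py -- minimal data-drift check using a two-sample KS test (sketch;
# the feature, sample sizes, and threshold below are illustrative assumptions)
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_values, live_values, p_threshold=0.05) -> bool:
    """Small p-value => the two samples likely come from different distributions."""
    statistic, p_value = ks_2samp(train_values, live_values)
    print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}")
    return p_value < p_threshold

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    train_sizes = rng.normal(1500, 300, 5000)  # 'size' feature at training time
    live_sizes = rng.normal(1800, 350, 500)    # recent production inputs
    if detect_drift(train_sizes, live_sizes):
        print("Drift detected on 'size' -- consider retraining")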
What to Monitor
Comprehensive monitoring covers model performance, data quality, system health, and business metrics.
🎯 Model Performance
- Accuracy / RMSE / MAE
- Prediction distribution
- Response latency
Data Quality
- Missing values or schema mismatches
- Value ranges outside training bounds (see the validation sketch after these lists)
- Data drift detection
⚙️ System Health
- CPU, memory, and pod restarts
- Request throughput & error rate
- Resource utilization
💰 Business Metrics
- Prediction volume per day
- User satisfaction / feedback loop
- Cost per prediction
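Schema and range checks like those above can feed Prometheus directly. Below is a minimal sketch assuming the house-price features from this guide; the metric name, the bounds, and the helper itself are illustrative assumptions rather than part of the deployed server.
# input_validation.py -- data-quality checks that emit Prometheus metrics (sketch;
# the metric name, field bounds, and label values are illustrative assumptions)
from typing import Any, Dict
from prometheus_client import Counter

VALIDATION_FAILURES = Counter(
    'model_input_validation_failures_total',
    'Inputs that failed schema or range checks',
    ['reason'],
)

# Approximate ranges seen in the training data (hypothetical values)
TRAINING_BOUNDS = {'size': (300, 10000), 'bedrooms': (1, 10), 'age': (0, 150)}

def validate_request(request: Dict[str, Any]) -> bool:
    """Return True if the request matches the training schema and value ranges."""
    for field, (low, high) in TRAINING_BOUNDS.items():
        if field not in request:
            VALIDATION_FAILURES.labels(reason=f'missing_{field}').inc()
            return False
        if not (low <= request[field] <= high):
            VALIDATION_FAILURES.labels(reason=f'out_of_range_{field}').inc()
            return False
    return True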
Step 1: Install Prometheus + Grafana
Use Helm to set up a complete monitoring stack with Prometheus, Grafana, and ServiceMonitors.
📦 Install Monitoring Stack
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install monitoring stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin123
# Verify pods
kubectl get pods -n monitoring
This deploys:
- Prometheus (server, Alertmanager, and node exporter)
- Grafana (for dashboards)
- ServiceMonitor resources so Prometheus can scrape workloads such as your KServe pods
Step 2: Add Custom Metrics to Your Model
Update your model server to emit Prometheus metrics for comprehensive monitoring.
Monitored Model Server
Update your model_server.py with Prometheus metrics:
import time
from typing import Dict, Any

import joblib
import numpy as np
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Custom metrics
PREDICTION_COUNTER = Counter('model_predictions_total', 'Total predictions')
PREDICTION_ERRORS = Counter('model_prediction_errors_total', 'Total errors')
PREDICTION_LATENCY = Histogram('model_prediction_latency_seconds', 'Prediction latency')
MODEL_ACCURACY = Gauge('model_accuracy', 'Model accuracy')

class MonitoredHousePriceModel:
    def __init__(self):
        self.model = joblib.load('/mnt/models/model.pkl')
        start_http_server(8000)   # Expose the /metrics endpoint
        MODEL_ACCURACY.set(0.93)  # Current model accuracy baseline

    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:
        start = time.time()
        try:
            size, bedrooms, age = request['size'], request['bedrooms'], request['age']
            X = np.array([[size, bedrooms, age]])
            pred = float(self.model.predict(X)[0])
            PREDICTION_COUNTER.inc()
            PREDICTION_LATENCY.observe(time.time() - start)
            return {'predicted_price': pred}
        except Exception:
            PREDICTION_ERRORS.inc()
            raise

model = MonitoredHousePriceModel()
After updating your code:
- Rebuild and push your Docker image
- Redeploy your InferenceService
- Verify that Prometheus scrapes the metrics endpoint on port 8000 (the ServiceMonitor or scrape annotations must target that port; a quick local check is shown below)
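Before relying on Prometheus to scrape the pod, you can smoke-test the metrics endpoint directly. A minimal sketch, assuming port 8000 of the serving container is reachable on localhost (for example via kubectl port-forward):
# check_metrics.py -- quick smoke test of the model's /metrics endpoint (sketch;
# assumes port 8000 of the serving container is reachable on localhost)
import requests

resp = requests.get("http://localhost:8000/metrics", timeout=5)
resp.raise_for_status()

# Print only the custom metrics defined in model_server.py
for line in resp.text.splitlines():
    if line.startswith("model_"):
        print(line)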
Step 3: Access Grafana
Set up port-forwarding and access the Grafana dashboard to create monitoring panels.
Access Grafana Dashboard
# Port-forward to Grafana
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
# Then open:
# http://localhost:3000
# Login credentials:
# Username: admin
# Password: admin123
Step 4: Create Dashboard Panels
Add monitoring panels to track key metrics and visualize model performance; a sketch after the table shows how to test these queries against the Prometheus API first.
Dashboard Queries
| Metric | PromQL | Panel Type |
|---|---|---|
| Prediction Rate | rate(model_predictions_total[1m]) | Time series |
| P95 Latency | histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) | Gauge |
| Error Rate | rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m]) | Bar graph |
| Model Accuracy | model_accuracy | Single stat |
| CPU Usage | rate(container_cpu_usage_seconds_total[5m]) | Resource panel |
| Memory Usage | container_memory_usage_bytes | Resource panel |
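Before building panels, it helps to sanity-check these queries against the Prometheus HTTP API. A minimal sketch, assuming you have port-forwarded the Prometheus service to localhost:9090 (with kube-prometheus-stack the service is typically prometheus-kube-prometheus-prometheus, but check kubectl get svc -n monitoring):
# promql_check.py -- run the dashboard queries against the Prometheus API (sketch;
# assumes Prometheus is port-forwarded to localhost:9090)
import requests

PROMETHEUS_URL = "http://localhost:9090"

QUERIES = {
    "prediction_rate": "rate(model_predictions_total[1m])",
    "p95_latency": "histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m]))",
    "error_rate": "rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m])",
    "model_accuracy": "model_accuracy",
}

for name, query in QUERIES.items():
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
                        params={"query": query}, timeout=10)
    result = resp.json()["data"]["result"]
    value = result[0]["value"][1] if result else "no data yet"
    print(f"{name}: {value}")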
Step 5: Add Alert Rules
Configure Prometheus alerts to notify you when model performance degrades or issues occur.
Alert Rules Configuration
Save this as alerts.yaml. It is written as a PrometheusRule resource so the Prometheus Operator installed by kube-prometheus-stack can load it:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: model-monitoring
  labels:
    release: prometheus   # matches the Helm release name so the default rule selector discovers it
spec:
  groups:
    - name: model-monitoring
      rules:
        - alert: HighErrorRate
          expr: rate(model_prediction_errors_total[5m]) > 0.1
          for: 2m
          labels: { severity: warning }
          annotations:
            summary: "High error rate detected"
            description: "Model error rate = {{ $value }} errors/sec"
        - alert: HighLatency
          expr: histogram_quantile(0.95, rate(model_prediction_latency_seconds_bucket[5m])) > 1
          for: 2m
          labels: { severity: warning }
          annotations:
            summary: "High latency"
            description: "95th percentile latency > 1s"
        - alert: LowPredictionVolume
          expr: increase(model_predictions_total[1h]) < 5
          for: 10m
          labels: { severity: info }
          annotations:
            summary: "Low prediction volume"
            description: "Fewer than 5 predictions in the last hour"
⚡ Apply Alert Rules
# Apply alert rules
kubectl apply -f alerts.yaml -n monitoring
# The Prometheus Operator will load them automatically
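To confirm the rules were actually picked up, you can list them through the Prometheus rules API (a small sketch, again assuming Prometheus is port-forwarded to localhost:9090):
# rules_check.py -- confirm the alert rules were loaded by Prometheus (sketch;
# assumes Prometheus is port-forwarded to localhost:9090)
import requests

resp = requests.get("http://localhost:9090/api/v1/rules", timeout=10)
resp.raise_for_status()

for group in resp.json()["data"]["groups"]:
    for rule in group["rules"]:
        if rule.get("type") == "alerting":
            print(f"{group['name']}: {rule['name']} (state={rule.get('state')})")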
Step 6: Advanced Data Drift & Accuracy Tracking
Implement advanced monitoring for data drift detection and automated retraining triggers.
🤖 Model Monitor Component
Use a separate "model monitor" component that periodically:
- Compares real vs. predicted values
- Logs MAE / RMSE to Prometheus (MODEL_ACCURACY.set(value))
- Triggers retraining via a Kubeflow Pipeline when a threshold is crossed (sketched below)
Implementation: This can be done using a ScheduledWorkflow in Kubeflow that runs periodically to check model performance and trigger retraining pipelines when drift is detected.
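Below is a minimal sketch of such a monitor. load_recent_feedback() and trigger_retraining_pipeline() are hypothetical hooks you would back with your feedback store and the kfp client, and the Pushgateway address assumes one is running in the cluster (kube-prometheus-stack does not install one by default):
# model_monitor.py -- periodic accuracy check with a retraining trigger (sketch;
# load_recent_feedback() and trigger_retraining_pipeline() are hypothetical hooks,
# and the Pushgateway address is an assumption about your cluster setup)
import numpy as np
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

MAE_THRESHOLD = 50_000  # assumed acceptable error for the house-price model

def load_recent_feedback():
    """Placeholder: return (y_true, y_pred) arrays collected since the last run."""
    raise NotImplementedError

def trigger_retraining_pipeline():
    """Placeholder: e.g. submit a Kubeflow Pipelines run via the kfp client."""
    raise NotImplementedError

def run_check():
    y_true, y_pred = load_recent_feedback()
    mae = float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    # Push the latest MAE so dashboards and alert rules can see it
    registry = CollectorRegistry()
    Gauge('model_mae', 'Mean absolute error on recent feedback',
          registry=registry).set(mae)
    push_to_gateway('pushgateway.monitoring.svc:9091',
                    job='model-monitor', registry=registry)

    if mae > MAE_THRESHOLD:
        trigger_retraining_pipeline()

if __name__ == "__main__":
    run_check()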
💡 Monitoring Best Practices
Follow these guidelines to build effective ML model monitoring systems.
Getting Started
- ✅ Start simple (latency, error rate, accuracy)
- ✅ Set SLOs (95th-percentile latency < 1s, error rate < 5%)
- ✅ Enable data validation & drift checks
Advanced Features
- ✅ Integrate alerts with Slack or email
- ✅ Automate model retraining when drift is detected
- ✅ Monitor business metrics and user feedback
🎉 Monitoring Setup Complete!
After this phase, you'll have real-time Grafana dashboards, Prometheus alerts for anomalies, and complete visibility into accuracy, drift, and system health.
- Real-time Dashboards: Grafana panels for all key metrics
- Smart Alerts: Prometheus alerts for anomalies
- Complete Visibility: accuracy, drift, and system health