Every ML model in production has an expiration date. The world changes — customer behavior shifts, market conditions evolve, new products launch, regulations update — and the patterns your model learned from historical data become stale. This phenomenon is called data drift, and it's the silent killer of production ML systems.
What Is Data Drift?
Data drift occurs when the statistical properties of the data your model encounters in production differ from the data it was trained on. There are several types:
- Feature drift: The distribution of input features changes. Customer demographics shift, sensor readings calibrate differently, or text inputs use new vocabulary.
- Label drift: The distribution of target variables changes. Fraud patterns evolve, product preferences shift, or disease prevalence changes.
- Concept drift: The relationship between features and the target changes. The same customer profile that predicted high spending now predicts low spending because the economy shifted.
Concept drift is the most dangerous because every input-level metric can look healthy while the model silently makes wrong predictions.
Detection Techniques
Statistical Tests
Compare the distribution of incoming data against the training data distribution:
- Kolmogorov-Smirnov test: Detects changes in continuous feature distributions
- Chi-squared test: Detects changes in categorical feature distributions
- Population Stability Index (PSI): Quantifies shift magnitude; commonly used in financial modeling
- Jensen-Shannon divergence: Measures difference between two probability distributions
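The first three tests above can be sketched in a few lines with scipy and numpy. The KS and chi-squared tests ship with scipy; PSI is simple enough to compute by hand. The threshold comments reflect common rules of thumb, not universal constants:

```python
import numpy as np
from scipy import stats

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a new one."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions so empty bins don't produce log(0) or division by zero
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
train = rng.normal(0, 1, 5000)   # reference (training) sample
prod = rng.normal(1, 1, 5000)    # production sample with a shifted mean

ks_stat, p_value = stats.ks_2samp(train, prod)
print(f"KS p-value: {p_value:.2e}")   # tiny p-value -> distributions differ
print(f"PSI: {psi(train, prod):.2f}")  # PSI > 0.25 is commonly read as major shift
```

For categorical features, the same pattern applies with `stats.chisquare` on the observed category counts against expected counts scaled from the reference proportions.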
Prediction Monitoring
Track the distribution of model outputs over time. If your classifier suddenly starts predicting 80% positive when the historical rate is 20%, something has changed — even if you can't pinpoint the cause.
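One simple way to implement this check, sketched below with illustrative names and thresholds: treat each batch's positive rate as a binomial sample and alert when it sits too many standard errors from the historical baseline.

```python
import numpy as np

HISTORICAL_POSITIVE_RATE = 0.20  # baseline rate from training or a stable period
ALERT_Z = 3.0                    # alert beyond 3 standard errors

def check_prediction_rate(predictions):
    """Flag a batch whose positive-prediction rate deviates from the baseline.

    Uses the normal approximation to the binomial for the standard error.
    """
    n = len(predictions)
    rate = float(np.mean(predictions))
    se = np.sqrt(HISTORICAL_POSITIVE_RATE * (1 - HISTORICAL_POSITIVE_RATE) / n)
    z = (rate - HISTORICAL_POSITIVE_RATE) / se
    return rate, z, abs(z) > ALERT_Z

# A drifted batch: the model suddenly predicts ~80% positive
batch = np.random.default_rng(0).random(1000) < 0.8
rate, z, alert = check_prediction_rate(batch)
print(f"rate={rate:.2f}, z={z:.1f}, alert={alert}")
```

The z-threshold is a tuning knob: tighter values catch drift sooner but fire more false alarms on small batches.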
Performance Monitoring
When ground truth labels are available (even with a delay), track actual model performance. Declining accuracy, precision, or recall is the most direct signal of drift. The challenge is that labels are often delayed by days, weeks, or months.
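Delayed labels mean predictions and outcomes must be joined after the fact. A minimal sketch of that bookkeeping (class and method names are illustrative): store predictions keyed by an ID, match them when labels arrive, and compute accuracy over the most recently matched pairs.

```python
class DelayedLabelTracker:
    """Track accuracy when ground truth arrives days or weeks after the prediction."""

    def __init__(self):
        self.pending = {}   # prediction_id -> predicted label, awaiting ground truth
        self.matched = []   # (predicted, actual) pairs, in label-arrival order

    def log_prediction(self, pred_id, predicted):
        self.pending[pred_id] = predicted

    def log_label(self, pred_id, actual):
        if pred_id in self.pending:
            self.matched.append((self.pending.pop(pred_id), actual))

    def rolling_accuracy(self, window=1000):
        recent = self.matched[-window:]
        if not recent:
            return None  # no labels have arrived yet
        return sum(p == a for p, a in recent) / len(recent)

tracker = DelayedLabelTracker()
tracker.log_prediction("req-1", 1)
tracker.log_label("req-1", 1)      # ground truth arrives later
print(tracker.rolling_accuracy())  # -> 1.0
```

In production this state would live in a database rather than in memory, but the join-then-aggregate pattern is the same.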
Window-Based Comparison
Compare recent data (e.g., the last seven days) against a reference window (training data or a stable historical period). Alert when the divergence exceeds a threshold. Use sliding windows to distinguish gradual drift from sudden shifts.
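A sketch of that comparison using Jensen-Shannon distance from scipy (window sizes and the 0.1 threshold are illustrative and should be tuned on your own data):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def window_drift(reference, stream, window=7000, step=1000, threshold=0.1, bins=20):
    """Slide a window over the stream and score its divergence from the reference.

    Returns (start_index, js_distance, alert) for each window position.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    ref_hist = np.histogram(reference, bins=edges)[0] / len(reference)
    results = []
    for start in range(0, len(stream) - window + 1, step):
        win = stream[start:start + window]
        win_hist = np.histogram(win, bins=edges)[0] / len(win)
        js = jensenshannon(ref_hist, win_hist)  # 0 = identical distributions
        results.append((start, float(js), bool(js > threshold)))
    return results

rng = np.random.default_rng(7)
stable = rng.normal(0, 1, 20000)
drifted = rng.normal(1.5, 1, 20000)
# A stream that starts stable, then drifts
report = window_drift(stable[:10000], np.concatenate([stable[10000:], drifted]))
# Early windows stay quiet; windows over the drifted tail trip the alert.
```

Because the windows slide, a gradual drift shows up as a slowly rising divergence curve, while a sudden shift produces a step change over a few consecutive windows.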
Building a Drift Monitoring System
A production drift monitoring system needs four components:
- Data collection: Log all model inputs and outputs with timestamps. Store in a format that supports efficient aggregation and comparison.
- Reference profiles: Maintain statistical profiles of the training data, such as means, standard deviations, distribution histograms, and correlations between features.
- Automated comparison: Run drift tests on a schedule (hourly, daily, or per-batch). Compare incoming data against reference profiles.
- Alerting and response: Define thresholds for each feature and overall drift score. Alert the team when thresholds are exceeded.
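Components two through four can be sketched as a minimal profile-and-compare loop (function names, bin counts, and the PSI threshold are illustrative; component one, data collection, is assumed to have already logged the values):

```python
import json
import numpy as np

def build_reference_profile(training_values):
    """Component 2: a serializable statistical profile of one training feature."""
    values = np.asarray(training_values, dtype=float)
    edges = np.histogram_bin_edges(values, bins=10)
    counts = np.histogram(values, bins=edges)[0]
    return {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "bin_edges": edges.tolist(),
        "bin_pcts": (counts / counts.sum()).tolist(),
    }

def compare_to_profile(profile, recent_values, psi_threshold=0.2):
    """Components 3 and 4: score recent data against the stored profile via PSI."""
    edges = np.array(profile["bin_edges"])
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range
    counts = np.histogram(np.asarray(recent_values, dtype=float), bins=edges)[0]
    a = np.clip(counts / counts.sum(), 1e-6, None)  # floor to avoid log(0)
    e = np.clip(np.array(profile["bin_pcts"]), 1e-6, None)
    psi = float(np.sum((a - e) * np.log(a / e)))
    return {"psi": psi, "alert": psi > psi_threshold}

rng = np.random.default_rng(3)
profile = build_reference_profile(rng.normal(0, 1, 10000))   # built at training time
report = compare_to_profile(profile, rng.normal(1, 1, 2000))  # run on a schedule
print(json.dumps(report, indent=2))  # drifted batch -> alert fires
```

The profile is plain JSON, so it can be stored next to the model artifact and reused by whatever scheduler runs the comparison.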
Responding to Drift
Detecting drift is only half the battle. You also need a response playbook:
- Investigate first: Not all drift requires action. Seasonal patterns, known events (holidays, promotions), and data quality issues can trigger false alarms.
- Retrain with recent data: The most common response. Add recent data to the training set and retrain the model. This works for gradual drift.
- Feature engineering: If the drift is caused by a missing signal, adding new features may be more effective than retraining.
- Model replacement: For fundamental concept drift, the current model architecture may no longer be appropriate. Consider reframing the problem.
- Fallback to rules: When drift is severe and retraining isn't immediately possible, fall back to simple business rules that are robust to distribution changes.
Tools for Drift Detection
- Evidently AI: Open-source monitoring with pre-built drift reports and dashboards
- WhyLabs: Managed monitoring platform with statistical profiling
- Databricks Lakehouse Monitoring: Integrated drift detection for models served on Databricks
- Custom dashboards: Prometheus + Grafana with custom drift metrics
The Bottom Line
Models don't age gracefully. They degrade silently, making increasingly wrong predictions while reporting healthy technical metrics. The difference between teams that maintain reliable ML systems and those that don't usually comes down to one thing: systematic drift monitoring. Build it before you need it.
