Time series forecasting is experiencing a revolution. For decades, classical statistical methods — ARIMA, Exponential Smoothing, Prophet — were the unquestioned standard. Now, foundation models trained on millions of time series promise to forecast anything without domain-specific tuning. But do they deliver?
The honest answer: it depends. Here's when each approach wins.
Classical Methods: The Proven Workhorses
ARIMA and Variants
AutoRegressive Integrated Moving Average models capture linear trends and seasonality through differencing, autoregression, and moving averages. SARIMA adds explicit seasonal components.
- Strengths: Well-understood statistical properties, interpretable parameters, works well with limited data, fast to train
- Weaknesses: Assumes linearity, struggles with multiple seasonalities, requires stationarity (or differencing to achieve it), and handles exogenous variables only through extensions such as ARIMAX/SARIMAX
- Best for: Single time series with clear trend and seasonality, statistical rigor required (confidence intervals)
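The two core mechanics are easy to see in miniature. Below is a sketch of the "I" (differencing) and "AR" (autoregression) pieces, fitting an AR(1) coefficient by least squares on a synthetic series; real work would use a library such as statsmodels' SARIMAX rather than this toy.

```python
# Minimal sketch of ARIMA's building blocks: differencing (the "I")
# plus a least-squares AR(1) fit (the "AR"). Illustration only.

def difference(series):
    """First-order differencing: removes a linear trend, aids stationarity."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    x_prev, x_next = series[:-1], series[1:]
    num = sum(p * n for p, n in zip(x_prev, x_next))
    den = sum(p * p for p in x_prev)
    return num / den

def forecast_ar1(series, phi, steps):
    """Iterate the AR(1) recursion forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        out.append(last)
    return out
```

To get back to the original scale after differencing, you cumulatively re-add the forecast deltas to the last observed level; SARIMA layers the same ideas over a seasonal lag.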
Prophet
Meta's decomposable time series model handles trends, multiple seasonalities, and holiday effects. Designed for business forecasting by analysts rather than statisticians.
- Strengths: Handles missing data and outliers, multiple seasonalities, easy to add holiday effects and changepoints
- Weaknesses: Can underperform simple statistical baselines on short series, limited ability to capture complex non-linear patterns
- Best for: Business metrics with weekly/yearly seasonality, analyst-friendly forecasting
Gradient Boosting (LightGBM, XGBoost)
Not traditional time series models, but gradient boosting with engineered time features (lag values, rolling statistics, calendar features) is a pragmatic and surprisingly effective approach.
- Strengths: Handles non-linear relationships, naturally incorporates external features, fast training, robust to noise
- Weaknesses: Requires manual feature engineering, no native uncertainty estimation (quantile objectives are a partial workaround), can overfit on short series
- Best for: Demand forecasting with many external factors, tabular time series problems
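The feature engineering step is where most of the work lives. Here is a minimal sketch of turning a daily series into a tabular dataset of lags, a rolling mean, and calendar fields; the column names are illustrative, and the actual LightGBM/XGBoost fit is omitted.

```python
# Sketch of the manual feature engineering that turns a time series into
# a tabular dataset for a gradient-boosted regressor.
from datetime import date, timedelta

def make_features(values, start, lags=(1, 7), window=7):
    """One row per time step: lag values, a rolling mean, calendar fields."""
    rows = []
    max_lag = max(max(lags), window)
    for t in range(max_lag, len(values)):
        day = start + timedelta(days=t)
        row = {f"lag_{k}": values[t - k] for k in lags}
        row["rolling_mean"] = sum(values[t - window:t]) / window
        row["day_of_week"] = day.weekday()   # 0 = Monday
        row["month"] = day.month
        row["target"] = values[t]            # what the model learns to predict
        rows.append(row)
    return rows
```

External factors (price, promotions, weather) simply become extra columns in the same rows, which is exactly why this approach shines when many external drivers matter.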
Foundation Models: The New Contenders
TimeGPT, Chronos, Lag-Llama
These models are pre-trained on millions of diverse time series and can forecast new series zero-shot — without domain-specific training.
- Strengths: Zero-shot forecasting (no training data needed), capture complex patterns across domains, handle multiple series simultaneously
- Weaknesses: Less interpretable, require GPU for inference, newer with less production track record, can underperform on domain-specific patterns
- Best for: Cold-start forecasting (new products, new markets), cross-domain transfer, rapid prototyping
Head-to-Head Comparison
Data Requirements
Classical methods need at least 2-3 full seasonal cycles of historical data for reliable forecasting. Foundation models can produce reasonable forecasts with minimal history because they transfer knowledge from pre-training.
Accuracy
On standard benchmarks, the picture is nuanced. Foundation models generally outperform classical methods on diverse, heterogeneous datasets. But on specific domains with domain-tuned classical models, the gap narrows or reverses. A well-engineered LightGBM pipeline with domain-specific features often beats a generic foundation model.
Interpretability
Classical methods win decisively. ARIMA parameters have statistical meaning. Prophet decomposes forecasts into trend, seasonality, and holiday components. Foundation models are essentially black boxes.
Scalability
When you need to forecast thousands of time series (product demand across SKUs, energy consumption across meters), foundation models and gradient boosting scale better than fitting individual ARIMA models.
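Even when you do fit a model per series, the fitting is embarrassingly parallel. A sketch, using a cheap seasonal-naive stand-in for whatever per-series model you actually run:

```python
# Sketch of per-series forecasting at scale: map a model over many series
# in parallel. `seasonal_naive` is a stand-in for any per-series model.
from concurrent.futures import ThreadPoolExecutor

def seasonal_naive(series, season=7, steps=7):
    """Forecast by repeating the last full seasonal cycle."""
    cycle = series[-season:]
    return [cycle[i % season] for i in range(steps)]

def forecast_all(series_by_id, workers=8):
    ids = list(series_by_id)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(seasonal_naive, (series_by_id[i] for i in ids))
    return dict(zip(ids, results))
```

For CPU-bound model fits (ARIMA, ETS) you would swap in ProcessPoolExecutor; foundation models invert the structure entirely, batching thousands of series through one forward pass.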
Our Recommendation
- Start with classical methods when you have sufficient historical data, need interpretable results, and the forecasting problem is well-defined.
- Use foundation models for cold-start problems, rapid prototyping, and as a baseline to beat with domain-specific approaches.
- Combine both: Use foundation model forecasts as additional features in a gradient boosting pipeline. This ensemble approach often outperforms either method alone.
The most effective forecasting systems we've built use multiple methods and let the data decide which performs best for each specific series.
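"Let the data decide" concretely means a rolling-origin backtest. A minimal sketch, where each candidate forecaster is any function of `(train, horizon)`; in practice the candidates would wrap SARIMA, Prophet, a boosted model, or a foundation model:

```python
# Sketch of a rolling-origin backtest that scores candidate forecasters
# on held-out windows and keeps the winner per series.

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def backtest(series, forecaster, horizon=7, n_folds=3):
    """Average MAE over several rolling train/test splits."""
    errors = []
    for fold in range(n_folds):
        cut = len(series) - horizon * (n_folds - fold)
        train, test = series[:cut], series[cut:cut + horizon]
        errors.append(mae(test, forecaster(train, horizon)))
    return sum(errors) / len(errors)

def pick_best(series, forecasters, horizon=7):
    """Return the name of the lowest-error method for this series."""
    scores = {name: backtest(series, f, horizon)
              for name, f in forecasters.items()}
    return min(scores, key=scores.get)
```

Running this per series is what turns a methodological debate into an empirical answer: a strongly weekly series will pick a seasonal method, a flat one will pick a naive baseline, and neither choice needs to be made by hand.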
The Bottom Line
Foundation models haven't made classical methods obsolete. They've added a powerful new tool to the forecasting toolkit. The winning strategy is pragmatic: understand the strengths of each approach, prototype quickly, and let rigorous backtesting determine what works for your specific problem.
