Ensemble streamflow forecasting

Diverse-loss LSTM ensembles for multi-horizon prediction and uncertainty quantification

Motivation

LSTM networks have become the state of the art for rainfall–runoff simulation, but watershed systems are uncertain and hydrologic regimes vary. Low flows, peak events, and the day-to-day mid-range all matter for water-supply and flood-risk decisions, and no single loss function captures all of them well. A model trained to minimize MSE is optimized for the mean response; a model trained to maximize NSE is rewarded for fitting peaks; quantile losses tune the tails. The premise of this work is that diversity in the objective yields complementary models that, combined, forecast better than any one of them alone.

Approach

Diverse-loss LSTM ensemble

Eight LSTM models are trained on the same catchment data, each optimized for a different objective function chosen to emphasize a different part of the hydrograph:

Mean Squared Error (MSE) — mean response.
Nash–Sutcliffe Efficiency (NSE) — relative skill vs. mean baseline.
Kling–Gupta Efficiency (KGE) — correlation, bias, and variability jointly.
Huber loss — outlier-robust mid-range fit.
Quantile loss at 0.15 and 0.85 — conditional 15th and 85th percentiles (low- and high-flow tails).
Expectile loss at 0.15 and 0.85 — expectation-based analogues of the quantile tails.
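
The objectives listed above can be written down compactly. The following is a minimal NumPy sketch for illustration only, not the training code behind the published models: the function names are hypothetical, NSE and KGE are expressed as 1 - NSE and 1 - KGE so that every objective is minimized, the KGE form is assumed to be the standard 2009 formulation, and the Huber threshold `delta=1.0` is an assumed default. Training an LSTM would need differentiable versions in a deep-learning framework.

```python
import numpy as np

def _arrays(obs, sim):
    """Convert observed and simulated flows to float arrays."""
    return np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)

def mse(obs, sim):
    """Mean squared error."""
    obs, sim = _arrays(obs, sim)
    return float(np.mean((sim - obs) ** 2))

def nse_loss(obs, sim):
    """1 - NSE: squared error relative to the mean-flow baseline."""
    obs, sim = _arrays(obs, sim)
    return float(np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge_loss(obs, sim):
    """1 - KGE (assumed 2009 form): correlation, variability ratio, bias ratio."""
    obs, sim = _arrays(obs, sim)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def huber(obs, sim, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta (assumed value)."""
    obs, sim = _arrays(obs, sim)
    err = np.abs(sim - obs)
    return float(np.mean(np.where(err <= delta,
                                  0.5 * err ** 2,
                                  delta * (err - 0.5 * delta))))

def quantile_loss(obs, sim, tau):
    """Pinball loss for the conditional tau-quantile."""
    obs, sim = _arrays(obs, sim)
    diff = obs - sim
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

def expectile_loss(obs, sim, tau):
    """Asymmetric squared error for the conditional tau-expectile."""
    obs, sim = _arrays(obs, sim)
    diff = obs - sim
    weight = np.where(diff >= 0, tau, 1 - tau)
    return float(np.mean(weight * diff ** 2))

# Made-up flows, only to show the call pattern for the two tail levels.
obs = np.array([0.8, 1.2, 6.5, 2.1])
sim = np.array([1.0, 1.0, 5.0, 2.5])
for tau in (0.15, 0.85):
    print(tau, quantile_loss(obs, sim, tau), expectile_loss(obs, sim, tau))
```

With tau = 0.85, under-prediction is penalized more heavily than over-prediction, which pushes a model toward the high-flow tail; tau = 0.15 does the opposite.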

Their outputs are combined with a Linear Stacking Ensemble to produce both a point forecast and an uncertainty envelope.
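
To make the combination step concrete, here is a minimal sketch of linear stacking, assuming the stack is an ordinary least-squares fit of observed flow on the member predictions with an intercept. The function names, the synthetic data, and the use of the member minimum and maximum as the uncertainty envelope are assumptions made for illustration; this section does not specify how the published envelope is constructed.

```python
import numpy as np

def fit_linear_stack(member_preds, obs):
    """Fit an intercept and one weight per member by ordinary least squares.

    member_preds has shape (n_samples, n_members); obs has shape (n_samples,).
    """
    X = np.asarray(member_preds, dtype=float)
    y = np.asarray(obs, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def stack_predict(member_preds, intercept, weights):
    """Return the stacked point forecast and a member-spread envelope."""
    X = np.asarray(member_preds, dtype=float)
    point = intercept + X @ weights
    lower = X.min(axis=1)  # assumed envelope: lowest member forecast
    upper = X.max(axis=1)  # assumed envelope: highest member forecast
    return point, lower, upper

# Synthetic stand-in for 8 member forecasts of one catchment (not real data).
rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=5.0, size=200)
members = np.column_stack([truth * s + rng.normal(0.0, 1.0, size=200)
                           for s in np.linspace(0.7, 1.3, 8)])

# Fit the stack on the first 150 samples, apply it to the remaining 50.
intercept, weights = fit_linear_stack(members[:150], truth[:150])
point, lower, upper = stack_predict(members[150:], intercept, weights)
```

Fitting the weights on data the members were not trained on, as the split above imitates, keeps the stack from simply rewarding whichever member overfits most.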

Data

CAMELS rainfall–runoff dataset accessed through the Caravan framework.
482 catchments across the United States span a wide range of hydroclimatic regimes.
Forecasting horizons evaluated from 1 day to 30 days ahead.

Dahal, Gupta, Bokati, Kumar — Applied Soft Computing, 2026, Vol. 198, Art. 115276

Key results

Figure: Ensemble forecast skill (left column, NSE) and model uncertainty (right column, NSE range across the 8 loss-function models) mapped across the 482 CAMELS catchments, with one row each for the 1-day, 7-day, and 30-day forecast horizons. Skill is high at 1 day and degrades as the horizon grows; uncertainty, measured as the spread among loss-specialist members, varies regionally and is itself useful information for decision support.

The ensemble outperforms every individual loss-function model at all horizons (1 to 30 days).
The improvement in Nash–Sutcliffe Efficiency over the best individual model is statistically significant (p < 0.001) at every horizon.
Uncertainty coverage on the held-out test set:
- 93.4% at 1-day forecast
- 83.3% at 7-day forecast
- 81.7% at 30-day forecast
The framework is scalable and operationally feasible, with direct applications in data-scarce regions and real-time operational hydrology.
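
Coverage figures like those above are the percentage of held-out observations that fall inside the forecast interval. The sketch below shows that calculation under stated assumptions: the function name and the toy numbers are hypothetical, and the nominal interval level and interval construction used for the published results are not given in this section.

```python
import numpy as np

def empirical_coverage(obs, lower, upper):
    """Percentage of observations lying inside [lower, upper], bounds included."""
    obs, lower, upper = (np.asarray(a, dtype=float) for a in (obs, lower, upper))
    inside = (obs >= lower) & (obs <= upper)
    return float(100.0 * inside.mean())

# Toy example: 3 of the 5 observations fall inside their interval.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.5, 1.5, 3.5, 3.0, 4.0]
upper = [1.5, 2.5, 4.5, 5.0, 4.5]
print(empirical_coverage(obs, lower, upper))
```

Coverage is only meaningful next to the nominal level of the interval: an interval meant to contain 90% of observations is well calibrated when its empirical coverage is close to 90%.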

Visualization

A dynamic streamflow prediction tool for selected Arizona gauges is currently under development. It will surface multi-horizon (1-, 7-, and 30-day) probabilistic forecasts with calibrated uncertainty intervals from the ensemble described above. A live link will appear here once the tool is released.

Generated outputs

Trained ensemble of 8 LSTMs across 482 CAMELS catchments.
Multi-horizon (1–30 day) probabilistic streamflow forecasts with calibrated uncertainty intervals.
A reusable training and stacking recipe portable to other catchment networks.

Funding

Advanced Water Observatory and Decision Support System (AWODSS).
Arizona Water Innovation Initiative (AWII) — azwaterinnovation.asu.edu. A multi-year partnership with the State of Arizona led by ASU’s Julie Ann Wrigley Global Futures Laboratory.

Status

Methodology paper published. The dynamic prediction tool for Arizona streams is under construction; an operational interface for selected gauges will be linked here once released.