Probabilistic forecasting with GBRT

The class reservoir_computing.RC_forecaster makes it possible to quickly perform forecasting by fitting a linear model that maps the reservoir states to the predictions. The linear model is implemented as the ridge regressor sklearn.linear_model.Ridge from sklearn.

It is however possible to use other regression models from sklearn, including those that compute confidence intervals, obtaining in this way a probabilistic forecast.

In this example we will use sklearn.ensemble.HistGradientBoostingRegressor, a Gradient Boosting Regression Tree (GBRT) that can predict different quantiles.
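As a quick, self-contained preview of the main ingredient, the sketch below fits two quantile GBRT models on synthetic data (unrelated to the rest of the example) and checks how many observations fall between their predictions. Note that loss="quantile" requires scikit-learn >= 1.1.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(500, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy synthetic target

# One GBRT per quantile: the 5th and 95th percentiles bound a 90% interval
q_lo = HistGradientBoostingRegressor(loss="quantile", quantile=0.05).fit(x, y)
q_hi = HistGradientBoostingRegressor(loss="quantile", quantile=0.95).fit(x, y)

inside = (y >= q_lo.predict(x)) & (y <= q_hi.predict(x))
print(f"Fraction of points inside the 90% interval: {inside.mean():.2f}")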

Let’s start by importing the necessary libraries.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.decomposition import PCA

from reservoir_computing.reservoir import Reservoir
from reservoir_computing.utils import make_forecasting_dataset
from reservoir_computing.datasets import PredLoader

np.random.seed(0) # For reproducibility

Load the data

We will use the dataloader PredLoader to get a forecasting dataset. To see what datasets are available, we can call the function available_datasets. By setting details=True we get additional information.

downloader = PredLoader()
downloader.available_datasets(details=True)  # Describe available datasets
Available datasets:

ElecRome
-----------
Univariate time series forecasting.
Length: 137376
Features: 1

CDR
-----------
Multivariate time series forecasting.
Length: 3336
Features: 8
  • For this example, we will use ElecRome, which contains the electricity consumption measured on a backbone of the energy supply network in the city of Rome.

  • The original data is a time series sampled every 10 minutes.

  • If we are not interested in such a high resolution, we can resample the data to hourly resolution by creating a new time series whose entries are the means of 6 consecutive time steps in the original series.

  • We also take only the first 3000 time steps to speed up model fitting.

# Download data
ts_full = downloader.get_data("ElecRome")

# Resample the time series to hourly frequency by averaging blocks of 6
# consecutive samples (the reshape requires the length to be divisible by 6)
ts_hourly = np.mean(ts_full.reshape(-1, 6), axis=1)[:, None]
print("Resampled: ", ts_hourly.shape)

# Use only the first 3000 time steps
ts_small = ts_hourly[0:3000,:]
print("Resampled small: ", ts_small.shape)
Loaded ElecRome dataset.
Data shape:
  X: (137376, 1)
Resampled:  (22896, 1)
Resampled small:  (3000, 1)
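
Optionally, we can take a quick look at a slice of the series before modeling. The window of one week (168 hourly steps) is an arbitrary choice:

# Plot one week (168 hourly steps) of the resampled series
plt.figure(figsize=(14, 3))
plt.plot(ts_small[:168, 0])
plt.xlabel("Time (hours)")
plt.ylabel("Electricity load")
plt.grid()
plt.show()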

Prepare the datasets

To train our regression model we need input and target data Xtr and Ytr. We also need test data Xte and Yte to test the model and, if we want to do hyperparameter tuning, validation data Xval and Yval. We will use the function make_forecasting_dataset, which, given a time series X, performs the following steps:

  1. Splits the dataset into consecutive chunks: train, val and test. The size of the chunks is given by the values val_percent and test_percent. If we do not need validation data, we set val_percent=0 (default) and the validation data will not be created.

  2. Creates input data X and target data Y by shifting the data horizon time steps, where horizon is how far ahead we want to predict. For example:

    • Xtr = train[:-horizon,:]

    • Ytr = train[horizon:,:]

  3. Normalizes the data using a scaler object from sklearn.preprocessing. If no scaler is passed, a StandardScaler is created. The scaler is fit on Xtr and then used to transform Xtr, Ytr, Xval, and Xte. Note that Yval and Yte are not transformed.

The code below exemplifies the use of the function.

X = np.arange(36)[:, None]

Xtr, Ytr, Xte, Yte, Xval, Yval, scaler = make_forecasting_dataset(X, horizon=5,
                                                                  test_percent=0.2,
                                                                  val_percent=0.3)
print("Xtr: ", scaler.inverse_transform(Xtr.T)[0])
print("Ytr: ", scaler.inverse_transform(Ytr.T))
print("Xval: ", scaler.inverse_transform(Xval.T)[0])
print("Yval: ", Yval.T)
print("Xte: ", scaler.inverse_transform(Xte.T)[0])
print("Yte: ", Yte.T)
Xtr:  [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
Ytr:  [[ 5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16.]]
Xval:  [17. 18. 19. 20. 21. 22.]
Yval:  [[22 23 24 25 26 27]]
Xte:  [28. 29. 30.]
Yte:  [[33 34 35]]
  • For this example, we want to forecast 24 hours ahead.

  • Also, since we are not optimizing the hyperparameters of the model, we do not need a validation set (we leave the default val_percent=0 and the validation data will not be returned).

# Generate training and test datasets
Xtr, Ytr, Xte, Yte, scaler = make_forecasting_dataset(ts_small,
                                                      horizon=24, # forecast horizon of 24h ahead
                                                      test_percent=0.1)
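
A quick sanity check on the resulting splits (shapes only; the exact sizes depend on how make_forecasting_dataset rounds the split percentages):

# Inspect the shapes of the generated splits
print("Xtr:", Xtr.shape, "Ytr:", Ytr.shape)
print("Xte:", Xte.shape, "Yte:", Yte.shape)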

Define the Reservoir

  • Next, we create a Reservoir by specifying (rather arbitrarily) its hyperparameters.

  • Then, we compute the sequences of Reservoir states states_tr and states_te associated with the training and test data, respectively.

res = Reservoir(n_internal_units=1000, 
                spectral_radius=0.95, 
                leak=None, 
                connectivity=0.25, 
                input_scaling=0.1, 
                noise_level=0.0, 
                circle=False)   

n_drop=10
states_tr = res.get_states(Xtr[None,:,:], n_drop=n_drop, bidir=False)
states_te = res.get_states(Xte[None,:,:], n_drop=n_drop, bidir=False)
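
The states are returned as 3D arrays; we can print their shapes as a sanity check. Judging from how the states are paired with Ytr[n_drop:] below, the first n_drop states are discarded, giving arrays of shape (n_series, T - n_drop, n_internal_units):

# Shapes of the computed state sequences
print("Training states:", states_tr.shape)
print("Test states:    ", states_te.shape)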

Dimensionality reduction (optional)

  • As an optional step, we can reduce the size of the Reservoir states, which in this example is quite large.

  • In particular, since n_internal_units=1000, we end up with a sequence of length T of vectors with size 1000.

  • Dimensionality reduction can speed up the training, especially if we use a sophisticated readout model, and can also provide some regularization that improves the prediction performance.

  • Below, we reduce the dimension of the reservoir states from 1000 to 75.

pca = PCA(n_components=75)
states_tr = pca.fit_transform(states_tr[0])
states_te = pca.transform(states_te[0])
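
We can also check how much variance the 75 retained components preserve, using the explained_variance_ratio_ attribute of the fitted sklearn PCA object:

# Fraction of the state variance retained by the 75 principal components
print(f"Retained variance: {pca.explained_variance_ratio_.sum():.3f}")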

Fit the readout (linear)

  • We are now ready to train the readout to predict the desired output given the sequence of (reduced) Reservoir states.

  • We start by using a linear readout implemented by a Ridge regressor.

  • After the readout is trained, we use it to compute the predictions \(\hat{Y}_\text{te}\) of the test data.

# Fit the ridge regression model
ridge = Ridge(alpha=1.0) 
ridge.fit(states_tr, Ytr[n_drop:,:])

# Compute the predictions
Yhat = ridge.predict(states_te)
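
Before plotting, we can quantify the forecast accuracy. A minimal sketch using the mean absolute error from sklearn, computed in the original units (so the predictions are inverse-transformed first):

from sklearn.metrics import mean_absolute_error

# Compare predictions (mapped back to the original units) with the true test targets
mae = mean_absolute_error(Yte[n_drop:,:], scaler.inverse_transform(Yhat))
print(f"Test MAE: {mae:.3f}")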

Finally, we plot the results.

fig = plt.figure(figsize=(14,4))
plt.plot(Yte[n_drop:,:], 'k--', label="True", linewidth=2)
plt.plot(scaler.inverse_transform(Yhat), label="Predicted")
plt.grid()
plt.legend()
plt.title("True vs predicted electricity load")
plt.show()
(Figure: true vs. predicted electricity load on the test set.)

Fit the readout (GBRT)

  • Mapping the Reservoir states to the desired output is a standard regression problem, which can be solved by one of the many standard regression models in sklearn.

  • For example, we can use a Gradient Boosting Regression Tree, which gives us predictions for different quantiles.

  • In this way, we can compute confidence intervals in our predictions.

  • This is a very simple way to implement probabilistic forecasting.

  • In the following, we will fit a different model for the 0.5, 0.05 and 0.95 quantiles, which will give us a 90% confidence interval for our prediction.

# Quantile 0.5
max_iter = 100
gbrt_median = HistGradientBoostingRegressor(
    loss="quantile", quantile=0.5, max_iter=max_iter)
gbrt_median.fit(states_tr, Ytr[n_drop:,0])
median_predictions = gbrt_median.predict(states_te)

# Quantile 0.05
gbrt_percentile_5 = HistGradientBoostingRegressor(
    loss="quantile", quantile=0.05, max_iter=max_iter)
gbrt_percentile_5.fit(states_tr, Ytr[n_drop:,0])
percentile_5_predictions = gbrt_percentile_5.predict(states_te)

# Quantile 0.95
gbrt_percentile_95 = HistGradientBoostingRegressor(
    loss="quantile", quantile=0.95, max_iter=max_iter)
gbrt_percentile_95.fit(states_tr, Ytr[n_drop:,0])
percentile_95_predictions = gbrt_percentile_95.predict(states_te)
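
A simple way to validate the intervals is to measure their empirical coverage, i.e., the fraction of test points that actually fall inside the predicted 90% band (computed here in the original units, as in the plot below):

# Empirical coverage of the predicted 90% interval
lo = scaler.inverse_transform(percentile_5_predictions[:,None]).ravel()
hi = scaler.inverse_transform(percentile_95_predictions[:,None]).ravel()
y_true = Yte[n_drop:,0]
coverage = np.mean((y_true >= lo) & (y_true <= hi))
print(f"Empirical coverage of the 90% interval: {coverage:.2f}")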

Plot the results with the 90% confidence intervals.

# Plot the results
fig = plt.figure(figsize=(14,4))
plt.plot(Yte[n_drop:,:], 'k--', label="True", linewidth=2)
plt.plot(scaler.inverse_transform(median_predictions[:,None]),
         label="Median prediction", color="tab:blue")
plt.fill_between(np.arange(len(Yte[n_drop:,:])),
                 scaler.inverse_transform(percentile_5_predictions[:,None]).ravel(),
                 scaler.inverse_transform(percentile_95_predictions[:,None]).ravel(),
                 alpha=0.3, label="90% CI", color="tab:blue")
plt.grid()
plt.legend()
plt.title("Predicted electricity load using Gradient Boosting Regression Trees")
plt.show()
(Figure: median prediction with the 90% confidence interval from the quantile GBRT models.)
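
Finally, the quantile forecasts can also be scored with the pinball loss, the same loss each quantile model optimizes. A short sketch using sklearn.metrics.mean_pinball_loss, again in the original units:

from sklearn.metrics import mean_pinball_loss

y_true = Yte[n_drop:,0]
for name, preds, alpha in [("q=0.05", percentile_5_predictions, 0.05),
                           ("q=0.50", median_predictions, 0.50),
                           ("q=0.95", percentile_95_predictions, 0.95)]:
    y_pred = scaler.inverse_transform(preds[:,None]).ravel()
    loss = mean_pinball_loss(y_true, y_pred, alpha=alpha)
    print(f"{name}: pinball loss = {loss:.3f}")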