
Forecasting Norwegian Energy Demand with Python
From time to time I like to experiment with new packages, tools and datasets (as is probably already evident from my website). Most recently I have been playing around with Skforecast, a Python library that offers some really interesting machine learning forecasting capabilities. In this article I trial this library on Norwegian energy data, specifically electricity consumption in MWh for Norway from the 1st of January 2018 until the 31st of December 2021.
The intention here is more to play around with a dataset I’ve not used before, through some libraries in Python which I’ve not used before. It’s much more for general interest than anything else, so let’s dive in.
Step 1
Data preparation and cleaning
I start by loading the required libraries and reading in the data I've prepared, which includes some interesting aspects we will explore through time series analysis. Following on from this we will launch into some forecasting, which consists of predicting the future value of a time series, typically by modelling the series solely on its past behaviour (autoregressive) or by approaches that also draw on other, typically external, variables.
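Before getting to the libraries, here is a quick toy illustration (entirely separate from the workflow that follows) of what modelling a series on its past behaviour means in practice: each row of lagged values becomes the set of predictors for the current value.
# Toy example: turning a series into lagged predictors
# ==============================================================================
import pandas as pd
s = pd.Series([10, 12, 13, 15, 14, 16], name='y')
lagged = pd.concat({f'lag_{k}': s.shift(k) for k in (1, 2, 3)}, axis=1).assign(y=s).dropna()
print(lagged)  # columns lag_1 to lag_3 are the predictors for each value of y
This is, in essence, what Skforecast automates for us at a much larger scale.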
# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd
# Libraries to construct Plots
# ==============================================================================
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
plt.style.use('fivethirtyeight')
import scipy
# Modelling and Forecasting
# ==============================================================================
from sklearn.linear_model import Ridge
from lightgbm import LGBMRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregMultiOutput import ForecasterAutoregMultiOutput
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from sklearn.linear_model import LinearRegression
Next we read in our data; for this I used hourly consumption data from Statnett. I included some other variables, which are based on the work done here, where the authors focused on data from the Australian state of Victoria.
#Reading in Data
# ==============================================================================
path = "norgeenergy1.csv"
data = pd.read_csv(path, sep=',')
data.info()
Next, as we are dealing with time series data, we have to ensure the date, and particularly the time column shown in the data.info() output above, is configured correctly. We must also divide the data into three sets, training, validation and test, in order to test the models we develop at later stages. This allows us to optimize the hyperparameters of the model and then evaluate its predictive capability.
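Since the exact name of that timestamp column isn't reproduced in the text here, the following is only a rough sketch, assuming a column called 'Time' stored in day-first format; this is the kind of index setup that the slicing below relies on.
# Parsing the timestamp and setting it as an hourly DatetimeIndex (sketch)
# ==============================================================================
data['Time'] = pd.to_datetime(data['Time'], dayfirst=True)  # 'Time' is an assumed column name
data = data.set_index('Time')
data = data.sort_index()
data = data.asfreq('H')  # enforce an hourly frequency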
#Creating training, validation and test data
# ==============================================================================
data = data.loc['2018-01-01 00:00:00': '2021-12-31 23:00:00']
end_train = '2019-12-31 23:00:00'
end_validation = '2021-11-30 23:00:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
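As a quick sanity check (plain pandas, nothing Skforecast-specific), we can print the date range that each partition covers:
# Checking the boundaries of each partition
# ==============================================================================
print(f"Train dates      : {data_train.index.min()} --- {data_train.index.max()}")
print(f"Validation dates : {data_val.index.min()} --- {data_val.index.max()}")
print(f"Test dates       : {data_test.index.min()} --- {data_test.index.max()}")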
As with most (but not all, of course) things in data science, this can be better expressed visually, so in order to construct a simple plot showing which time periods fall into the respective training, validation and test sets, we can run the code below.
Step 2
Data exploration
#Visualising Training, Validation and Test data
# ==============================================================================
fig, ax = plt.subplots(figsize=(12, 4))
data_train.Demand.plot(ax=ax, label='train', linewidth=1)
data_val.Demand.plot(ax=ax, label='validation', linewidth=1)
data_test.Demand.plot(ax=ax, label='test', linewidth=1)
ax.set_title('Electricity demand')
ax.legend();
plt.show()

Perhaps, like me, you are interested in zooming in a little to see what is actually happening in a particular slice of the above visualisation. For me, this was the period from the end of 2020 into the beginning of 2021. So let's zoom in a little and see the pattern more clearly.
#Zooming in on a slice of the demand series
# ==============================================================================
zoom = ('2020-12-01 14:00:00','2021-02-01 14:00:00') #Our zoom period is from
#December 1st 2020 - February 1st 2021
fig = plt.figure(figsize=(12, 6))
grid = plt.GridSpec(nrows=8, ncols=1, hspace=0.6, wspace=0)
main_ax = fig.add_subplot(grid[1:3, :])
zoom_ax = fig.add_subplot(grid[5:, :])
data.Demand.plot(ax=main_ax, c='black', alpha=0.5, linewidth=0.5)
min_y = min(data.Demand)
max_y = max(data.Demand)
main_ax.fill_between(zoom, min_y, max_y, facecolor='blue', alpha=0.5, zorder=0)
main_ax.set_xlabel('')
data.loc[zoom[0]: zoom[1]].Demand.plot(ax=zoom_ax, color='blue', linewidth=2)
main_ax.set_title(f'Electricity demand: {data.index.min()}, {data.index.max()}', fontsize=14)
zoom_ax.set_title(f'Electricity demand: {zoom}', fontsize=14)
plt.subplots_adjust(hspace=1)
plt.show()

More than just a pretty visualisation, or a way of satiating our appetite for interesting tidbits, this plot allows us to see a number of interesting aspects which we will explore later. For example, we can see that there is a clear weekly seasonality and that, in general, patterns seem consistent across time, allowing us to begin to unpack the temporal dimension of electricity consumption in Norway.
In the next step we can explore this seasonality across three different dimensions of time, namely monthly, weekly and daily seasonality. A sketch of how such plots might be produced is shown below, followed by the plots themselves.
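The code that generated the three plots that follow isn't reproduced in full here; a minimal sketch, assuming the hourly DatetimeIndex and the Demand column from earlier, might look something like this.
# Demand by month, day of week and hour of day (sketch)
# ==============================================================================
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(12, 9))
data['Demand'].groupby(data.index.month).median().plot(ax=axes[0], marker='o')
axes[0].set_title('Median demand by month')
data['Demand'].groupby(data.index.dayofweek + 1).median().plot(ax=axes[1], marker='o')
axes[1].set_title('Median demand by day of week (1 = Monday)')
data['Demand'].groupby(data.index.hour).median().plot(ax=axes[2], marker='o')
axes[2].set_title('Median demand by hour of day')
fig.tight_layout()
plt.show()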

An interesting aspect of this visualisation is just how different it looks from the data for the Australian state of Victoria in the visualisation here. Perhaps it's much more than just hemispheric differences, as the Australian data seems much more consistent.

Electricity demand by day of the week looks a little more typical. Day 1 is Monday.

Finally, demand by time of day also seems to track with much of what we have seen before, although there is evidently less of an afternoon spike than is often observed. Overall, though, it matches what we would expect to see.
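The plot_acf and plot_pacf functions imported at the start are intended for exactly this kind of check; as a sketch, the daily lag structure suggested above can be confirmed with autocorrelation plots of the hourly demand.
# Autocorrelation of hourly demand (sketch)
# ==============================================================================
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
plot_acf(data['Demand'], ax=axes[0], lags=72)   # spikes at lags 24 and 48 point to daily seasonality
plot_pacf(data['Demand'], ax=axes[1], lags=72)
fig.tight_layout()
plt.show()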
Moving on from the seasonality and data exploration, we now turn to forecasting.
Step 3
Forecasting in Python
What we do now is create a recursive autoregressive model, trained from a linear regression model with a Ridge penalty and a time window of 24 lags. This means that for each prediction we use the past 24 hours as the predictors.
#Creating recursive autoregressive model
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = make_pipeline(StandardScaler(), Ridge()),
    lags      = 24
)
forecaster.fit(y=data.loc[:end_validation, 'Demand'])
forecaster
Which produces:
Regressor: Pipeline(steps=[('standardscaler', StandardScaler()), ('ridge', Ridge())])
Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Window size: 24
Included exogenous: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: [Timestamp('2018-01-01 00:00:00'), Timestamp('2021-11-30 23:00:00')]
Training index type: DatetimeIndex
Training index frequency: H
Regressor parameters: {'standardscaler__copy': True, 'standardscaler__with_mean': True, 'standardscaler__with_std': True, 'ridge__alpha': 1.0, 'ridge__copy_X': True, 'ridge__fit_intercept': True, 'ridge__max_iter': None, 'ridge__normalize': False, 'ridge__random_state': None, 'ridge__solver': 'auto', 'ridge__tol': 0.001}
Creation date: 2022-04-29 11:00:49
Last fit date: 2022-04-29 11:00:49
Skforecast version: 0.4.3
Next we estimate how the model would have behaved if it had been trained on the data from 2018-01-01 00:00:00 to 2021-11-30 23:00:00 and had then predicted the following 24 hours, block by block, with each block evaluated against the observed values. This type of evaluation, known as backtesting, can be easily implemented. Alongside the predictions, we also receive an error metric for the backtesting process.
# Backtest
# =====================================================================
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = data.Demand,
    initial_train_size = len(data.loc[:end_validation]),
    steps              = 24,
    metric             = 'mean_absolute_error',
    verbose            = True
)
# Temporal index added to the predictions (the hours after the initial training set)
predictions = pd.Series(data=predictions, index=data.index[-len(predictions):])
fig, ax = plt.subplots(figsize=(12, 3.5))
data.loc[predictions.index, 'Demand'].plot(ax=ax, linewidth=2, label='real')
predictions.plot(linewidth=2, label='prediction', ax=ax)
ax.set_title('Prediction vs real demand')
ax.legend();
plt.show()

# Backtest error
# ==============================================================================
print(f'Backtest error: {metric}')
Backtest error: 1160.5073687285353
However, as can be seen above, we used a particular number of lags (24) and a ridge model; whether this is the best approach remains to be seen. We can improve on this.
In order to identify the best combination of lags and hyperparameters, a grid search with validation by backtesting is used. Here we train a model with different combinations of lags and hyperparameters and evaluate the predictive capacity of each. We must remain aware that only the validation data should be used for this, and not the test data, which we will need to evaluate the final model.
# Tuning the hyperparameters
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = Ridge(normalize=True),
    lags      = 24  # This value will be replaced in the grid search
)
param_grid = {'alpha': np.logspace(-3, 3, 10)}
lags_grid = [5, 24, [1, 2, 3, 23, 24, 25, 47, 48, 49]]
results_grid = grid_search_forecaster(
    forecaster         = forecaster,
    y                  = data.loc[:end_validation, 'Demand'],
    param_grid         = param_grid,
    lags_grid          = lags_grid,
    steps              = 24,
    metric             = 'mean_absolute_error',
    initial_train_size = len(data[:end_train]),
    return_best        = True,
    verbose            = False
)

# Inspecting the results grid
# ==============================================================================
results_grid
lags ... alpha
20 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 0.001000
21 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 0.004642
22 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 0.021544
10 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 0.001000
11 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 0.004642
12 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 0.021544
23 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 0.100000
13 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 0.100000
14 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 0.464159
24 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 0.464159
15 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 2.154435
25 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 2.154435
0 [1, 2, 3, 4, 5] ... 0.001000
1 [1, 2, 3, 4, 5] ... 0.004642
3 [1, 2, 3, 4, 5] ... 0.100000
2 [1, 2, 3, 4, 5] ... 0.021544
16 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 10.000000
4 [1, 2, 3, 4, 5] ... 0.464159
26 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 10.000000
5 [1, 2, 3, 4, 5] ... 2.154435
17 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 46.415888
27 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 46.415888
6 [1, 2, 3, 4, 5] ... 10.000000
18 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 215.443469
28 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 215.443469
7 [1, 2, 3, 4, 5] ... 46.415888
19 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... ... 1000.000000
29 [1, 2, 3, 23, 24, 25, 47, 48, 49] ... 1000.000000
8 [1, 2, 3, 4, 5] ... 215.443469
9 [1, 2, 3, 4, 5] ... 1000.000000
Now that we have identified the best model, let's deploy it and see how it looks against the data.
# Forecasting with the best model
# ==============================================================================
forecaster
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = data.Demand,
    initial_train_size = len(data[:end_validation]),
    steps              = 24,
    metric             = 'mean_absolute_error',
    verbose            = True
)
# Datetime index added (the hours after the initial training set)
predictions = pd.Series(data=predictions, index=data.index[-len(predictions):])
fig, ax = plt.subplots(figsize=(12, 3.5))
data.loc[predictions.index, 'Demand'].plot(linewidth=2, label='real', ax=ax)
predictions.plot(linewidth=2, label='prediction', ax=ax)
ax.set_title('Prediction vs real demand')
ax.legend();
plt.show()
print(f'Backtest error: {metric}')
Backtest error: 1094.662270927795 # Marginal improvement on previous backtest error.

Perhaps, however, there is an easier way of visualising the relative improvement. In order to do this I made a gif which highlights the change between the two models; while the improvement is slight, it is evident.
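A static alternative to the gif is to overlay the two sets of backtest predictions on a single axis. This assumes the predictions from the first, untuned model were kept under a separate name; predictions_base below is hypothetical, since in the code above both runs reuse the name predictions.
# Comparing baseline and tuned predictions against real demand (sketch)
# ==============================================================================
fig, ax = plt.subplots(figsize=(12, 3.5))
data.loc[predictions.index, 'Demand'].plot(ax=ax, linewidth=2, label='real')
predictions_base.plot(ax=ax, linewidth=1, label='baseline (24 lags, default alpha)')  # hypothetical variable
predictions.plot(ax=ax, linewidth=1, label='tuned (grid-searched lags and alpha)')
ax.set_title('Baseline vs tuned predictions')
ax.legend();
plt.show()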

Next we turn to making predictions of future electricity demand. For this we will use the exogenous variables created earlier: calendar features for day of the week, hour of the day and holidays.
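The creation of those calendar columns isn't shown in this article, as they came prepared with the CSV; as a rough sketch, such columns might be derived from the DatetimeIndex along the following lines (the use of the holidays package is an assumption, as is the one-hot encoding).
# Deriving calendar features from the index (sketch)
# ==============================================================================
import holidays  # assumption: Norwegian public holidays via the 'holidays' package
calendar = pd.DataFrame(index=data.index)
calendar['week_day'] = data.index.dayofweek + 1   # 1 = Monday, matching the plots above
calendar['hour'] = data.index.hour
norway_holidays = holidays.Norway()
calendar['Holiday'] = [int(ts.date() in norway_holidays) for ts in data.index]
# One-hot encoding gives column names like 'week_day_1' and 'hour_0', which the
# startswith() selection below picks up
calendar = pd.get_dummies(calendar, columns=['week_day', 'hour'])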
# Create and train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = Ridge(alpha=0.00464, normalize=True),
    lags      = [1, 2, 3, 23, 24, 25, 47, 48, 49]
)
exog = [column for column in data.columns if column.startswith(('week_day', 'hour', 'Holiday'))]
forecaster.fit(y=data.Demand[: end_validation], exog=data[exog][: end_validation])
# Backtest
# ==============================================================================
predictions = backtest_predict_next_24h(
    forecaster           = forecaster,
    y                    = data.loc[end_validation:, 'Demand'],
    exog                 = data.loc[end_validation:, exog],
    hour_init_prediction = 11,
    verbose              = False
)
Backtesting starts at day: 2021-12-03 11:00:00
Days predicted in the backtesting: ['2021-12-04' '2021-12-05' '2021-12-06' '2021-12-07' '2021-12-08'
'2021-12-09' '2021-12-10' '2021-12-11' '2021-12-12' '2021-12-13'
'2021-12-14' '2021-12-15' '2021-12-16' '2021-12-17' '2021-12-18'
'2021-12-19' '2021-12-20' '2021-12-21' '2021-12-22' '2021-12-23'
'2021-12-24' '2021-12-25' '2021-12-26' '2021-12-27' '2021-12-28'
'2021-12-29' '2021-12-30']
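Note that backtest_predict_next_24h is not part of Skforecast; it is a custom helper, adapted from the skforecast electricity demand tutorial, whose definition isn't reproduced in this article. A hypothetical sketch of what such a helper might look like (the names and the boundary handling are assumptions) is:
# Sketch of the custom helper: each day at hour_init_prediction, predict the
# following day's 24 hours using only the observations available at that point
# ==============================================================================
def backtest_predict_next_24h(forecaster, y, exog, hour_init_prediction, verbose=False):
    predictions = []
    days = np.unique(y.index.date)[:-1]  # the day after the final day has no data to predict

    for day in days:
        # History available at prediction time: everything up to hour_init_prediction
        cutoff = pd.Timestamp(day) + pd.Timedelta(hours=hour_init_prediction)
        history = y.loc[:cutoff]
        if len(history) < forecaster.window_size:
            continue  # not enough observed hours yet to form the lag window

        # Predict from the hour after the cutoff through 23:00 of the next day
        next_day_end = pd.Timestamp(day) + pd.Timedelta(days=1, hours=23)
        steps = int((next_day_end - cutoff) / pd.Timedelta(hours=1))
        exog_window = exog.loc[cutoff + pd.Timedelta(hours=1): next_day_end]

        preds = forecaster.predict(
            steps       = steps,
            last_window = history.iloc[-forecaster.window_size:],
            exog        = exog_window
        )

        # Keep only the 24 hours belonging to the next day
        next_day_index = pd.date_range(pd.Timestamp(day) + pd.Timedelta(days=1), periods=24, freq='H')
        predictions.append(pd.Series(np.asarray(preds)[-24:], index=next_day_index))

        if verbose:
            print(f"Prediction made at {cutoff} for {next_day_index[0].date()}")

    return pd.concat(predictions)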
# Next we get the backtest error
# ==============================================================================
error = mean_absolute_error(
    y_true = data.loc[predictions.index, 'Demand'],
    y_pred = predictions
)
print(f"Backtest error: {error}")
Backtest error: 1074.326777202058
