This project was made for the Data Science For All: Colombia program of Correlation One, in collaboration with Camila Malagón, Yesid Rivera, Óscar Nieto, Didier Santander, Eduardo González and Hollman Báez.
A complete data science project was developed, producing a visualization dashboard built with Dash (Python) to forecast energy consumption in London for the coming months.
The data comes from a real project carried out by UK Power Networks (an energy provider in the United Kingdom) and was obtained from Kaggle.
Context
Energy is one of the main topics on the UN agenda for the coming years, with the goals of ensuring global access and reducing the pollution associated with its generation. According to the UN, energy currently accounts for 60% of greenhouse gas emissions, while 13% of the global population has no access to electricity.
For these reasons, countries like the UK are making efforts to create public policies focused on
converting their current energy sources to clean alternatives.
To understand the dynamics of residential energy consumption in large cities, in 2014, the UK
Government hired UK Power Networks for a project focused on collecting information about energy
production and consumption through smart meters installed in a selected group of London households.
This information is useful for characterizing current energy consumption in the residential sector. For UK Power Networks and the UK Government, it is important to know in detail the patterns of energy consumption in London's households in order to create strategies that ease the transition to clean energy sources.
This project focuses on providing public and private entities, such as the government of the United Kingdom, London authorities, energy suppliers, network operators, researchers and energy market players in general, with relevant information about the energy consumption patterns and demand trends of London households. This information allows them to make better decisions when planning and operating the electricity distribution networks efficiently, improving customer service and adopting low carbon strategies. Last but not least, this study can be used as a guide for other countries that want to advance in the implementation of alternative energies.
Datafolio
The Datafolio is a visual snapshot of the data project.
Final Presentation
This video presents the aim of the project and how its information was defined, analyzed and processed.
Exploratory Data Analysis
EDA was used to analyze and investigate the data sets and to summarize their main characteristics, employing data visualization methods.
Modeling
The Prophet Forecasting model
Prophet is a time series forecasting model based on an additive approach, in which non-linear trends are fit with three main components plus an error term:
- Growth (or trend) g(t)
- Seasonality s(t)
- Holidays h(t)
- An error term, included to represent any changes that are not accommodated by the model [1]
The trend and seasonality hyperparameters can be tuned to fit the model as well as possible, with their values selected through cross-validation. Forecasting is framed as a curve-fitting task with time as the only regressor, so the model is univariate. The components are combined in the following equation:
y(t) = g(t) + s(t) + h(t) + ε_t
This formulation is similar to a generalized additive model (GAM), a class of regression models with potentially non-linear smoothers applied to the regressors, which has the advantage of being flexible, accurate, fast to implement, and of having interpretable parameters [2]. In this case, Prophet has some advantages over other time series models, such as its capacity to handle seasonal variations, missing data, and outliers.
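As an illustration of the additive structure only, the following toy sketch builds the noise-free part of y(t) from three hand-written components. It is not how Prophet is implemented: the linear trend, the single sine wave and the fixed holiday offset are simplifying assumptions, and every function name and default value is hypothetical.

```python
import math


def growth(t, base=10.0, slope=0.01):
    """Trend g(t): a base level plus a constant increase per day (assumed linear)."""
    return base + slope * t


def seasonality(t, amplitude=2.0, period=365.25):
    """Seasonality s(t): one sine wave with a yearly period (assumed shape)."""
    return amplitude * math.sin(2 * math.pi * t / period)


def holiday_effect(t, holiday_days, effect=-1.5):
    """Holidays h(t): a fixed offset on the listed days, zero elsewhere."""
    return effect if t in holiday_days else 0.0


def additive_signal(t, holiday_days=frozenset()):
    """Noise-free part of y(t) = g(t) + s(t) + h(t) for day index t."""
    return growth(t) + seasonality(t) + holiday_effect(t, holiday_days)


# Day 100 with and without a holiday falling on it.
regular_day = additive_signal(100)
holiday_day = additive_signal(100, holiday_days={100})
```

Because the components are summed, each one can be inspected or changed separately, which is what makes this family of models easy to interpret.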
This model is an open-source tool provided by Facebook Inc. through the prophet package, available in Python and R.
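A minimal sketch of how the Python package is typically used on daily data is shown below. It is not the project's actual code: the input format (a mapping from date to daily consumption), the function names and the `"GB"` holiday code are assumptions made for illustration. The `pandas` and `prophet` imports sit inside the fitting function so that the small data-preparation helper can be used on its own.

```python
from datetime import date


def to_prophet_rows(daily_kwh):
    """Turn a {date: value} mapping into rows with the 'ds' and 'y' keys Prophet expects."""
    return [{"ds": day, "y": float(value)} for day, value in sorted(daily_kwh.items())]


def fit_and_forecast(daily_kwh, horizon_days=180):
    """Fit Prophet on daily data and forecast `horizon_days` past the last observation."""
    # Imported here so to_prophet_rows works without these packages installed.
    import pandas as pd
    from prophet import Prophet

    df = pd.DataFrame(to_prophet_rows(daily_kwh))
    df["ds"] = pd.to_datetime(df["ds"])

    model = Prophet()
    # Assumption: 'GB' is used as the country code for UK national holidays.
    model.add_country_holidays(country_name="GB")
    model.fit(df)

    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical input: two days of aggregated consumption, given out of order.
rows = to_prophet_rows({date(2014, 1, 2): 3, date(2014, 1, 1): 2.5})
```

The returned frame holds the point forecast (`yhat`) together with the lower and upper bounds of its uncertainty interval.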
Implementation of the Prophet Forecasting model
The modeling process can be divided into three main steps: data preparation, hyperparameter tuning and fitting of the model, and cross-validation and forecasting.
In this case, the model was implemented using the aggregated daily energy consumption data and the UK national holidays data. For the hyperparameter tuning and the cross-validation, the dataset was automatically split into training and testing periods on a rolling basis, according to a defined training period and forecasting horizon, set to 540 and 180 days respectively. Because random samples cannot be used with time series, every split respects chronological order, so data used to evaluate the forecast in one split is later included in the training set of the following splits.
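The rolling split described above can be sketched in plain Python as follows. The 540-day training period and the 180-day horizon come from the text; the 90-day step between cutoffs, the fixed-length training window and the function name are assumptions. Prophet ships its own helper for this, `prophet.diagnostics.cross_validation`, which takes `initial`, `period` and `horizon` arguments, and it may place the cutoffs differently from this sketch.

```python
from datetime import date, timedelta


def rolling_splits(first_day, last_day, train_days=540, horizon_days=180, step_days=90):
    """Return (train_start, cutoff, test_end) windows that move forward in time.

    Each window trains on the `train_days` before the cutoff and is evaluated
    on the `horizon_days` after it, so test data never precedes training data.
    """
    train = timedelta(days=train_days)
    horizon = timedelta(days=horizon_days)
    step = timedelta(days=step_days)

    splits = []
    cutoff = first_day + train
    # Keep only windows whose test period fits inside the available data.
    while cutoff + horizon <= last_day:
        splits.append((cutoff - train, cutoff, cutoff + horizon))
        cutoff += step
    return splits


# Hypothetical date range, chosen only to show the mechanics.
splits = rolling_splits(date(2012, 1, 1), date(2014, 12, 31))
```

With these dates the test period of the first window overlaps the training period of later windows, which is the behavior the paragraph describes.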
Fitting the model is straightforward, but some key hyperparameters were adjusted to optimize its performance. We performed an iterative process to select which hyperparameters were most worth tuning, comparing the mean absolute percentage error (MAPE) obtained by adjusting each hyperparameter individually against the baseline MAPE of a standard fitted model. The most relevant hyperparameters were the type of trend and its flexibility, and the seasonality and its strength, so their values were optimized using the grid search method.
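A grid search of this kind can be sketched as below. The keys in the grid are real Prophet constructor arguments, but mapping them to "type of trend", "flexibility", "seasonality" and "strength" is an assumption, as are the candidate values. The scoring function is left to the caller: in practice it would fit a model with the given parameters and return its cross-validated MAPE.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid`; return (best_params, best_score).

    `score_fn` receives a dict of parameters and returns an error to minimize.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: the values are illustrative, not the ones used in the project.
param_grid = {
    "growth": ["linear", "flat"],                        # type of trend
    "changepoint_prior_scale": [0.01, 0.1, 0.5],         # trend flexibility
    "seasonality_mode": ["additive", "multiplicative"],  # type of seasonality
    "seasonality_prior_scale": [0.1, 1.0, 10.0],         # seasonality strength
}

# Number of models that an exhaustive search over this grid would fit.
n_combinations = len(list(product(*param_grid.values())))
```

The number of combinations grows multiplicatively with each added hyperparameter, which is one reason to pre-select the most influential ones before running the search.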
After the hyperparameter tuning and the cross-validation, we obtained the best-performing model, which exhibits a MAPE of 1.357%. This model was used for the forecasting and for the comparison with the other time series models.
Prophet model by Category
For a more in-depth analysis, the same procedure was applied to the data aggregated by ACORN category, obtaining the corresponding metrics and forecasts. This gave us insight into how daily energy consumption behaves across the distinct ACORN groups and how it affects the performance of the model. Some of the fitted models are presented to compare their predicted values with the observations.
The fitting process described above was applied to the dataset of each category, obtaining the following metrics:
Category | MSE | MAE | MAPE (%) |
---|---|---|---|
Comfortable Communities | 0.45 | 0.17 | 1.90 |
Rising Prosperity | 1.4 | 0.31 | 3.02 |
Affluent Achievers | 2.48 | 0.45 | 3.01 |
Financially Stretched | 0.5 | 0.19 | 2.17 |
Not Private Households | 45.78 | 1.89 | 15.06 |
Urban Adversity | 0.15 | 0.10 | 1.41 |
Both the metrics and the plots show a generally good response of the model across the different categories, with a MAPE between roughly 1.4% and 3%. However, the Not Private Households category performs poorly (MAPE of 15.06%) because of its high variation across the period, which makes accurate predictions harder.
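For reference, the three metrics in the table can be computed as in the sketch below. It uses the standard textbook definitions and assumes that MAPE is reported as a percentage; the project's own implementation is not shown in this document and may differ in detail.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared differences."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: the average of the absolute differences."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is zero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def _check(actual, predicted):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


# Hypothetical observations and predictions, not taken from the project data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = (mse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

MSE and MAE depend on the scale of the series, so they are not directly comparable between categories with different consumption levels, whereas MAPE is relative and can be compared across them.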
The forecast shows that energy demand will increase in the upcoming years. According to the Prophet model, the average seasonal growth in demand stands at approximately 4% compared with previous years, which is not significantly different from past behavior, so the number of departments, categories and commercial growth above the median is expected to increase.
Finally, computing the variable importance for the classification in the model, we identified that the most important variables were season and population. We would like to clarify that all the information was summarized to give a general overview, to get the most out of the model's performance, to clarify the bigger picture, and to keep up with changes. It was also aggregated to prevent the model from overfitting.
Final Report
The final report presented in the project was the following file: