
London Energy Consumption (SmartEnerx Dashboard)

This project was made for the Data Science For All: Colombia program of Correlation One, in collaboration with Camila Malagón, Yesid Rivera, Óscar Nieto, Didier Santander, Eduardo González, and Hollman Báez.

A complete data science project was developed, resulting in a visualization dashboard built with Dash (Python) that forecasts energy consumption in London for the coming months.
The data comes from a real project carried out by UK Power Networks (an energy provider in the United Kingdom) and was obtained from Kaggle.

Context

Energy is one of the main topics on the UN agenda for the coming years, with the goals of ensuring global access and reducing the pollution associated with its generation. According to the UN, energy currently accounts for 60% of greenhouse gas emissions, while 13% of the global population has no access to electricity. For these reasons, countries like the UK are making efforts to create public policies focused on converting their current energy sources to clean alternatives. To understand the dynamics of residential energy consumption in large cities, in 2014 the UK Government hired UK Power Networks for a project focused on collecting information about energy production and consumption through smart meters installed in a selected group of London households.

This information is useful for determining the current energy consumption characteristics of the residential sector. For UK Power Networks and the UK Government, it is important to know in detail the patterns of energy consumption in London's households in order to create strategies that ease the transition to clean energy sources.

This project is focused on providing relevant information about the energy consumption patterns and demand trends of London households to public and private entities, such as the government of the United Kingdom, London authorities, energy suppliers, network operators, researchers, and players in the energy market in general. The aim is to help them make better decisions when planning and operating the electricity distribution networks efficiently, improving customer service, and adopting low-carbon strategies. Last but not least, this study can be used as a guide for other countries that want to advance in the implementation of alternative energies.

Datafolio

The Datafolio is a visual snapshot of the data project.

Final Presentation

This video presents how the information in this project was defined, analyzed, and processed.

Exploratory Data Analysis

EDA was used to analyze and investigate the data sets and to summarize their main characteristics for this project, relying on data visualization methods.


Modeling

The Prophet Forecasting Model

Prophet is a time series forecasting model based on an additive approach, in which non-linear trends are fit with three main components plus an error term:

  • Growth (or trend) g(t)
  • Seasonality s(t)
  • Holidays h(t)
  • Error term ε_t, included to represent any changes that are not accommodated by the model [1]

The trend and seasonality hyperparameters can be tuned to fit the model as well as possible, with their values selected through cross-validation. Forecasting is framed as a curve-fitting task with time as the only regressor, so the model is univariate. These components are combined in the following equation:

y(t) = g(t) + s(t) + h(t) + ε_t

This formulation is similar to a generalized additive model (GAM), a class of regression models with potentially non-linear smoothers applied to the regressors, which has the advantages of being flexible, accurate, fast to implement, and of having interpretable parameters [2]. Compared with other time series models, Prophet also has the advantage of handling seasonal variations, missing data, and outliers.
The model is an open-source tool provided by Facebook Inc. through the prophet package, available in Python and R.
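
To make the additive structure concrete, the following is a minimal sketch that builds a synthetic daily series from the four terms of the equation. It is not Prophet's implementation: Prophet fits a piecewise trend and Fourier-series seasonality, whereas the linear trend, the single sine and cosine terms, the holiday dates, and every coefficient below are assumptions chosen only for illustration.

```python
import math
import random
from datetime import date, timedelta

# Hypothetical holiday dates, used only for illustration.
HOLIDAYS = {date(2013, 12, 25), date(2013, 12, 26), date(2014, 1, 1)}


def growth(t):
    """g(t): a plain linear trend (assumed intercept and slope)."""
    return 10.0 + 0.002 * t


def seasonality(t):
    """s(t): one yearly and one weekly cycle (assumed amplitudes)."""
    yearly = 1.5 * math.cos(2 * math.pi * t / 365.25)
    weekly = 0.3 * math.sin(2 * math.pi * t / 7)
    return yearly + weekly


def holiday_effect(day):
    """h(t): a fixed bump on holiday dates, zero otherwise (assumed size)."""
    return 0.8 if day in HOLIDAYS else 0.0


def simulate(start, n_days, noise_sd=0.2, seed=0):
    """Return one row per day with each component and their sum y."""
    rng = random.Random(seed)
    rows = []
    for t in range(n_days):
        day = start + timedelta(days=t)
        g, s, h = growth(t), seasonality(t), holiday_effect(day)
        eps = rng.gauss(0.0, noise_sd)  # error term
        rows.append({"ds": day, "g": g, "s": s, "h": h, "eps": eps,
                     "y": g + s + h + eps})
    return rows


series = simulate(date(2013, 12, 1), 60)
```

Because each component is stored separately, one can inspect how much of any daily value comes from the trend, the seasonal cycles, or a holiday, which is the kind of interpretability the additive form provides.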

Implementation of the Prophet Forecasting Model

The modeling process can be divided into three main steps: data preparation, hyperparameter tuning and model fitting, and cross-validation and forecasting.
In this case, the model was implemented using the aggregated daily energy consumption data and the UK national holidays data. For the hyperparameter tuning and the cross-validation, the dataset was automatically split into training and testing periods on a rolling basis, according to a defined training period and forecasting horizon, which were set to 540 and 180 days respectively. Because random sampling cannot be used with time series, the splits are chronological, so data used to evaluate the forecast in one split is later included in the training set.

Fitting the model is a straightforward process, but some key hyperparameters were adjusted to optimize the model's performance. We followed an iterative process to select which hyperparameters were worth tuning, comparing the MAPE obtained when adjusting each hyperparameter individually against the baseline MAPE of a standard fitted model. The most relevant hyperparameters were the type of trend and its flexibility, and the seasonality and its strength, so their values were optimized using grid search. After the hyperparameter tuning and the cross-validation we obtained the best-performing model, which exhibits a MAPE of 1.357%. This model was used for the forecasting and for the comparison with the other time series models.
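
The rolling split and the grid search can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the 540-day initial training period and 180-day horizon come from the text, while the 90-day step between cutoffs, the expanding training window, the toy weekly series, and the `seasonal_naive` stand-in model with its `season_length` parameter are hypothetical. With Prophet, `fit_predict` would wrap the real model and the grid keys would be constructor arguments such as `changepoint_prior_scale` or `seasonality_prior_scale`.

```python
from itertools import product


def rolling_splits(n_days, initial=540, horizon=180, step=90):
    """Yield (train, test) index ranges in chronological order.

    Training always ends at the cutoff and testing covers the `horizon`
    days right after it, so no future observation leaks into training.
    The 90-day step is an assumption.
    """
    cutoff = initial
    while cutoff + horizon <= n_days:
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += step


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def param_grid(space):
    """Expand {name: [values]} into a list of parameter dictionaries."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


def grid_search(series, space, fit_predict):
    """Return (best_params, best_mape), averaging MAPE over all splits."""
    best = None
    for params in param_grid(space):
        scores = []
        for train_idx, test_idx in rolling_splits(len(series)):
            train = [series[i] for i in train_idx]
            actual = [series[i] for i in test_idx]
            predicted = fit_predict(train, len(actual), **params)
            scores.append(mape(actual, predicted))
        average = sum(scores) / len(scores)
        if best is None or average < best[1]:
            best = (params, average)
    return best


def seasonal_naive(train, n_ahead, season_length):
    """Hypothetical stand-in model: repeat the last observed season."""
    last = train[-season_length:]
    return [last[k % season_length] for k in range(n_ahead)]


# Toy series with an exact weekly pattern (made-up values).
demo = [10.0 + (t % 7) for t in range(900)]
best_params, best_score = grid_search(
    demo, {"season_length": [1, 7]}, seasonal_naive)
```

On the toy series the search selects `season_length=7`, because repeating the last week reproduces the weekly pattern exactly in every split.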

Prophet model by Category

For a more in-depth analysis, the same procedure was applied to the data aggregated by ACORN category, obtaining the corresponding metrics and forecasts. This gave us insight into how daily energy consumption behaves across the distinct ACORN groups and how that affects the performance of the model. Some of the fitted models are presented to compare their predicted values with the observations.

The fitting process described above was applied to the dataset of each category, obtaining the following metrics:

Category                   MSE     MAE    MAPE (%)
Comfortable Communities    0.45    0.17    1.90
Rising Prosperity          1.40    0.31    3.02
Affluent Achievers         2.48    0.45    3.01
Financially Stretched      0.50    0.19    2.17
Not Private Households    45.78    1.89   15.06
Urban Adversity            0.15    0.10    1.41
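
For reference, the three metrics in the table can be computed from observed and predicted values as in the sketch below. The function name and the example numbers are made up for illustration and are not taken from the project's data.

```python
def forecast_metrics(actual, predicted):
    """Return MSE, MAE and MAPE (in percent) for two equal-length lists.

    MAPE divides by the observed values, so it is undefined when any
    observation is zero.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "MAE": mae, "MAPE": mape}


# Made-up values: errors of -1, 2 and 0.
example = forecast_metrics([10.0, 20.0, 40.0], [11.0, 18.0, 40.0])
```

MSE squares each error, so a few large misses weigh more heavily on it than on MAE, while MAPE scales each error by the observed value.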

Both the metrics and the plots show a generally good response of the model across the different categories, with MAPE values between roughly 1.4% and 3%. The Not Private Households category is the exception: its high variation across the period makes accurate prediction harder, and it performs poorly, with a MAPE of 15.06%. The forecasts suggest that energy demand will increase in the upcoming years. According to the Prophet model, the average seasonal growth in demand stands at approximately 4% compared with previous years, which is not a significant difference, so the number of departments, categories, and commercial growth above the median is expected to increase.

Finally, when computing the importance of the variables used for classification in the model, we identified season and population as the most important ones. All the information was summarized to give a general overview, to get the most performance out of the model, to clarify the bigger picture, and to keep up with changes. Summarizing the data also helps prevent the model from overfitting.

Final Report

The final report presented in the project was the following file:


  1. (Taylor & Letham, 2017)

  2. (Menculini et al., 2021)

Tools: SQL Management Studio, Anaconda, Jupyter Notebook
Languages: SQL, Python
Python Libraries: Pandas, NumPy, Dash

Andres Camilo Viloria Garcia
Data Scientist | Data Analyst