This is a first-year master's research project carried out by Mohamed Boulanouar, Maxime Thoor and Alexandre Verept, supervised by Kévin Hérissé (PhD student) at ISEN engineering school.
The goal of this project is to predict the air quality in Lille 1, 2 and 3 days in advance, as accurately as possible.
To accomplish this, we use machine learning and statistical techniques.
NB: Here you can find our first-semester project, which consisted of forecasting the Air Quality Index in Lille based only on a small amount of data collected by a beehive placed on the roof of the school. We had the chance to present this project at a public event organized by the MEL (Metropole Européenne de Lille).
See the result here! (It is a bit empty for now, as the project is not over yet.)
Here is the general structure of our project:
Data Collector:
- This script runs in real time: it collects data from different APIs, reshapes it if needed, then sends the result to our Backend API to be stored in the Database.
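The Data Collector's job (bring each source's records into a common shape, then hand them to the Backend API) could look roughly like the sketch below. This is a minimal illustration, not the project's code: the raw record layout, the `shape_record` and `send_to_backend` names and the `BACKEND_URL` endpoint are all hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical Backend API endpoint; the real route is not documented here.
BACKEND_URL = "http://localhost:5000/data"


def shape_record(raw: dict) -> dict:
    """Flatten one raw API record into a row ready to be stored.

    The {"date": ..., "measures": {...}} layout is an assumption made
    for illustration, not the format of any specific source.
    """
    row = {"date": raw["date"]}
    for name, value in raw.get("measures", {}).items():
        # Lower-case names and cast values so that every source looks alike.
        row[name.lower()] = float(value) if value is not None else None
    # Record when the collector fetched this data (UTC).
    row["collected_at"] = datetime.now(timezone.utc).isoformat()
    return row


def send_to_backend(rows: list, url: str = BACKEND_URL) -> int:
    """POST the shaped rows as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the reshaping in a pure function like `shape_record` means it can be tested without any network access; only `send_to_backend` talks to the API.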
Real-time prediction script:
- This script runs every day to make the predictions.
- It requests all the information it needs from the Data Collector.
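One daily run of the prediction script boils down to feeding the latest window of observations to a model and labelling each output with the day it forecasts. The sketch below assumes a hypothetical interface in which a model is any callable returning one value per horizon; `persistence_model` is a placeholder baseline (repeat the last observed value), not the trained network used in the project.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a model maps a window of past daily index
# values to one forecast per horizon (+1, +2 and +3 days).
Model = Callable[[Sequence[float]], List[float]]


def persistence_model(window: Sequence[float]) -> List[float]:
    """Placeholder baseline: repeat the last observed value for each horizon."""
    return [window[-1]] * 3


def run_daily_prediction(
    today: date,
    window: Sequence[float],
    model: Model,
    horizons: Sequence[int] = (1, 2, 3),
) -> Dict[str, float]:
    """Run the model once and key each output by the date it forecasts."""
    outputs = model(window)
    if len(outputs) != len(horizons):
        raise ValueError("the model must return one value per horizon")
    return {
        (today + timedelta(days=h)).isoformat(): value
        for h, value in zip(horizons, outputs)
    }
```

For example, `run_daily_prediction(date(2021, 3, 1), [3.0, 4.0, 5.0], persistence_model)` returns forecasts keyed `"2021-03-02"`, `"2021-03-03"` and `"2021-03-04"`. The daily trigger itself (for instance a scheduled job) is outside this sketch.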
Back and front end API:
- Receives useful data from the Data Collector and stores it in the Database.
- Provides data to our real-time Prediction script, receives the prediction results and stores them in the Database.
- Lets anyone freely consult our predictions stored in the Database.
- Note: our original plan was to create two distinct APIs, so you may find some outdated references to the Frontend and Backend API elsewhere in this repository.
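The API's three roles can be illustrated with a small, framework-agnostic service. This is a sketch under explicit assumptions, not the project's Flask code: an in-memory dictionary stands in for the MySQL database, and the class name, method names, JSON field names and the route each method would sit behind (noted in comments) are all hypothetical. In a Flask app, each method would be wrapped by one route.

```python
import json
from typing import Dict


class AirQualityService:
    """Core logic of the API; storage is an in-memory stand-in for MySQL."""

    def __init__(self) -> None:
        self.measures: Dict[str, dict] = {}     # ISO date -> merged measures
        self.predictions: Dict[str, dict] = {}  # target date -> {horizon: value}

    # Hypothetical route: POST /data (called by the Data Collector)
    def receive_data(self, body: str) -> int:
        rows = json.loads(body)
        for row in rows:
            self.measures.setdefault(row["date"], {}).update(row)
        return len(rows)

    # Hypothetical route: GET /data?since=YYYY-MM-DD (for the Prediction script)
    def provide_data(self, since: str) -> str:
        # ISO dates compare correctly as plain strings.
        selected = [self.measures[d] for d in sorted(self.measures) if d >= since]
        return json.dumps(selected)

    # Hypothetical route: POST /predictions (results of the Prediction script)
    def store_predictions(self, body: str) -> None:
        for item in json.loads(body):
            horizon = int(item["horizon"])
            if horizon not in (1, 2, 3):
                raise ValueError(f"unexpected horizon: {horizon}")
            self.predictions.setdefault(item["target_date"], {})[horizon] = float(item["value"])

    # Hypothetical route: GET /predictions (public consultation)
    def list_predictions(self) -> str:
        # JSON turns the integer horizon keys into strings.
        return json.dumps(self.predictions, sort_keys=True)
```

Keeping this logic outside the web framework makes it testable on its own; the routes then only parse the request and return the JSON strings.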
Database:
- MySQL database used to store all the data we need: the various open-source datasets, the predictions, etc.
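A possible layout for the stored data is sketched below. To keep it runnable without a server, Python's built-in `sqlite3` stands in for MySQL; the table and column names are hypothetical (the weather columns mirror the SYNOP variables listed further down), and column types and upsert syntax would need adapting for MySQL, where `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` plays the role of SQLite's `INSERT OR REPLACE`.

```python
import sqlite3

# Hypothetical schema; sqlite3 is only a stand-in for the project's MySQL database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS air_index (
    day TEXT PRIMARY KEY,          -- ISO date, e.g. '2021-03-01'
    atmo_index INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS weather (
    observed_at TEXT PRIMARY KEY,  -- ISO timestamp of a SYNOP observation
    temperature REAL,
    humidity REAL,
    pressure REAL,
    wind_speed REAL
);
CREATE TABLE IF NOT EXISTS predictions (
    made_on TEXT NOT NULL,         -- day the forecast was issued
    horizon INTEGER NOT NULL CHECK (horizon IN (1, 2, 3)),
    predicted_index REAL NOT NULL,
    PRIMARY KEY (made_on, horizon)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_prediction(conn: sqlite3.Connection, made_on: str, horizon: int, value: float) -> None:
    """Store one forecast; re-running the daily script overwrites instead of duplicating."""
    conn.execute(
        "INSERT OR REPLACE INTO predictions (made_on, horizon, predicted_index) VALUES (?, ?, ?)",
        (made_on, horizon, value),
    )
    conn.commit()
```

The composite primary key `(made_on, horizon)` is what makes a second run on the same day replace the earlier rows rather than add to them.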
Final display:
- The final result, made in R with Shiny, can be seen here.
We used and learned several technologies and tools during this project:
- Most of our scripts run on Google Cloud Platform, with a MySQL database and several App Engine services.
- Our final visualization is made in R with Shiny.
- Both our APIs use Flask.
- We trained our models using Google Colab and TensorFlow.
In a student project, the most important thing is what we learn from it and the experience we gain:
- As mentioned above, we discovered many technologies on our own to build this product, such as Google Cloud Platform for hosting and Flask for the API.
- We improved our skills with Keras and TensorFlow, in particular with recurrent neural networks and architectures with several inputs and outputs.
- As all the work was done during the Covid-19 pandemic, we had to adapt how we work as a team, especially when it came to planning.
The datasets we use must provide real-time data to be useful for prediction, but we also need data archived over a long period of time to build a training dataset for our predictive model.
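Archived data becomes a supervised training set by sliding a window over the daily history: each sample pairs a run of past days with the values 1, 2 and 3 days after it. The sketch below shows that idea on a single daily series; the `make_training_set` name and the default window of 7 days are assumptions for illustration, not the settings used in the project.

```python
from typing import List, Sequence, Tuple


def make_training_set(
    series: Sequence[float],
    window: int = 7,
    horizons: Sequence[int] = (1, 2, 3),
) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn a daily series into (inputs, targets) pairs.

    Each input holds `window` consecutive days; each target holds the
    values `h` days after the last input day, for every h in `horizons`.
    """
    max_h = max(horizons)
    inputs, targets = [], []
    # `end` is the exclusive end of the input window; stop early enough
    # that the furthest horizon still falls inside the series.
    for end in range(window, len(series) - max_h + 1):
        inputs.append(list(series[end - window:end]))
        targets.append([series[end - 1 + h] for h in horizons])
    return inputs, targets
```

For instance, with `series = [1, 2, 3, 4, 5, 6]` and `window=2`, the inputs are `[[1, 2], [2, 3]]` and the targets `[[3, 4, 5], [4, 5, 6]]`. Several input variables (weather, pollutants) would extend each input row into a matrix of days by features, which is the shape a recurrent network expects.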
Potential open data APIs to exploit:
| Name | Source | Description | Frequency | Time frame |
|---|---|---|---|---|
| Indice qualité de l'air | MEL | Air quality index. | Every day | 5-year window, combining the dataset the MEL sent us with the public data available online. |
| Données SYNOP Essentielles OMM | Météo France | Wide range of weather data, including wind, pressure, humidity and temperature. | Every 3 hours | Since 1997. |
| Historique de l'indice Atmo | ATMO | Provides an index of the daily measures of NO2, O3 and PM10. | Every day | Since 2012. |
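Because the weather data arrives every 3 hours while the air quality indices are daily, the two must be brought to the same time step before they can be combined. One simple option, sketched below as an assumption rather than the project's actual preprocessing, is to average the 3-hourly readings per calendar day and keep only the days present in both sources. Note that SYNOP observations are timestamped in UTC, so this groups by UTC day; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def daily_means(observations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average 3-hourly readings given as (ISO timestamp, value) per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in observations:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def join_on_day(
    weather: Dict[str, float], index: Dict[str, int]
) -> Dict[str, Tuple[float, int]]:
    """Pair the daily weather value with the daily index, for days found in both."""
    common_days = sorted(weather.keys() & index.keys())
    return {day: (weather[day], index[day]) for day in common_days}
```

Daily means are only one choice; daily minima, maxima or the individual 3-hourly readings could also be kept as separate features.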