Creation of a flight tracker app as part of the DataScientest training project (data engineer bootcamp by DataScientest, June 2022).
Team: Houda EL-MIR, Yacine AMESROUY & Alban DAVID
We collected our data with two APIs:
- Lufthansa API
- Airlabs API
As the schema shows, the pipeline has several steps:
- Data are collected with the Airlabs and Lufthansa APIs.
- All the raw data are stored in an Amazon cloud bucket (S3).
  - Why? It acts as a data lake and lets us keep our raw data in one common place.
- The data are then transferred to a NoSQL MongoDB database (hosted on a cloud platform).
  - Why? The raw data are in JSON format, which is easy to work with in MongoDB. Once the data are in MongoDB, we can easily load them into a pandas DataFrame to clean them.
- Once the data are cleaned, we insert them into a SQL database (cloud MySQL with AWS RDS).
  - Why? Regular updates of real-time flight information behave like transactions. We needed constraints and a strict schema, so that the final information on the dashboard is always consistent. Furthermore, the SQL modeling was challenging and led us to understand the data deeply, as well as what we wanted to do with them in the end.
- We display our data in a dashboard app (Dash and Plotly).
Note: All the data stores (S3, MongoDB and SQL) are in the cloud.
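To make the cleaning and loading steps more concrete, here is a minimal sketch rather than the project's actual code. It makes several assumptions: the field names (`flight_iata`, `dep_iata`, `arr_iata`, `status`) and the `flight` table are hypothetical, plain Python replaces pandas for the cleaning, and an in-memory SQLite database stands in for the MySQL instance on AWS RDS. It only illustrates how a strict schema with constraints keeps repeated flight updates consistent.

```python
import json
import sqlite3

# Hypothetical raw payload, shaped like a JSON document kept in S3 / MongoDB.
RAW_JSON = """
[
  {"flight_iata": "LH400", "dep_iata": "FRA", "arr_iata": "JFK", "status": "en-route"},
  {"flight_iata": "LH401", "dep_iata": "JFK", "arr_iata": null, "status": "scheduled"},
  {"flight_iata": "lh400", "dep_iata": "FRA", "arr_iata": "JFK", "status": "landed"}
]
"""

# Assumed set of mandatory fields; the real project may use different ones.
REQUIRED = ("flight_iata", "dep_iata", "arr_iata", "status")


def clean_records(raw_records):
    """Drop incomplete records and normalise the text fields."""
    cleaned = []
    for rec in raw_records:
        if any(rec.get(key) is None for key in REQUIRED):
            continue  # skip records with a missing mandatory field
        cleaned.append({
            "flight_iata": rec["flight_iata"].strip().upper(),
            "dep_iata": rec["dep_iata"].strip().upper(),
            "arr_iata": rec["arr_iata"].strip().upper(),
            "status": rec["status"].strip().lower(),
        })
    return cleaned


def create_schema(conn):
    """Create a strict table: primary key, NOT NULL and CHECK constraints."""
    conn.execute("""
        CREATE TABLE flight (
            flight_iata TEXT PRIMARY KEY,
            dep_iata    TEXT NOT NULL CHECK (length(dep_iata) = 3),
            arr_iata    TEXT NOT NULL CHECK (length(arr_iata) = 3),
            status      TEXT NOT NULL
        )
    """)


def upsert_flights(conn, records):
    """Insert each flight, replacing the row if the flight already exists."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            """
            INSERT OR REPLACE INTO flight (flight_iata, dep_iata, arr_iata, status)
            VALUES (:flight_iata, :dep_iata, :arr_iata, :status)
            """,
            records,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    upsert_flights(conn, clean_records(json.loads(RAW_JSON)))
    print(conn.execute("SELECT * FROM flight").fetchall())
```

In this sketch, a later update for the same flight overwrites the earlier row instead of duplicating it, and a record that violates a constraint is rejected with the whole batch rolled back.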
Here is our SQL schema model:
####### todo
Detailed information can be found here
We wrote unit tests for each of our steps:
- API: to see the file in detail, click here
- S3 + MongoDB + clean data
- SQL and dashboard
Two tests run before the dashboard starts:
- Verify that the connection to the database (MySQL on AWS) works (password, username and database name are correct)
- Verify that fetching data from this database works
To see the file in detail, click here
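The two checks above could look roughly like the following pytest-style sketch. This is not the project's test file: the helper names (`can_connect`, `fetch_rows`, `make_test_connection`) and the `flight` table are hypothetical, and an in-memory SQLite database stands in for the MySQL instance on AWS so that the example runs without credentials.

```python
import sqlite3


def can_connect(connect):
    """Return True if the connection factory yields a usable connection."""
    try:
        conn = connect()
    except Exception:
        return False  # e.g. wrong password, username or database name
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
        return True
    except Exception:
        return False
    finally:
        conn.close()


def fetch_rows(connect, query):
    """Run a read-only query and return all rows."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()


def make_test_connection():
    """Hypothetical stand-in for the real MySQL connection factory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flight (flight_iata TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO flight VALUES ('LH400', 'landed')")
    return conn


def test_connection_works():
    # Check 1: the database is reachable with the given settings.
    assert can_connect(make_test_connection)


def test_fetch_works():
    # Check 2: data can be fetched from the database.
    rows = fetch_rows(make_test_connection, "SELECT flight_iata, status FROM flight")
    assert rows == [("LH400", "landed")]
```

Passing the connection factory as an argument keeps the checks independent of the database driver, so the same functions could be pointed at a MySQL connector in the real setup.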
Complete documentation of our Airflow setup is available here
All the unit tests described above run on each push. For this, we created a GitHub Action that runs all the pytest files.
The GitHub Action file is here
- Deploy the app in production