This is the repo for NFInsight's ETL server.
This repo contains the Docker application code needed to run the ETL layer shown in the diagram above.
In the future, we hope to incorporate analytics workflows with these Cassandra clusters, using TensorFlow and Spark.
Developed by @SeeuSim and @JamesLiuZx
To run, simply follow these steps:
- Clone the repo.
- Create a virtual environment within the directory. We recommend using Python 3.9.
- Activate the virtual environment.
- Grab your API connection strings and populate them in a `.env` file in the `/etl/fastapi_app` folder.
  - You may use the `.env.example` as a guideline.
  - You should also generate an app secret for authenticating JWT tokens (see the sketch below).
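If you need a quick way to generate such a secret, the snippet below is a minimal sketch; the key name `APP_SECRET` and the 32-byte length are assumptions, so match whatever your `.env.example` actually expects:

```python
# Minimal sketch: generate a random hex string to use as the JWT app secret.
# The name APP_SECRET and the 32-byte length are illustrative assumptions;
# use whatever key name your .env.example defines.
import secrets

print(f'APP_SECRET="{secrets.token_hex(32)}"')
```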
Additional care needs to be taken when setting up the database. Please refer to `etl/database.py`:

- For Datastax Astra: (you may also refer to the Astra set up section below)
  - It uses a zip file which we store in the `etl` folder and reference in `etl/database.py`. To get this package, go to your Datastax Astra console, grab the "connection bundle", and save it in the `etl` folder.
  - For its other variables, populate them according to the `etl/.env.example` file.
- For Azure Cosmos:
  - The code in `database.py` will need to be modified, as well as the environment variables needed.
  - This applies for other Cassandra clusters as well.
- Using your Cassandra CQL shell, create all the tables in `etl/celery_app/db/models.py` with those commands.
- Insert an admin user with a username and bcrypt-hashed password into the `admin_user` table.
6.1. Generate the password hash with Python:

```python
from passlib.context import CryptContext

context = CryptContext(schemes=['bcrypt'], deprecated='auto')
password = "{PASSWORD}"
hashed_password = context.hash(password)
print(hashed_password)
# >> 'hashed password'
```

6.2. Insert the user into your database with its CQL shell:

```sql
INSERT INTO admin_user (username, hashed_password, disabled)
VALUES ('{username}', '{hashed_password}', false);
-- >> Values inserted
```

- Run these commands:
```sh
# Build the app from the local code.
docker compose up --build
```

The server should be up and running. To visit the OpenAPI spec, simply go to `127.0.0.1/docs` in your browser.
- To trigger authenticated routes, key in the admin credentials you inserted in the earlier step and click `authenticate` in the OpenAPI spec.
- To spin down the server, simply run Cmd + C in the Docker Compose terminal.
You first need to create a Datastax Astra database and navigate to your database admin console on the web.
To set up, simply download your `secure-connect-{database_name}` bundle and place it in the `etl` folder.
Reference that bundle (including the database name) in the `etl/database.py` file by setting the variables in the `etl/.env` file:
```sh
ASTRA_DB_NAME="<value>"
ASTRA_CLIENT_ID="<value>"
ASTRA_CLIENT_SECRET="<value>"
ASTRA_TOKEN="<value>"
ASTRA_KEYSPACE="<keyspace>"
```

These values can be obtained from your Astra DB web console.
The Python ORM by DataStax has some flaws. Hence, we execute our queries using only its raw CQL execution engine.
To connect to the database, create a Datastax Astra account and database, and use either their web CQL shell or execute raw queries with its various drivers.
Within the web shell, you should be able to test and execute queries using CQL.
Once your queries have been validated, use `session.prepare` and `session.execute` in your Python code to execute database statements, as in the sketch below.
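For orientation, here is a minimal sketch of that flow with the DataStax Python driver. The environment variable names mirror the `etl/.env` example above, but the bundle path, table, and column names are illustrative assumptions rather than the repo's actual code:

```python
import os

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# Connect to Astra using the secure connect bundle saved in the etl folder.
# The bundle path below is an assumption; point it at your actual zip file.
cloud_config = {
    "secure_connect_bundle": f"./secure-connect-{os.environ['ASTRA_DB_NAME']}.zip"
}
auth_provider = PlainTextAuthProvider(
    os.environ["ASTRA_CLIENT_ID"], os.environ["ASTRA_CLIENT_SECRET"]
)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect(os.environ["ASTRA_KEYSPACE"])

# Prepared statements bind values safely, with no f-string interpolation needed.
# 'admin_user' and 'username' follow the table described earlier; adjust as needed.
stmt = session.prepare("SELECT * FROM admin_user WHERE username = ?")
rows = session.execute(stmt, ("admin",))
for row in rows:
    print(row)
```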
Within our code, there are multiple CQL injection vulnerabilities with raw f-string queries. However, as we are not storing sensitive data within the database and are optimising our queries for batch performance, we will leave them as such for now.
Fixes proposed are welcome, via our issues section.
If you're wondering what Celery is, it is a distributed task queue that can be used to run background tasks.
We've configured Celery in this project to use a RabbitMQ broker with py-amqp, and a Redis in-memory results backend that can be used for an access lock if needed.
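As a rough illustration of that wiring (the broker and backend URLs and environment variable names below are placeholders, not necessarily the exact values this repo uses):

```python
import os

from celery import Celery

# Illustrative broker/backend wiring: RabbitMQ over AMQP as the broker,
# Redis as the results backend. URLs and env var names are placeholders.
app = Celery(
    "celery_app",
    broker=os.getenv("CELERY_BROKER_URL", "amqp://guest:guest@localhost:5672//"),
    backend=os.getenv("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
)
```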
Here's how to run the demo:
- If you haven't already, ensure that your system has:
  - Docker installed, and that the Docker daemon is up and running. On OSes with a GUI, you may simply launch the Docker Desktop client.
- Ensure all your environment variables are set. You may follow the respective `.env.example` files.
- Run this command in your terminal:

  ```sh
  docker-compose up
  ```

- Now, your app may call any function denoted with `@app.task` in `app/celery.py` (see the task sketch after this list). This should run in the background.
- To illustrate, open a separate shell with the same venv activated, and run this:

  ```sh
  # Start a REPL environment for testing
  python3
  >> from celery_app.celery import app
  >> app.send_task('task_name', args=(...), kwargs={...})
  ```

  You should be able to see the Celery worker handle and execute the task. In the future, we hope to be able to implement the necessary APIs to manage, start and stop tasks.
- You may also run the script `./scripts/flower.sh` in another terminal to see a GUI to view task running statuses at `localhost:5556`. Remember to run the same chmod command on the flower script. Alternatively, you can run `find ./scripts -type f -exec chmod +x {} +` to enable permissions for all current script files within the folder.
- To spin down the Celery app and related resources, perform these actions in this sequence:
  - Terminate the flower script by running Cmd+C in the `flower.sh` terminal.
  - Terminate the containers by running `docker-compose down` from another terminal.
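To make the `@app.task` / `send_task` relationship concrete, here is a minimal sketch of a task module. The task name `demo.add` and its arguments are purely illustrative and do not correspond to tasks that actually exist in this repo:

```python
# Illustrative only: a task registered on the app instance from celery_app/celery.py.
# The explicit name 'demo.add' is a made-up example, not a real task in this repo.
from celery_app.celery import app


@app.task(name="demo.add")
def add(x: int, y: int) -> int:
    """Toy background task: add two numbers."""
    return x + y
```

With the worker running, `app.send_task('demo.add', args=(1, 2))` would queue this task and return an `AsyncResult` whose `.get()` yields `3` once the worker finishes.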
If Kubernetes is more your thing, we also provide a Kubernetes workflow under `k8s/Setup.md`.
Pre-requisites: Modify the image tags in the Kubernetes manifests for the Celery deployment and the FastAPI deployment to point to your local images that were previously built with Docker Compose.
They may be found under `./k8s/resources/celery-worker-deployment.yaml` and `./k8s/resources/fastapi-application-deployment.yaml`.
- Ensure that you have `minikube` and `kubectl` on your system.
- Ensure that the Docker daemon is running.
- Run the commands below:
```sh
# Start the local control plane with minikube
minikube start

# Create the necessary namespaces
kubectl apply -f k8s/namespace.yaml

# Create the resources
kubectl apply -f k8s/resources

# Mirror the ports
minikube tunnel
```

Now, you will be able to interact with the containers and the FastAPI application as if you were running docker-compose. To terminate, simply run `kubectl delete -f k8s/resources`.
NOTE: Do NOT delete the namespace.
- Model Training Scripts
- CI to build and deploy to ACA