This project describes various ways of performing MLFlow model inference (batch and streaming) using CDP Spark form factors (CML, CDE, and Datahub).
The goal of this project is to demonstrate machine learning inference capabilities with MLFlow models on Cloudera CDP: CML (Cloudera Machine Learning - ML with Spark on K8s), CDE (Cloudera Data Engineering - Spark3 on K8s, TBD), and Datahub (Spark3 on YARN, TBD).
- Make sure you have a running CDP environment (AWS or Azure)
- Create a Workspace in CDP CML (AWS or Azure)
- Within the CML workspace, create a Project and clone this GitHub project
- Project artifacts explained:
0_Setup.py -- Python script that sets up environment variables and creates the required directories (see the sketch after this list)
1_MLFlow_Tensorflow_Batch_Inference.ipynb -- Jupyter notebook for Spark batch inference using an MLFlow UDF
1_SparkStreaming_MLFlow_Model_Inference.ipynb -- Jupyter notebook for Spark Streaming inference using an MLFlow UDF over files arriving in an S3 bucket or ABFS container
gen_files.sh -- Bash script to simulate incoming files for Spark Streaming
requirements.txt -- Python dependencies required to run this project
model -- Directory that holds the MLFlow model artifacts (the model was prebuilt for inference)
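A minimal sketch of what a setup script like 0_Setup.py typically does is shown below. It is not the actual contents of the script; the STORAGE variable name, the default bucket, and the use of hdfs dfs are assumptions based on the directories referenced later in this README.

```python
# Illustrative sketch only -- not the actual contents of 0_Setup.py.
# Assumes STORAGE points at the environment's object store
# (e.g. s3a://my-bucket or abfs://container@account.dfs.core.windows.net).
import os
import subprocess

STORAGE = os.environ.get("STORAGE", "s3a://my-demo-bucket")  # hypothetical default

# Create the working directories referenced later in this README
for path in ["tmp", "chkpnt", "output"]:
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", f"{STORAGE}/{path}"], check=True)
```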
- Create a new session using the Workbench editor
- Execute the 0_Setup.py script by running all lines
- Stop the session
- Create a new session using the Jupyter notebook editor
- Open a terminal and execute the following command:
pip install -r requirements.txt
The command installs all the dependencies needed for both batch inference and Spark Streaming inference
- Open the 1_MLFlow_Tensorflow_Batch_Inference.ipynb notebook from the left panel. The notebook contains code that generates the input data
- Execute the notebook and you should see a DataFrame with model predictions printed to the console
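The notebook contains the authoritative code; the sketch below only illustrates the general MLFlow-on-Spark batch scoring pattern it relies on. The input file name and feature column names are placeholders, not taken from the notebook.

```python
# Sketch of MLFlow batch scoring via a Spark UDF -- paths and columns are placeholders.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mlflow-batch-inference").getOrCreate()

# Wrap the prebuilt MLFlow model from the project's model/ directory as a Spark UDF
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="model")

# Hypothetical input DataFrame with feature columns f1..f4
df = spark.read.csv("input.csv", header=True, inferSchema=True)
scored = df.withColumn("prediction", predict_udf("f1", "f2", "f3", "f4"))
scored.show()
```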
- Open the 1_SparkStreaming_MLFlow_Model_Inference.ipynb notebook from the left panel.
- Run the notebook and wait a few seconds until the kernel is idle
- The Spark Streaming app is now running and waiting for input files to arrive in your object store folder/container
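The notebook holds the actual implementation; as a rough sketch of the pattern, a Structured Streaming job of this kind reads files as they land in an input folder, applies the same MLFlow UDF, and writes predictions out with a checkpoint location. The schema and the exact folder layout below are assumptions based on the $STORAGE/tmp, $STORAGE/chkpnt, and $STORAGE/output paths mentioned later.

```python
# Sketch of the Spark Structured Streaming + MLFlow UDF pattern -- schema and
# folder layout are assumptions, not a copy of the notebook.
import os
import mlflow.pyfunc
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DoubleType

spark = SparkSession.builder.appName("mlflow-streaming-inference").getOrCreate()
STORAGE = os.environ["STORAGE"]

schema = StructType([StructField(f"f{i}", DoubleType()) for i in range(1, 5)])
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="model")

stream = (spark.readStream
          .schema(schema)
          .csv(f"{STORAGE}/tmp"))                      # watch for incoming files

query = (stream.withColumn("prediction", predict_udf("f1", "f2", "f3", "f4"))
         .writeStream
         .format("parquet")
         .option("path", f"{STORAGE}/output")          # predictions land here
         .option("checkpointLocation", f"{STORAGE}/chkpnt")
         .start())
```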
- Open the terminal and run the gen_files.sh script:
bash gen_files.sh 100
The command takes one argument, the number of files to generate; change the number to suit your requirements.
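For reference, the Python sketch below does roughly what gen_files.sh is described as doing: it writes a number of small files into the streaming input folder. The row format and the $STORAGE/tmp target are assumptions; the shipped Bash script remains the source of truth.

```python
# Illustration only -- not the contents of gen_files.sh. Assumes the streaming
# job watches $STORAGE/tmp and that each row holds four numeric features.
import os, random, subprocess, tempfile

STORAGE = os.environ["STORAGE"]
num_files = 100  # corresponds to the argument passed to gen_files.sh

for _ in range(num_files):
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        f.write(",".join(str(random.random()) for _ in range(4)) + "\n")
    subprocess.run(["hdfs", "dfs", "-put", f.name, f"{STORAGE}/tmp/"], check=True)
    os.remove(f.name)
```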
- As the script runs, you can observe from the notebook that the Spark Streaming app starts processing those files and writing the results to the $STORAGE/output folder of your object store bucket
- Stop your running sessions
- Delete your $STORAGE/tmp, $STORAGE/chkpnt and $STORAGE/output directories
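One way to do this from a session terminal or a notebook cell is sketched below; the use of hdfs dfs -rm -r is an assumption and should be adapted to your storage setup.

```python
# One possible cleanup, mirroring the directories used above -- adjust to your setup.
import os, subprocess

STORAGE = os.environ["STORAGE"]
for path in ["tmp", "chkpnt", "output"]:
    subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", f"{STORAGE}/{path}"], check=True)
```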
MLOps work to run this code in CDE and Datahub is still to be done (TBD)