Data Analytics pipeline using Apache Spark | Build multi-class classification models | Test the model using test data and compute accuracy of each method
Updated Oct 14, 2018 - Python
This project builds a parallel machine learning application that predicts wine quality using Apache Spark's MLlib on AWS. The model is trained in parallel on 4 EC2 instances, then validated and tuned, and finally evaluated by its F1 score on test data. The application is packaged as a Docker image for deployment, demonstrating Spark's scalability on AWS's distributed infrastructure.
An image classification approach using big data (Spark) machine learning and data manipulation libraries.