ML - Assignment Advanced

A sample of advanced machine learning assignment.

Uploaded by

Jutt Sahib

Assignment: End-to-End Machine Learning Pipeline
Objective
Apply everything you have learned so far to build a complete machine learning pipeline —
from raw data to model evaluation.

Dataset
Choose and download a classification dataset from Kaggle (every model in this assignment is a classifier). Do not use built-in library datasets, such as those from scikit-learn or seaborn.

Assignment Tasks

1. Data Handling (NumPy & Pandas)


• Load dataset into a Pandas DataFrame.
• Perform initial checks (shape, column data types, summary statistics, and a preview of the first rows).
• Handle missing values and duplicates.
• Convert categorical features into numerical form if needed.
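The data handling steps above could look roughly like the sketch below. The inline CSV text and the column names (`age`, `income`, `city`, `purchased`) are hypothetical stand-ins so the snippet runs on its own; in the notebook you would read your downloaded Kaggle file instead. Median and mode imputation are just one reasonable choice, not a requirement of the assignment.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Kaggle CSV (assumption). In the notebook this
# would be pd.read_csv(...) pointed at the file you downloaded.
raw_csv = io.StringIO(
    "age,income,city,purchased\n"
    "25,40000,Lahore,no\n"
    "32,,Karachi,yes\n"
    "47,82000,Lahore,yes\n"
    "32,,Karachi,yes\n"
    "51,91000,,no\n"
)
df = pd.read_csv(raw_csv)

# Initial checks: size, column types, missing values, duplicate rows.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print("duplicate rows:", df.duplicated().sum())

# Remove exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Fill numeric gaps with the median and categorical gaps with the mode.
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Encode categoricals: map the binary target to 0/1 and
# one-hot encode the nominal feature.
df["purchased"] = df["purchased"].map({"no": 0, "yes": 1})
df = pd.get_dummies(df, columns=["city"], dtype=int)

print(df.head())
```

Whether to drop or impute missing values depends on how much data is missing and why, so it is worth stating the choice in the notebook.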

2. Exploratory Data Analysis (EDA)


• Use NumPy & Pandas for basic statistics.
• Visualize data using:
- Matplotlib / Seaborn
- Plotly: at least one interactive plot (e.g., scatter or bar chart).

3. Feature Engineering
• Split dataset into features (X) and target (y).
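• Normalize/scale data if necessary (fit the scaler on the training split only, so no information from the test set leaks into training).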
• Perform train-test split.
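A minimal sketch of this step, assuming a binary target column named `purchased` (a hypothetical name) and synthetic placeholder data. Here the split is done before scaling and the scaler is fitted on the training rows only; `StandardScaler` is one option among several.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (assumption); use your cleaned DataFrame instead.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=200),
    "income": rng.normal(50_000, 12_000, size=200),
    "purchased": rng.integers(0, 2, size=200),
})

# Separate features (X) from the target (y).
X = df.drop(columns="purchased")
y = df["purchased"]

# Hold out 20% for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training set, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Scaling matters most for distance-based models such as KNN; tree-based models are largely insensitive to it.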

4. Model Training
• Train the following models:
- KNN Classifier
- Decision Tree Classifier
- Random Forest Classifier
• Compare baseline results.
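One possible way to train the three baselines with default settings and compare them is sketched below. The data comes from `make_classification` purely as a runnable placeholder (an assumption); the assignment itself requires a Kaggle dataset. Wrapping KNN in a pipeline with a scaler is a design choice of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data (assumption); substitute your Kaggle features/target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Default-parameter baselines. KNN is distance-based, so it gets a scaler.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model and record its test accuracy.
baseline = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    baseline[name] = model.score(X_test, y_test)

for name, acc in baseline.items():
    print(f"{name}: {acc:.3f}")
```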
5. Feature Importance
• Extract and visualize feature importance from Random Forest.
• Discuss which features contribute most to predictions.
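The sketch below shows one way to extract and plot Random Forest feature importances. It uses synthetic placeholder data and generic column names (`feature_0`, `feature_1`, ...), all of which are assumptions. The values come from scikit-learn's `feature_importances_` attribute, which is impurity-based and can favour features with many distinct values, so treat the ranking as a starting point for discussion.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data with generic column names (assumption).
X_arr, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]
X = pd.DataFrame(X_arr, columns=feature_names)

# Fit the forest and collect impurity-based importances (they sum to 1).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values()
print(importances)

# Horizontal bar chart, most important feature at the top.
fig, ax = plt.subplots(figsize=(6, 4))
importances.plot.barh(ax=ax)
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Random Forest feature importance")
fig.tight_layout()
```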

6. Hyperparameter Tuning
• Use RandomizedSearchCV to optimize hyperparameters:
- KNN → n_neighbors, weights, metric
- Decision Tree → max_depth, min_samples_split
- Random Forest → n_estimators, max_depth, min_samples_split
• Compare default vs tuned models.
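A hedged sketch of the tuning step follows. The hyperparameter names are the ones listed above; the candidate ranges, `n_iter=10`, `cv=3`, and the synthetic data are assumptions chosen to keep the example quick, not recommended values.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data (assumption); substitute your Kaggle features/target.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Estimator plus search space per model; ranges are illustrative only.
search_spaces = {
    "KNN": (KNeighborsClassifier(), {
        "n_neighbors": randint(1, 30),
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan", "minkowski"],
    }),
    "Decision Tree": (DecisionTreeClassifier(random_state=0), {
        "max_depth": [None, 3, 5, 10, 20],
        "min_samples_split": randint(2, 20),
    }),
    "Random Forest": (RandomForestClassifier(random_state=0), {
        "n_estimators": randint(50, 200),
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": randint(2, 20),
    }),
}

results = {}
for name, (estimator, params) in search_spaces.items():
    # Test accuracy with default hyperparameters.
    default_acc = estimator.fit(X_train, y_train).score(X_test, y_test)
    # Randomized search with cross-validation on the training set only.
    search = RandomizedSearchCV(
        estimator, params, n_iter=10, cv=3, scoring="accuracy", random_state=0
    )
    search.fit(X_train, y_train)
    results[name] = {
        "default": default_acc,
        "tuned": search.score(X_test, y_test),
        "best_params": search.best_params_,
    }

for name, res in results.items():
    print(name, f"default={res['default']:.3f}", f"tuned={res['tuned']:.3f}")
    print("  best params:", res["best_params"])
```

Tuning does not always beat the defaults on the test set, especially on small datasets, so report both numbers either way.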

7. Model Evaluation
• Evaluate models using:
- Accuracy
- Precision, Recall, F1-score
- Confusion Matrix
• Plot ROC Curve for the best-performing model.
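The evaluation metrics above can be computed along these lines. This sketch assumes a binary target and uses a Random Forest on synthetic placeholder data as the "best" model purely for illustration; a multiclass target would need a one-vs-rest ROC instead.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic binary placeholder data (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stand-in for whichever model performed best in your comparison.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Accuracy, per-class precision/recall/F1, and the confusion matrix.
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"accuracy: {acc:.3f}")
print(report)
print(cm)

# ROC curve and area under it, from predicted probabilities.
fpr, tpr, _ = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(fpr, tpr, label=f"ROC (AUC = {auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend()
```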

8. Conclusion
• Which model performed best and why?
• Which features were most important?
• How did hyperparameter tuning improve results?

Deliverables
1. Jupyter Notebook with well-commented code and results.
2. Report (2–3 pages) summarizing:
- Dataset insights
- Visualization findings
- Model comparison table
- Key conclusions
