School of Technology Design and Computer Application
College of Technology
Bachelor of Technology
Information Technology
Semester: 6 Academic Year: 2024-2025
Course Name: Artificial Intelligence with Concepts of Machine Learning & Deep Learning
Course Code: 1010103322
Assignment 2 [Units: 5, 6]
Instructions
Each student/group will be assigned a unique dataset. The following tasks must be completed and
documented in the report:
1. Import the Dataset
● Load the dataset using appropriate Python libraries (pandas, tensorflow, sklearn, etc.).
● Display the first few rows and understand the dataset’s structure.
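As a minimal sketch of this step, the snippet below loads a small table with pandas and inspects it. The CSV text and its column names are hypothetical stand-ins for your assigned dataset; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real dataset file; the columns are made up.
csv_text = """age,fare,embarked,survived
22,7.25,S,0
38,71.28,C,1
26,,S,1
35,53.10,S,1
,8.05,Q,0
"""

# For a real file, use something like pd.read_csv("path/to/your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first few rows
print(df.shape)       # (number of rows, number of columns)
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for the numeric columns
```

The output of df.info() already hints at which columns contain missing values, which feeds into the preprocessing step.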
2. Data Visualization & Preprocessing
● Identify missing values and handle them appropriately.
● Perform exploratory data analysis (EDA) using matplotlib and seaborn.
● Check for class imbalances and outliers.
● Perform necessary feature scaling and encoding if required.
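One possible way to carry out these checks is sketched below on a hypothetical toy table (the column names and values are made up). It assumes median/mode imputation and the 1.5 x IQR rule for outliers, which are common defaults rather than required choices. Plots are left out to keep the sketch short; in a notebook, df.hist() or seaborn's sns.heatmap(df.corr(numeric_only=True)) are typical starting points.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame({
    "age": [22, 38, np.nan, 35, 54, 2, 27, 31],
    "fare": [7.25, 71.28, 7.92, 53.10, 512.33, 21.07, 11.13, np.nan],
    "embarked": ["S", "C", "S", None, "S", "Q", "S", "C"],
    "target": [0, 1, 1, 1, 0, 0, 0, 1],
})

# 1. Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# 2. Impute: median for numeric columns, most frequent value for categoricals.
for col in ["age", "fare"]:
    df[col] = df[col].fillna(df[col].median())
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# 3. Check class balance as a share of all rows.
class_share = df["target"].value_counts(normalize=True)
print(class_share)

# 4. Flag outliers in one column with the 1.5 * IQR rule.
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)]
print(outliers)

# 5. One-hot encode the categorical column and standardize the numeric ones.
df = pd.get_dummies(df, columns=["embarked"])
df[["age", "fare"]] = StandardScaler().fit_transform(df[["age", "fare"]])
```

In a full workflow, the scaler (and any imputer) should be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.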
3. Feature Extraction
● Identify important features using correlation, mutual information, or PCA.
● Drop irrelevant or redundant features.
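The three techniques named above can be sketched as follows. The data is synthetic (generated with make_classification), the feature names f0 to f5 are made up, and the 0.01 mutual-information threshold and 95% variance target are arbitrary illustrative choices, not recommended values.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset: 6 features, 3 of them informative.
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=1,
    shuffle=False, random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# Absolute Pearson correlation of each feature with the target.
corr_with_target = X.corrwith(pd.Series(y)).abs().sort_values(ascending=False)
print(corr_with_target)

# Mutual information between each feature and the class label.
mi = pd.Series(
    mutual_info_classif(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)
print(mi)

# PCA on standardized features, keeping enough components for 95% variance.
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)

# Drop features whose mutual information falls below the chosen threshold.
keep = mi[mi > 0.01].index
X_selected = X[keep]
print(list(X_selected.columns))
```

For a regression target, mutual_info_regression from the same module plays the role of mutual_info_classif.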
4. Train-Test Data Split
● Split the dataset into training and testing sets (e.g., 80-20 or 70-30 split).
● Use train_test_split() from sklearn.model_selection.
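A minimal sketch of an 80-20 split is shown below on synthetic, deliberately imbalanced data. The stratify argument is an addition not mentioned above; it is assumed here because it keeps the class proportions similar in both splits, which is usually helpful for classification.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 80% class 0, 20% class 1).
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=42
)

# 80-20 split; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())  # share of class 1 in each split
```

For time series data, a random split is generally not appropriate; a chronological split (train on earlier rows, test on later rows) avoids training on the future.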
5. Model Selection
● Choose an appropriate machine learning or deep learning model.
● Justify your choice of model for the given dataset.
● Consider traditional ML models (SVM, Decision Trees, Random Forest, Logistic Regression)
and deep learning models (CNN, LSTMs, Transformers) where applicable.
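One way to support the justification with evidence is to compare a few candidate models under cross-validation before committing to one. The sketch below does this for the traditional models listed above on synthetic data; the dataset, the 5-fold setting, and the F1 scoring are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular classification dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a scaler so that
# scaling is re-fitted inside each cross-validation fold.
candidates = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated F1 score per model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

for name, score in sorted(cv_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```

Scores like these are only one part of the justification; the data type (tabular, text, images, sequences), dataset size, and interpretability needs also matter.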
6. Model Training
● Train the selected model on the training dataset.
● Use hyperparameter tuning (GridSearchCV, RandomizedSearchCV, etc.) to improve model
performance.
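Training with hyperparameter tuning can be sketched as below, using GridSearchCV on a random forest. The synthetic data and the small parameter grid are illustrative assumptions; a real grid would be chosen for your model and dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data and an 80-20 split.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small illustrative grid: every combination is tried with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # tuning uses the training data only

print(search.best_params_)   # best parameter combination found
print(search.best_score_)    # its mean cross-validated accuracy

# By default the best model is refitted on the full training set.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(test_accuracy)
```

RandomizedSearchCV has a similar interface but samples a fixed number of combinations, which tends to be more practical when the grid is large.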
7. Model Evaluation
● Evaluate model performance using appropriate metrics:
○ Classification: Accuracy, Precision, Recall, F1-score, AUC-ROC
○ Regression: RMSE, MAE, R² score
○ Time Series: MSE, Mean Absolute Percentage Error (MAPE)
● Visualize results using confusion matrix, ROC curves, or loss/accuracy plots.
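The metrics listed above can be computed with sklearn.metrics, as in the sketch below. The label, prediction, and score arrays are small hand-made examples, not results from any real model.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# --- Classification (hand-made example arrays) ---
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])                     # hard labels
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])    # P(class 1)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)   # needs scores, not hard labels
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

print(accuracy, precision, recall, f1, auc)
print(cm)

# --- Regression / time series (hand-made example arrays) ---
r_true = np.array([3.0, 5.0, 2.5, 7.0])
r_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = mean_squared_error(r_true, r_pred)
rmse = np.sqrt(mse)  # square root taken manually for version compatibility
mae = mean_absolute_error(r_true, r_pred)
r2 = r2_score(r_true, r_pred)
mape = mean_absolute_percentage_error(r_true, r_pred)  # a fraction, not a percent

print(mse, rmse, mae, r2, mape)
```

For the plots, sklearn.metrics.ConfusionMatrixDisplay.from_predictions(y_true, y_pred) and RocCurveDisplay.from_predictions(y_true, y_score) can draw the confusion matrix and ROC curve with matplotlib.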
8. Conclusion
● Interpret model performance.
● Suggest improvements and future enhancements.
● Compare different models (if applicable) and justify the best choice.
Datasets & Assignments
Each student/group will work on one of the following datasets:
1. Titanic Survival Prediction (Classification) - Kaggle Link
2. House Price Prediction (Regression) - Kaggle Link
3. IMDB Movie Reviews Sentiment Analysis (NLP) - tensorflow.keras.datasets.imdb
4. CIFAR-10 Image Classification (Computer Vision) - tensorflow.keras.datasets.cifar10
5. UCI Heart Disease Prediction (Medical Classification) - Kaggle Link
6. Retail Sales Forecasting (Walmart Sales Data) (Time Series) - Kaggle Link
7. Fake News Detection (NLP) - Kaggle Link
8. Credit Card Fraud Detection (Anomaly Detection) - Kaggle Link
9. Human Activity Recognition (HAR) with Smartphones (Classification) - Kaggle Link
10. Plant Seedlings Classification (Image Classification) - Kaggle Link
Submission Guidelines
● Complete the assignment in a Google Colab notebook, convert the notebook to PDF, print it,
and submit the printout after the mid-semester exam.
● A PDF summarizing the approach, results, and analysis must be included.
● Deadline for submission: Saturday, 29/03/2025.
Additional Resources
● Python Libraries: pandas, numpy, sklearn, tensorflow, matplotlib, seaborn
● Kaggle Datasets: https://www.kaggle.com/datasets
● Google Colab for running models online: https://colab.research.google.com
Need Help?
If you have any questions, feel free to reach out via email or during teaching hours at EA-601
(Ms. Purvi Patel).
Good luck and happy coding!