1. Student Performance Prediction
Description: Predict students' final grades based on hours studied, attendance,
previous scores, etc.
CSV Sample Columns:
Name,Hours_Studied,Attendance,Previous_Grade,Assignments_Submitted,Final_Grade
ML Type: Regression
Use: LinearRegression, RandomForestRegressor
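A minimal sketch of how this project could be set up, assuming the column names listed above. The data here is synthetic (the grade formula and value ranges are invented purely for illustration), and the variable names such as `students` and `student_mae` are hypothetical; in practice the DataFrame would come from your own CSV.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): replace with pd.read_csv(...) on a real file.
rng = np.random.default_rng(0)
n = 200
students = pd.DataFrame({
    "Name": [f"Student_{i}" for i in range(n)],
    "Hours_Studied": rng.uniform(0, 10, n),
    "Attendance": rng.uniform(50, 100, n),
    "Previous_Grade": rng.uniform(40, 100, n),
    "Assignments_Submitted": rng.integers(0, 11, n),
})
# Invented relationship plus noise, only so the models have something to learn.
students["Final_Grade"] = (
    2.0 * students["Hours_Studied"]
    + 0.2 * students["Attendance"]
    + 0.5 * students["Previous_Grade"]
    + 1.0 * students["Assignments_Submitted"]
    + rng.normal(0, 3, n)
)

# Name is an identifier, not a predictor, so it is dropped along with the target.
X = students.drop(columns=["Name", "Final_Grade"])
y = students["Final_Grade"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit both suggested models and record the mean absolute error of each.
student_mae = {}
for label, model in [
    ("LinearRegression", LinearRegression()),
    ("RandomForestRegressor", RandomForestRegressor(n_estimators=100, random_state=42)),
]:
    model.fit(X_train, y_train)
    student_mae[label] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{label} MAE: {student_mae[label]:.2f}")
```

Mean absolute error is used here because it reads directly in grade points; any other regression metric would work the same way.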
---
2. House Price Prediction
Description: Predict the price of houses based on area, bedrooms, location, etc.
CSV Sample Columns:
Area,Bedrooms,Bathrooms,Location,Age,Price
ML Type: Regression
Use: LinearRegression, XGBoost
---
📦 1. Install required libraries
Open your terminal or PyCharm terminal and run:
bash
pip install pandas scikit-learn xgboost matplotlib
---
🧠 2. Full Code: house_price_prediction.py
python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
# Step 1: Load Dataset
df = pd.read_csv("house_prices.csv")  # Ensure this CSV is in the same folder
# Step 2: Explore Data (Optional)
print(df.head())
print("\nMissing values:\n", df.isnull().sum())
# Step 3: Separate features and target
X = df.drop("Price", axis=1)
y = df["Price"]
# Step 4: One-hot encode categorical variable (Location)
X_encoded = pd.get_dummies(X, columns=["Location"], drop_first=True)
# Step 5: Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42
)
# Step 6: Train Linear Regression Model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
# Predict using Linear Regression
lr_predictions = lr_model.predict(X_test)
lr_mse = mean_squared_error(y_test, lr_predictions)
print("Linear Regression MSE:", lr_mse)
# Step 7: Train XGBoost Model
xgb_model = XGBRegressor()
xgb_model.fit(X_train, y_train)
# Predict using XGBoost
xgb_predictions = xgb_model.predict(X_test)
xgb_mse = mean_squared_error(y_test, xgb_predictions)
print("XGBoost Regression MSE:", xgb_mse)
# Step 8: Visualization
plt.figure(figsize=(10, 5))
plt.plot(y_test.values[:50], label='Actual Price', marker='o')
plt.plot(lr_predictions[:50], label='LR Predicted Price', marker='x')
plt.plot(xgb_predictions[:50], label='XGB Predicted Price', marker='s')
plt.title("Actual vs Predicted House Prices")
plt.xlabel("Sample Index")
plt.ylabel("Price")
plt.legend()
plt.tight_layout()
plt.show()
---
📁 Make Sure
The file house_prices.csv (the one you downloaded) is in the same folder as your
Python file.
The dataset has at least these columns:
Area,Bedrooms,Bathrooms,Location,Age,Price
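If the CSV is not at hand, a synthetic stand-in with the same columns is enough to exercise the script. The sketch below is only that: the helper name `make_synthetic_houses`, the location names, the value ranges, and the price formula are all invented for illustration and say nothing about real housing data.

```python
import numpy as np
import pandas as pd


def make_synthetic_houses(n_rows=500, seed=0):
    """Build a made-up dataset with the columns the script expects."""
    rng = np.random.default_rng(seed)
    locations = ["Downtown", "Suburb", "Rural"]  # hypothetical categories
    premium = {"Downtown": 50_000, "Suburb": 20_000, "Rural": 0}  # invented
    df = pd.DataFrame({
        "Area": rng.integers(500, 4000, n_rows),
        "Bedrooms": rng.integers(1, 6, n_rows),
        "Bathrooms": rng.integers(1, 4, n_rows),
        "Location": rng.choice(locations, n_rows),
        "Age": rng.integers(0, 50, n_rows),
    })
    # Invented linear price rule plus random noise.
    df["Price"] = (
        100 * df["Area"]
        + 10_000 * df["Bedrooms"]
        + 5_000 * df["Bathrooms"]
        - 500 * df["Age"]
        + df["Location"].map(premium)
        + rng.normal(0, 10_000, n_rows)
    ).round(2)
    return df


houses = make_synthetic_houses()
# Written under a separate name so a real house_prices.csv is never overwritten.
houses.to_csv("house_prices_synthetic.csv", index=False)
print(houses.head())
```

To try the script against this file, either rename it to house_prices.csv or change the path passed to `pd.read_csv`.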
---
✅ Output:
MSE from both models
Line chart showing actual vs predicted house prices using both models
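MSE is expressed in squared price units, which makes it hard to read on its own. One possible complement, sketched below, is to report RMSE (same units as the price) and R² alongside it. The helper name `regression_report` and the three demo values are hypothetical and chosen only so the arithmetic is easy to check.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MSE, RMSE, and R^2 for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # back in the target's own units
        "R2": r2_score(y_true, y_pred),
    }


# Made-up values: every prediction is off by exactly 10,000.
y_true_demo = [200_000, 250_000, 300_000]
y_pred_demo = [210_000, 240_000, 310_000]
metrics = regression_report(y_true_demo, y_pred_demo)
print(metrics)  # RMSE is 10000.0 and R2 is 0.94 for these demo values
```

In the house price script, the same helper could be called as `regression_report(y_test, lr_predictions)` and `regression_report(y_test, xgb_predictions)`.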
---
3. Employee Attrition Prediction
Description: Predict whether an employee will leave the company based on various
features.
CSV Sample Columns:
Age,Department,Years_at_Company,Job_Satisfaction,Salary,Attrition
ML Type: Classification
Use: LogisticRegression, RandomForestClassifier
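A rough sketch of this classification setup, assuming the columns listed above. The data is synthetic (the attrition rule is invented), and names such as `employees`, `build_pipeline`, and `attrition_accuracy` are hypothetical. The text column Department is one-hot encoded and the numeric columns are scaled, which mainly helps LogisticRegression converge.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): replace with your own CSV.
rng = np.random.default_rng(0)
n = 400
employees = pd.DataFrame({
    "Age": rng.integers(22, 60, n),
    "Department": rng.choice(["Sales", "Engineering", "HR"], n),
    "Years_at_Company": rng.integers(0, 20, n),
    "Job_Satisfaction": rng.integers(1, 6, n),
    "Salary": rng.integers(30_000, 120_000, n),
})
# Invented rule: lower satisfaction and shorter tenure raise the odds of leaving.
score = -1.0 * (employees["Job_Satisfaction"] - 3) - 0.1 * (employees["Years_at_Company"] - 10)
leave_probability = 1 / (1 + np.exp(-score))
employees["Attrition"] = (rng.random(n) < leave_probability).astype(int)

X = employees.drop(columns=["Attrition"])
y = employees["Attrition"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

numeric_columns = ["Age", "Years_at_Company", "Job_Satisfaction", "Salary"]


def build_pipeline(classifier):
    """Wrap a classifier with encoding for Department and scaling for numbers."""
    preprocess = ColumnTransformer([
        ("department", OneHotEncoder(handle_unknown="ignore"), ["Department"]),
        ("numeric", StandardScaler(), numeric_columns),
    ])
    return Pipeline([("preprocess", preprocess), ("classifier", classifier)])


attrition_accuracy = {}
for label, classifier in [
    ("LogisticRegression", LogisticRegression(max_iter=1000)),
    ("RandomForestClassifier", RandomForestClassifier(n_estimators=100, random_state=42)),
]:
    pipeline = build_pipeline(classifier)
    pipeline.fit(X_train, y_train)
    attrition_accuracy[label] = accuracy_score(y_test, pipeline.predict(X_test))
    print(f"{label} accuracy: {attrition_accuracy[label]:.2f}")
```

Keeping the preprocessing inside the pipeline means it is fitted on the training split only, so nothing from the test rows leaks into the encoder or scaler.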
---
4. Fake News Detection
Description: Classify news articles as fake or real.
CSV Sample Columns:
Title,Text,Label
ML Type: Text Classification
Use: CountVectorizer, MultinomialNB (Naive Bayes), LogisticRegression
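The text pipeline could look roughly like the sketch below, assuming the Title, Text, and Label columns above. The eight example articles and the "FAKE"/"REAL" label values are invented for illustration, and a corpus this small only demonstrates the mechanics; a real project needs a labelled dataset and a held-out test split.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus (assumption): replace with a real labelled CSV.
news = pd.DataFrame({
    "Title": [
        "Miracle pill cures all diseases overnight",
        "City council approves new budget",
        "Aliens secretly control the stock market",
        "Local school opens new library",
        "Celebrity reveals shocking immortality secret",
        "Central bank holds interest rates steady",
        "Doctors hate this one weird trick",
        "Researchers publish study on sleep patterns",
    ],
    "Text": [
        "A secret miracle pill shocks doctors everywhere.",
        "The council voted on the annual budget on Tuesday.",
        "Shocking secret documents reveal alien traders.",
        "The library opens to students next month.",
        "You will not believe this shocking secret.",
        "The bank announced its decision after the meeting.",
        "This weird trick is a miracle, doctors shocked.",
        "The study was published in a peer-reviewed journal.",
    ],
    "Label": ["FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL"],
})

# Title and body are joined so the vectorizer sees both.
documents = news["Title"] + " " + news["Text"]

models = {
    "MultinomialNB": make_pipeline(
        CountVectorizer(stop_words="english"), MultinomialNB()
    ),
    "LogisticRegression": make_pipeline(
        CountVectorizer(stop_words="english"), LogisticRegression(max_iter=1000)
    ),
}

unseen = ["Shocking miracle trick revealed by secret doctors"]
news_predictions = {}
for name, model in models.items():
    model.fit(documents, news["Label"])
    news_predictions[name] = model.predict(unseen)[0]
    print(f"{name} predicts: {news_predictions[name]}")
```

Each pipeline bundles its own CountVectorizer, so raw strings can be passed straight to `fit` and `predict`.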
---
5. Loan Eligibility Prediction
Description: Predict if a person is eligible for a loan based on income, age, loan
amount, etc.
CSV Sample Columns:
Gender,Married,Education,ApplicantIncome,LoanAmount,Loan_Status
ML Type: Classification
Use: DecisionTreeClassifier, SVC (SVM)
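One way this project might be sketched, assuming the columns listed above and scikit-learn's `SVC` as the SVM implementation. The data is synthetic: the category values, the income and loan ranges, the "Y"/"N" labels, and the approval rule are all invented for illustration. Five-fold cross-validation is used here instead of a single split; that is a design choice of this sketch, not something the project requires.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with your own CSV.
rng = np.random.default_rng(0)
n = 300
loans = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], n),
    "Married": rng.choice(["Yes", "No"], n),
    "Education": rng.choice(["Graduate", "Not Graduate"], n),
    "ApplicantIncome": rng.uniform(1_500, 10_000, n),
    "LoanAmount": rng.uniform(50, 500, n),
})
# Invented rule: a higher income-to-loan ratio makes approval more likely.
ratio = loans["ApplicantIncome"] / loans["LoanAmount"] + rng.normal(0, 5, n)
loans["Loan_Status"] = np.where(ratio > 20, "Y", "N")

X = loans.drop(columns=["Loan_Status"])
y = loans["Loan_Status"]

categorical_columns = ["Gender", "Married", "Education"]
numeric_columns = ["ApplicantIncome", "LoanAmount"]

loan_scores = {}
for label, classifier in [
    ("DecisionTreeClassifier", DecisionTreeClassifier(max_depth=4, random_state=42)),
    ("SVC", SVC(kernel="rbf")),
]:
    # A fresh preprocessing step per model: one-hot for text, scaling for numbers.
    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
        ("numeric", StandardScaler(), numeric_columns),
    ])
    pipeline = Pipeline([("preprocess", preprocess), ("classifier", classifier)])
    loan_scores[label] = cross_val_score(pipeline, X, y, cv=5)
    print(f"{label} mean accuracy: {loan_scores[label].mean():.2f}")
```

Scaling matters for SVC because it is distance-based; the decision tree is unaffected by it, so sharing one preprocessing recipe keeps the comparison simple.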