This Streamlit web application lets users train linear regression models (Linear Regression, Ridge, or Lasso) with customizable data preprocessing and hyperparameter tuning. It provides interactive visualizations and performance metrics to help users understand their data and model.
- CSV File Upload: Accepts `.csv` file uploads.
- Robust CSV Parsing: Users can specify the CSV delimiter (comma, semicolon, tab) and encoding (UTF-8, Latin-1, ISO-8859-1).
- Missing Value Strategies: Choose from 'Drop Rows', 'Mean Imputation', or 'Median Imputation'.
- Outlier Handling: Option to 'Remove Outliers (IQR)' for numerical columns.
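The parsing and cleaning options above could be sketched roughly as follows. This is a minimal illustration with pandas, not the app's actual code: the function names (`load_csv`, `handle_missing`, `remove_outliers_iqr`) and the 1.5 × IQR fence are assumptions chosen for the example.

```python
import io

import pandas as pd


def load_csv(raw_bytes: bytes, delimiter: str = ",", encoding: str = "utf-8") -> pd.DataFrame:
    """Parse uploaded CSV bytes with a user-chosen delimiter and encoding."""
    return pd.read_csv(io.BytesIO(raw_bytes), sep=delimiter, encoding=encoding)


def handle_missing(df: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """Apply one of the three missing-value strategies listed above."""
    if strategy == "Drop Rows":
        return df.dropna()
    numeric_cols = df.select_dtypes(include="number").columns
    if strategy == "Mean Imputation":
        fill_values = df[numeric_cols].mean()
    elif strategy == "Median Imputation":
        fill_values = df[numeric_cols].median()
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    out = df.copy()
    # Only numeric columns are imputed; other columns are left untouched.
    out[numeric_cols] = out[numeric_cols].fillna(fill_values)
    return out


def remove_outliers_iqr(df: pd.DataFrame, factor: float = 1.5) -> pd.DataFrame:
    """Drop rows where any numeric value lies outside [Q1 - f*IQR, Q3 + f*IQR]."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    within = numeric.ge(lower, axis="columns") & numeric.le(upper, axis="columns")
    # Missing values are not treated as outliers here (an assumption).
    keep = (within | numeric.isna()).all(axis=1)
    return df[keep]


# Small demonstration on in-memory data (semicolon-delimited, Latin-1).
raw = "a;b\n1;2\n3;\n5;6\n".encode("latin-1")
df = load_csv(raw, delimiter=";", encoding="latin-1")
imputed = handle_missing(df, "Mean Imputation")
dropped = handle_missing(df, "Drop Rows")
trimmed = remove_outliers_iqr(pd.DataFrame({"x": [1, 2, 3, 4, 100]}))
```

In a Streamlit app, the bytes would typically come from the uploaded file object rather than a literal string.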
- Data Preview: Show a preview of the first few rows of the uploaded dataset.
- Optional Visualizations: Users can select to view:
  - Outlier Visualizations (Box Plots): Box plots for selected numeric columns to identify outliers.
  - Feature Distribution Plots (Histograms): Histograms for selected numeric columns to understand data distribution.
  - Correlation Matrix Heatmap: A heatmap showing the correlation between numeric columns.
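As a rough sketch of the data behind the correlation heatmap, the matrix can be computed with pandas over the numeric columns only. The helper name `correlation_matrix` is hypothetical; the resulting frame could then be handed to a plotting call such as Plotly Express's `px.imshow`.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between numeric columns; non-numeric columns are ignored."""
    return df.select_dtypes(include="number").corr()


# Demonstration: "y" is exactly 2 * "x", so the two are perfectly correlated.
demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "label": list("abcd")})
corr = correlation_matrix(demo)
```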
- Auto-detection: Automatically detects numeric and categorical columns.
- User Selection: Allows users to select:
  - One or more feature columns (X) (numeric and one-hot encoded categorical).
  - One target column (y) (numeric only).
- Categorical Feature Handling: Selected categorical columns are automatically one-hot encoded.
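Column detection and one-hot encoding might look like the sketch below. It assumes that "categorical" means any non-numeric column and uses `pd.get_dummies` for encoding; the app may use a different rule or encoder, and the helper names are illustrative.

```python
import pandas as pd


def detect_column_types(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Split columns into numeric and categorical (here: everything non-numeric)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


def build_features(df: pd.DataFrame, feature_cols: list[str], target_col: str):
    """One-hot encode selected categorical features and return (X, y)."""
    # get_dummies leaves numeric columns as they are and expands text columns.
    X = pd.get_dummies(df[feature_cols], dtype=float)
    y = df[target_col]
    return X, y


# Demonstration with one numeric and one categorical feature.
houses = pd.DataFrame(
    {"size": [1, 2, 3], "color": ["r", "g", "r"], "price": [10.0, 20.0, 30.0]}
)
numeric_cols, categorical_cols = detect_column_types(houses)
X, y = build_features(houses, ["size", "color"], "price")
```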
- Model Selection: Choose between 'Linear Regression', 'Ridge', and 'Lasso' models.
- Hyperparameter Tuning: For Ridge and Lasso, a slider adjusts the `alpha` regularization parameter.
- Cross-Validation: Implements k-fold cross-validation (default k=5) for robust performance evaluation.
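The model choice, `alpha` setting, and k-fold evaluation could be wired together along these lines. This is a sketch with scikit-learn under assumed names (`make_model`, `cross_validate_r2`); in the app, `name` and `alpha` would come from the UI widgets.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def make_model(name: str, alpha: float = 1.0):
    """Map the selected model name to a scikit-learn estimator."""
    if name == "Linear Regression":
        return LinearRegression()
    if name == "Ridge":
        return Ridge(alpha=alpha)
    if name == "Lasso":
        return Lasso(alpha=alpha)
    raise ValueError(f"Unknown model: {name}")


def cross_validate_r2(model, X, y, k: int = 5) -> tuple[float, float]:
    """Return the mean and standard deviation of R² across k folds."""
    scores = cross_val_score(model, X, y, cv=k, scoring="r2")
    return float(scores.mean()), float(scores.std())


# Demonstration on synthetic, noise-free linear data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 1.0
mean_r2, std_r2 = cross_validate_r2(make_model("Linear Regression"), X_demo, y_demo)
ridge = make_model("Ridge", alpha=0.5)
```

Larger `alpha` values shrink the coefficients more strongly; with Lasso, some coefficients can be driven to exactly zero.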
- Performance Metrics: Displays key metrics in a table format:
  - Coefficients
  - Intercept
  - R² score
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
  - Cross-Validation R² (Mean and Standard Deviation)
- Visualizations:
  - Predicted vs Actual Values Plot: Scatter plot with a regression line.
  - Residual Plot: Scatter plot of residuals vs predicted values.
  - Feature Importance Bar Chart: Bar chart showing the magnitude of coefficients for each feature.
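The numbers behind the metrics table and the plots above could be produced roughly as follows. The 80/20 split, the fixed random seed, and the residual convention (actual minus predicted) are assumptions made for this sketch, as is the `evaluate` helper itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate(model, X: pd.DataFrame, y: pd.Series, test_size: float = 0.2, random_state: int = 42):
    """Fit on a training split and compute test-set metrics, coefficients, and residuals."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "Intercept": float(model.intercept_),
        "R2": r2_score(y_test, y_pred),
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
    }
    coefficients = pd.Series(model.coef_, index=X.columns)
    # "Feature importance" here is the absolute coefficient size, largest first.
    importance = coefficients.abs().sort_values(ascending=False)
    residuals = y_test - y_pred  # actual minus predicted (assumed convention)
    return metrics, coefficients, importance, residuals


# Demonstration on synthetic, noise-free data: y = 3*x1 - 2*x2 + 1.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
y_demo = 3.0 * X_demo["x1"] - 2.0 * X_demo["x2"] + 1.0
metrics, coefficients, importance, residuals = evaluate(LinearRegression(), X_demo, y_demo)
```

Coefficient magnitudes are only comparable across features when the features are on similar scales, so a chart built from `importance` is best read with that caveat in mind.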
- Download Trained Model: Download the trained model as a `.pkl` file.
- Download Predictions: Download the model's predictions on the test set as a `.csv` file.
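One way to prepare the two downloads is to serialize the model with `pickle` and the predictions with `DataFrame.to_csv`. This is a sketch under assumed helper and column names; in Streamlit, the resulting bytes could be passed to `st.download_button`.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def model_to_bytes(model) -> bytes:
    """Serialize a fitted estimator for a .pkl download."""
    return pickle.dumps(model)


def predictions_to_csv(y_true, y_pred) -> bytes:
    """Build a UTF-8 CSV with actual and predicted values side by side."""
    frame = pd.DataFrame({"actual": y_true, "predicted": y_pred})
    return frame.to_csv(index=False).encode("utf-8")


# Demonstration: fit y = 2x + 1, then round-trip the model through pickle.
X_demo = np.array([[0.0], [1.0], [2.0]])
y_demo = np.array([1.0, 3.0, 5.0])
model = LinearRegression().fit(X_demo, y_demo)
model_bytes = model_to_bytes(model)
csv_bytes = predictions_to_csv(y_demo, model.predict(X_demo))
restored = pickle.loads(model_bytes)
```

Pickle files can execute arbitrary code when loaded, so only unpickle models from sources you trust.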
- Frontend/UI: Streamlit
- Backend/Logic: Python
  - `scikit-learn` for machine learning models.
  - `Pandas` for data manipulation.
  - `Plotly Express` for interactive visualizations.
  - `NumPy` for numerical operations.
  - `statsmodels` for the trendline in plots.
- Clone the repository:

      git clone https://github.com/Mani212005/LRml.git
      cd LRml

- Create and activate a virtual environment:

      python -m venv venv
      # On Windows:
      venv\Scripts\activate
      # On macOS/Linux:
      source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Run the Streamlit application:

      streamlit run app.py

This will open the application in your web browser.
- More advanced outlier detection methods.
- Support for other regression models (e.g., RandomForestRegressor, XGBoost).
- More sophisticated hyperparameter tuning (e.g., GridSearchCV, RandomizedSearchCV).
- Saving/loading models directly within the app.
- User authentication and persistent storage.