🚀MY EDA Helper is a lightweight Python package designed to speed up Exploratory Data Analysis (EDA) for machine learning projects. It provides easy-to-use functions for summarizing datasets, detecting missing values, and visualizing distributions—helping you gain insights quickly.

shemanto27/eda-helper-py

My EDA Helper - Boost Your Exploratory Data Analysis Process! 🚀

EDA Helper is a Python package designed to streamline your Exploratory Data Analysis (EDA) process. It provides a collection of helper functions to quickly analyze, visualize, and summarize datasets. Whether you're working with numeric, categorical, or datetime data, this package has you covered!


Credits 🙏

This package is inspired by the work of @MisbahullahSheriff, who created the original EDA helper functions. I (@shemanto27) have organized them into a package for easier use and added further functions and improvements.


Installation 📦

You can install the package via pip:

pip install my_eda_helper

For Google Colab users, install it directly in your notebook:

!pip install my_eda_helper

Usage 🛠️

1. Import the Package

import my_eda_helper as eda
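
The examples in the rest of this section assume a pandas DataFrame named df with Titanic-style columns (Age, Fare, Sex, Embarked, Survived). As a minimal sketch, here is one way to build such a frame for experimentation. The values are made up for illustration and are not the real Titanic dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with Titanic-style column names (not a real dataset).
df = pd.DataFrame(
    {
        "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
        "Sex": ["male", "female", "female", "female", "male", "male"],
        "Embarked": ["S", "C", "S", "S", None, "S"],
        "Survived": [0, 1, 1, 1, 0, 0],
    }
)

# Inspect the column types before running any analysis.
print(df.dtypes)
```

In practice you would more likely load your own data, for example with pd.read_csv.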

2. High-Level Analysis

Missing Data

Find Missing Values:

missing_data = eda.missing_info(df)
print(missing_data)
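
For intuition, a missing-value summary of this kind can be sketched in plain pandas. The function name missing_summary and its output columns are assumptions made for illustration; the actual return format of eda.missing_info may differ:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = frame.isna().sum()
    summary = pd.DataFrame(
        {
            "missing_count": counts,
            "missing_pct": 100 * counts / len(frame),
        }
    )
    # Keep only columns that actually have gaps, worst first.
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_count", ascending=False
    )


# Made-up data: Age has two gaps, Sex has one, Fare has none.
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 35.0, np.nan],
        "Sex": ["male", "female", None, "male"],
        "Fare": [7.25, 71.28, 7.92, 53.10],
    }
)
print(missing_summary(toy))
```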

Plot Missing Data:

eda.plot_missing_info(df)

Correlation Analysis

Numeric Features (Pearson/Spearman):

eda.correlation_heatmap(df)
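
To see why the choice between Pearson and Spearman matters, the sketch below compares the two on made-up data using pandas directly. It does not show how eda.correlation_heatmap computes or selects its coefficients:

```python
import pandas as pd

# Made-up data: y grows monotonically with x, but not linearly.
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
toy["y"] = toy["x"] ** 3

# Pearson measures linear association; Spearman measures rank (monotonic) association.
pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because the relationship is perfectly monotonic, the Spearman coefficient is 1, while the Pearson coefficient falls below 1.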

Categorical Features (Cramér's V):

eda.cramersV_heatmap(df)
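
Cramér's V measures association between two categorical variables on a scale from 0 (no association) to 1 (perfect association). A minimal sketch of the standard, uncorrected formula using pandas and SciPy follows. The helper name cramers_v is an assumption for illustration, and the package may apply a bias correction or other variant:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Hypothetical helper: uncorrected Cramér's V for two categorical series.

    Assumes each series has at least two distinct levels.
    """
    table = pd.crosstab(a, b)
    # correction=False disables Yates' continuity correction for 2x2 tables.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Made-up data: "group" is fully determined by "sex"; "deck" is unrelated to it.
sex = pd.Series(["male", "male", "female", "female"])
group = pd.Series(["a", "a", "b", "b"])
deck = pd.Series(["x", "y", "x", "y"])

print(cramers_v(sex, group))
print(cramers_v(sex, deck))
```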

Pair Plots

eda.pair_plots(df)

3. Detailed Analysis

Numeric Features

Summary:

eda.num_summary(df, "Age")

Univariate Plots:

eda.num_univar_plots(df, "Fare")

Bivariate Plots:

eda.num_bivar_plots(df, "Age", "Fare")

Categorical Features

Summary:

eda.cat_summary(df, "Sex")

Univariate Plots:

eda.cat_univar_plots(df, "Embarked")

Bivariate Plots (numeric vs. categorical):

eda.num_cat_bivar_plots(df, "Fare", "Sex")

Hypothesis Testing

Numeric vs Numeric:

eda.num_num_hyp_testing(df, "Age", "Fare")

Numeric vs Categorical:

eda.num_cat_hyp_testing(df, "Fare", "Sex")

Categorical vs Categorical:

eda.hyp_cat_cat(df, "Sex", "Survived")
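
For reference, the three kinds of comparison above correspond to common statistical tests that can be run directly with SciPy. The sketch below uses made-up data and picks Pearson correlation, Welch's t-test, and the chi-square test of independence. These choices are assumptions for illustration; the tests the package actually runs may differ:

```python
import pandas as pd
from scipy import stats

# Made-up data with Titanic-style column names (not a real dataset).
toy = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, 30.07],
        "Sex": ["male", "female", "female", "female", "male", "male", "female", "male"],
        "Survived": [0, 1, 1, 0, 0, 1, 1, 0],
    }
)

# Numeric vs numeric: Pearson correlation test.
r, p_num_num = stats.pearsonr(toy["Age"], toy["Fare"])

# Numeric vs categorical (two groups): Welch's t-test, which does not assume equal variances.
male_fare = toy.loc[toy["Sex"] == "male", "Fare"]
female_fare = toy.loc[toy["Sex"] == "female", "Fare"]
t_stat, p_num_cat = stats.ttest_ind(male_fare, female_fare, equal_var=False)

# Categorical vs categorical: chi-square test of independence on the contingency table.
table = pd.crosstab(toy["Sex"], toy["Survived"])
chi2, p_cat_cat, dof, expected = stats.chi2_contingency(table)

print(f"Age vs Fare:     p = {p_num_num:.3f}")
print(f"Fare vs Sex:     p = {p_num_cat:.3f}")
print(f"Sex vs Survived: p = {p_cat_cat:.3f}")
```

With samples this small the p-values are not meaningful; the point is only to show which test matches which pair of variable types.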

Contributing 🤝

Contributions are welcome! If you have ideas for new features, improvements, or bug fixes, please feel free to:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature/YourFeatureName
  3. Commit your changes:
    git commit -m 'Add some feature'
  4. Push to the branch:
    git push origin feature/YourFeatureName
  5. Open a pull request.

Please ensure your code follows the project's style and includes appropriate tests.


License 📄

This project is licensed under the MIT License. See the LICENSE file for details.


Support 💬

If you have any questions, suggestions, or issues, please open an issue on the GitHub repository.

Happy EDA! 🎉
