Welcome to FeatureClus, a Python library designed to simplify feature selection for clustering models. This tool helps you select the most relevant features that enhance clustering performance, ensuring you avoid the "curse of dimensionality" and make your clustering algorithms more efficient and interpretable. 🧠
The feature selection process is driven by evaluating how each feature impacts the clustering results. FeatureClus uses an isolated data shift for each feature to assess its importance. The process follows these steps:
- MinMaxScaler: First, we scale the features using MinMaxScaler to normalize the data.
- PCA (80% variance): Next, we apply Principal Component Analysis (PCA) to reduce dimensionality, retaining 80% of the variance.
- DBSCAN Clustering: After reducing the dimensionality, DBSCAN is used to perform clustering.
- Silhouette Score Calculation: For each feature, we calculate the silhouette score to evaluate the quality of the clusters. The silhouette score represents how similar an object is to its own cluster compared to other clusters.
- Data Shift and Feature Importance: By applying isolated shifts to each feature and recalculating the silhouette score, we measure how the score changes. The absolute difference in the silhouette score after shifting each feature is used to rank the features by importance.
This method ensures that the features are evaluated for their individual contribution to the clustering process, allowing you to focus on the most impactful features.
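The steps above can be sketched with plain scikit-learn. This is a minimal illustration of the shift-and-rescore idea, not the FeatureClus internals: the pipeline (MinMaxScaler → PCA at 80% variance → DBSCAN → silhouette) follows the description above, but the exact shift scheme is an assumption here — we perturb a random half of the rows of one feature at a time so the shift is not simply undone by rescaling.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

def pipeline_score(X):
    """MinMaxScaler -> PCA (80% variance) -> DBSCAN -> silhouette score."""
    Z = PCA(n_components=0.8).fit_transform(MinMaxScaler().fit_transform(X))
    labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(Z)
    if len(set(labels) - {-1}) < 2:   # silhouette needs at least 2 clusters
        return float("nan")
    return silhouette_score(Z, labels)

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=500, centers=4, n_features=6, random_state=0)
baseline = pipeline_score(X)

# Shift one feature at a time (an "isolated" shift) and rank features by
# the absolute change in silhouette score relative to the baseline.
importance = {}
rows = rng.random(len(X)) < 0.5       # perturb a random half of the rows
for j in range(X.shape[1]):
    shifted = X.copy()
    shifted[rows, j] += 25            # one example shift magnitude
    importance[j] = abs(pipeline_score(shifted) - baseline)

ranking = sorted(importance, key=importance.get, reverse=True)
print("features ranked by importance:", ranking)
```

Features whose perturbation changes the silhouette score the most sit at the top of `ranking`; features the clustering barely depends on produce little or no change.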
- 🔍 Feature Ranking: Ranks features based on the absolute change in silhouette score after applying isolated shifts to each feature.
- 📈 Cluster Evaluation Metrics: Calculates the silhouette score to assess the clustering quality and the influence of each feature.
- 💻 Easy-to-Use API: A simple, intuitive API that can be easily integrated into your machine learning pipeline.
To install the library, run the following command:
```bash
pip install featclus
```

Here is a quick example of how to use FeatureClus:
```python
import pandas as pd
from featureclus import FeatureSelection
from sklearn.datasets import make_blobs

# Sample DataFrame
data, labels = make_blobs(n_samples=10000, centers=7, n_features=15, random_state=42)
df = pd.DataFrame(data, columns=[f"Feature_{i}" for i in range(15)])

# Initialize the FeatureSelection model
model = FeatureSelection(data=df, shifts=[1, 25, 50, 75, 100], n_jobs=-1)

# Inspect how each feature contributes to clustering
metrics = model.get_metrics()
```

`get_metrics()` returns metrics that assess how each feature contributes to clustering.
The library can also select the top `n_features` features based on their importance to the clustering results.
If you find this feature selection tool helpful and would like to support its continued development, consider buying me a coffee. Your support helps maintain and improve this project!
- ⭐ Star this repository
- 🍴 Fork it and contribute
- 📢 Share it with others who might find it useful
- 🐛 Report issues or suggest new features
Your support, in any form, is greatly appreciated! 🙏
This project is licensed under the MIT License. See the LICENSE file for more details.
Happy clustering! 🎉