KavinAravindhan/Privacy-Preserving-Clustering

Privacy-Preserving Social Network Clustering Using Differential Privacy

Preserving users' privacy while clustering online social network data is a central concern. This project integrates differential privacy into social network clustering to balance user privacy against clustering effectiveness. By varying the differential privacy parameter epsilon, it shows how the privacy level influences clustering accuracy and clarifies the relationship between privacy, data utility, and clustering in social networks.

Published

This research was published in the 2024 International Conference on Smart Systems for Electrical, Electronics, Communication, and Computer Engineering (ICSSEECC).

Key Features

  • Integration of differential privacy with social network clustering.
  • K-means clustering on a feature matrix perturbed with Laplace noise.
  • Evaluation of how the epsilon parameter affects privacy and clustering performance.
  • Detailed graphical analysis and evaluation metrics.
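
The noise-then-cluster idea in the features above can be sketched as follows. This is a minimal illustration, not the project's notebook code: the helper name `add_laplace_noise`, the toy two-cluster data, and the sensitivity of 1.0 are all assumptions. In practice the sensitivity depends on how the feature matrix is built and bounded.

```python
import numpy as np
from sklearn.cluster import KMeans


def add_laplace_noise(features, epsilon, sensitivity=1.0, seed=None):
    """Return a copy of `features` with Laplace(0, sensitivity / epsilon) noise added.

    `sensitivity=1.0` is an assumed placeholder, not a value taken from the project.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise -> stronger privacy
    return features + rng.laplace(loc=0.0, scale=scale, size=features.shape)


# Toy stand-in for a user feature matrix: two well-separated groups of 50 users.
data_rng = np.random.default_rng(0)
features = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 4)),
    data_rng.normal(5.0, 0.5, size=(50, 4)),
])

# Perturb the features, then cluster the noisy matrix instead of the raw one.
noisy = add_laplace_noise(features, epsilon=2.2, seed=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(noisy)
```

Only the noisy matrix is passed to K-means in this sketch, so the clustering step never reads the raw features directly.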

Technologies / Libraries Used 🛠️

  • Python (3.8+)
  • numpy - Numerical computations
  • networkx - Graph and network analysis
  • matplotlib - Data visualization
  • sklearn (installed as scikit-learn) - Machine learning tools
    • KMeans - Clustering algorithm
    • adjusted_rand_score - Evaluation metric
    • silhouette_score - Evaluation metric
    • davies_bouldin_score - Evaluation metric
  • Jupyter Notebook
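
As a rough sketch of how the three scikit-learn metrics listed above fit together, the hypothetical helper `clustering_report` below computes all of them for one clustering. The toy data and the reference labels are assumptions made for illustration; the project's notebooks may organize this differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)


def clustering_report(features, reference_labels, predicted_labels):
    """Collect the three metrics for one clustering (hypothetical helper)."""
    return {
        # Agreement with reference labels; 1.0 means identical partitions.
        "adjusted_rand": adjusted_rand_score(reference_labels, predicted_labels),
        # Cohesion vs. separation, in [-1, 1]; higher is better.
        "silhouette": silhouette_score(features, predicted_labels),
        # Average cluster similarity, >= 0; lower is better.
        "davies_bouldin": davies_bouldin_score(features, predicted_labels),
    }


# Toy data: two well-separated groups with known membership.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 4)),
    rng.normal(5.0, 0.5, size=(50, 4)),
])
reference = np.array([0] * 50 + [1] * 50)

predicted = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
report = clustering_report(features, reference, predicted)
```

The adjusted Rand score needs reference labels, while the silhouette and Davies-Bouldin scores only need the features and the predicted labels.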

Evaluation Metric Values 📈

  • Privacy Parameter (epsilon): Varied from 0.1 to 3.0 in steps of 0.1.
  • Optimal Epsilon Value: At an epsilon of 2.2, the clustering accuracy peaks at 80.53%, achieving a balance between privacy and effectiveness.
  • Visualizations: Detailed plots of each metric against epsilon highlight the privacy-utility trade-off.
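
The epsilon sweep described above could look roughly like this. It is a sketch under stated assumptions: the function `sweep_epsilon`, the toy data, the sensitivity of 1.0, and the use of the adjusted Rand score against a non-private baseline clustering as the "accuracy" measure are illustrative choices, not confirmed details of the published experiment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def sweep_epsilon(features, epsilons, n_clusters, sensitivity=1.0, seed=0):
    """Score noisy clusterings against a non-private baseline for each epsilon."""
    rng = np.random.default_rng(seed)
    baseline = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(features)
    scores = {}
    for eps in epsilons:
        # Laplace scale grows as epsilon shrinks (assumed sensitivity of 1.0).
        noisy = features + rng.laplace(0.0, sensitivity / eps, size=features.shape)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(noisy)
        scores[float(eps)] = adjusted_rand_score(baseline, labels)
    return scores


# Epsilon grid from the README: 0.1 to 3.0 in steps of 0.1 (30 values).
epsilons = np.round(np.linspace(0.1, 3.0, 30), 1)

# Toy stand-in for the preprocessed feature matrix.
data_rng = np.random.default_rng(1)
features = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 4)),
    data_rng.normal(5.0, 0.5, size=(50, 4)),
])

scores = sweep_epsilon(features, epsilons, n_clusters=2)
best_epsilon = max(scores, key=scores.get)
```

Because the noise is random, scores for neighboring epsilon values can fluctuate. Averaging over several seeds per epsilon is one way to smooth the resulting curve before plotting it.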

Dataset 📊

  • Source: Twitter Social Network
  • Description: Contains user profiles, follower/friend lists, and interaction subgraphs.
  • Twitter Dataset

Installation

  1. Clone the repository:
    git clone https://github.com/KavinAravindhan/privacy-preserving-clustering.git
  2. Install the necessary libraries:
    pip3 install -r requirements.txt
  3. Run the Jupyter notebooks in the order listed under Files, or open the combined notebook to run all steps at once.

Files

  1. data_preprocessing.ipynb - Data preprocessing and cleaning.
  2. k-means_clustering.ipynb - Initial clustering on raw data.
  3. differential_privacy.ipynb - Adding differential privacy to data and clustering again.
  4. accuracy_metrics.ipynb - Evaluation of clustering results using various metrics.
  5. graphical_analysis.ipynb - Graphical analysis of metrics across different privacy parameter values.
  6. privacy_preserving_clustering.ipynb - Comprehensive notebook with all steps combined.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Team Acknowledgment 🙌

A special thanks to our amazing team for their dedication and hard work. Despite the challenges, their commitment to learning new technologies and collaborating effectively made this project a success.
