This project is a data analytics study of demographic data for customers of a mail-order sales campaign in Germany. The study applies data cleaning, preprocessing, and transformation, followed by unsupervised and supervised learning. Unsupervised learning is used for customer segmentation, and supervised modeling is used to predict the likelihood that an individual becomes a customer of the mail-order campaign.
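The two modeling stages described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the README does not name the algorithms used, so `PCA`, `KMeans`, and `GradientBoostingClassifier` are assumptions, as are the feature matrix `X`, the label `y`, and every parameter value.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for cleaned, numeric demographic features (hypothetical data).
X = rng.normal(size=(500, 10))
# Synthetic binary label: 1 = became a customer (hypothetical rule plus noise).
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Unsupervised stage: scale, reduce dimensionality, then cluster into segments.
# The choice of PCA + KMeans and the component/cluster counts are assumptions.
segmenter = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=0),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
segments = segmenter.fit_predict(X)

# Supervised stage: estimate each individual's probability of becoming a customer.
# GradientBoostingClassifier is an assumed model, chosen only for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]
```

Here `segments` holds one cluster label per row and `proba` holds the predicted customer probability for each held-out row. With the real data, `X` would come from the cleaned CSV files listed below.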
Data Files:
- `./data/Udacity_AZDIAS_052018.csv`
- `./data/Udacity_CUSTOMERS_052018.csv`
- `./data/Udacity_MAILOUT_052018_TRAIN.csv`
- `./data/Udacity_MAILOUT_052018_TEST.csv`
- `./data/AZDIAS_Attributes_Info.csv`

In addition to the data files, the project workspace includes five files:

- `Arvato_Project_workbook.ipynb` is the Jupyter notebook that documents the project steps.
- `DIAS Information Levels - Attributes 2017.xlsx` contains attribute descriptions for the four main data files.
- `DIAS Attributes - Values 2017.xlsx` explains the values of each attribute, which can help with the data preprocessing steps.
- `Project_final_report.pdf`
- `README.md` provides instructions on the project.
numpy
pandas
matplotlib
seaborn
scikit-learn
joblib
- Udacity+Arvato: Identify Customer Segments
- I ranked 42nd (public score: 0.80153; first-place score: 0.81063)
I would like to thank Udacity for this project, and Arvato for providing the dataset.