The objective of this project is to analyze the results of a digital experiment conducted by the Customer Experience (CX) team at Vanguard. The experiment aims to determine if a new user interface (UI) and in-context prompts improve the completion rate of the online process for clients.
This project includes the following functionalities:
- Data exploration and cleaning: EDA and data cleaning
- Client behavior analysis
- Performance metrics evaluation
- Hypothesis testing: Assess the effectiveness of the redesign
- Experiment evaluation
- Interactive data visualization: Using Tableau
The following tools and technologies were used to carry out this project:
- Python: For data exploration and analysis
- Pandas: For data manipulation and cleaning
- Matplotlib and Seaborn: For data visualization
- Tableau: For creating interactive visualizations
- Jupyter Notebook: For documenting and presenting the analysis
- GitHub: For version control and collaboration
- Trello: For project management
- Streamlit: For building and deploying the interactive web application
- Scikit-learn: For implementing the Machine Learning model
- Initial exploration of the datasets (`df_final_demo`, `df_final_web_data`, `df_final_experiment_clients`).
- Data cleaning and resolving quality issues.
- Demographic analysis of clients.
- Analysis of client behavior during the online process.
- Defining success indicators.
- Evaluating the outcome of the redesign.
- Conducting hypothesis tests on the completion rate.
- Evaluating the completion rate with a cost-effectiveness threshold.
- Conducting other relevant hypothesis tests.
- Evaluating the design effectiveness.
- Assessing the duration of the experiment.
- Identifying additional data needs.
- Creating interactive visualizations in PowerBI.
- Preparing dashboards for the presentation.
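
The completion-rate tests listed above can be sketched as a one-sided two-proportion z-test. This is a minimal illustration, not the project's actual code: the function name, the `min_uplift` parameter (standing in for the cost-effectiveness threshold, whose real value is not stated here), and the counts in the usage lines are all hypothetical.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def completion_rate_ztest(completed_test, n_test,
                          completed_control, n_control,
                          min_uplift=0.0):
    """One-sided z-test of H0: p_test - p_control <= min_uplift.

    Returns (observed difference, z statistic, p-value).
    min_uplift is a hypothetical stand-in for a cost-effectiveness threshold.
    """
    p_test = completed_test / n_test
    p_control = completed_control / n_control
    diff = p_test - p_control

    if min_uplift == 0.0:
        # Under H0 of equal rates, use the pooled standard error.
        pooled = (completed_test + completed_control) / (n_test + n_control)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    else:
        # With a non-zero margin the rates differ under H0, so do not pool.
        se = math.sqrt(p_test * (1 - p_test) / n_test
                       + p_control * (1 - p_control) / n_control)

    z = (diff - min_uplift) / se
    return diff, z, normal_sf(z)


# Illustrative counts only; these are not results from the experiment.
diff, z, p_value = completion_rate_ztest(600, 1000, 500, 1000, min_uplift=0.05)
print(f"difference={diff:.3f}, z={z:.2f}, p-value={p_value:.4f}")
```

Setting `min_uplift=0.0` gives the plain test of whether the new design has a higher completion rate; a positive value asks whether the improvement also exceeds the chosen threshold.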
The project results include:
- A detailed analysis of client behavior and the effectiveness of the redesign.
- Hypothesis tests supporting conclusions about the completion rate.
- Interactive visualizations in PowerBI presenting the findings clearly and comprehensively.
- A final report and presentation summarizing the results and recommendations.
Our Trello board is an essential tool for managing the project's workflow and ensuring that all tasks are organized and tracked efficiently. It helps us to:
- Plan: Outline the project's objectives, milestones, and deliverables.
- Organize: Break down the project into manageable tasks and assign them to team members.
- Track Progress: Monitor the status of each task, from to-do to in-progress to completed.
- Collaborate: Facilitate communication and collaboration among team members by providing a centralized platform for updates and feedback.
- Adapt: Adjust plans and priorities as needed based on the project's progress and any new insights or challenges that arise.
Here is a snapshot of our Trello board:
The project is organized as follows:
- `analysis_of_clients/`: Contains scripts and notebooks for client behavior analysis.
- `cleaning/`: Contains scripts for data cleaning and preprocessing.
- `images/`: Directory for storing image files used in the project.
- `machine_learning/`: Contains scripts and notebooks for machine learning models.
- `powerbi/`: Contains PowerBI files and reports.
- `streamlit_app/`: Contains the main Streamlit application and related assets.
  - `app.py`: The main application script.
  - `videos/`: Directory for storing video files used in the app.
  - `data/`: Directory for storing data files used in the app.
- `visualization/`: Contains scripts and notebooks for data visualization.
- `.gitignore`: Specifies files and directories to be ignored by Git.
- `LICENSE`: The project license file.
- `README.md`: The project documentation file.
- `requirements.txt`: Lists the Python dependencies required for the project.
The Streamlit app provides an interactive interface for users to explore the project's results and make predictions using the Machine Learning model.
- Navigation Menu:
  - Objectives: Overview of the project's goals and functionalities.
  - Development Process: Detailed description of the steps taken during the project, including data exploration, cleaning, and analysis.
  - Charts and Visualizations: Interactive visualizations created using Tableau.
  - Results and Conclusions: Summary of the project's findings and recommendations.
  - ML Prediction: Interface for making predictions using the Machine Learning model.
- Interactive Visualizations:
  - Users can explore various charts and graphs to understand the data and the impact of the redesign.
- Machine Learning Prediction:
  - Users can input the session duration and select the variation group to get a prediction on whether the client will complete the process.
In this section, we implemented a Machine Learning model to predict whether a client will complete the online process based on the duration of their session and the variation group they belong to.
We used a RandomForestClassifier to build our predictive model. The steps involved in creating the model are as follows:
- Data Preprocessing:
  - Converted the `duration` column to seconds.
  - Transformed the `confirm` column into a binary variable (`1` if the process was completed, `0` otherwise).
  - Encoded the categorical `variation` column using `LabelEncoder`.
- Feature Selection:
  - Selected `duration_sec` and `variation_encoded` as the features.
  - Used `confirm_binary` as the target variable.
- Data Scaling:
  - Scaled the features using `StandardScaler` to normalize the data.
- Model Training:
  - Split the data into training and testing sets (80% training, 20% testing).
  - Trained the `RandomForestClassifier` on the training data.
- Model Evaluation:
  - Evaluated the model using a confusion matrix and classification report to assess its performance.
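
The steps above can be sketched roughly as follows. This is an illustrative outline rather than the project's actual code: it assumes `duration` is a timedelta column and `confirm` is boolean, it uses synthetic stand-in data, and the function name `train_completion_model` is hypothetical. The sketch also fits the scaler on the training split only, which may differ from the order used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def train_completion_model(df, random_state=42):
    """Train a random forest on session duration and variation group."""
    data = df.copy()

    # Preprocessing: duration in seconds, binary target, encoded group.
    # Assumes `duration` is timedelta-typed and `confirm` is boolean.
    data["duration_sec"] = data["duration"].dt.total_seconds()
    data["confirm_binary"] = data["confirm"].astype(int)
    encoder = LabelEncoder()
    data["variation_encoded"] = encoder.fit_transform(data["variation"])

    # Feature selection.
    X = data[["duration_sec", "variation_encoded"]]
    y = data["confirm_binary"]

    # 80% training, 20% testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # Scaling: fit on the training split, then apply to the test split.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Model training.
    model = RandomForestClassifier(random_state=random_state)
    model.fit(X_train_scaled, y_train)

    # Evaluation: confusion matrix and classification report.
    y_pred = model.predict(X_test_scaled)
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    report = classification_report(y_test, y_pred, zero_division=0)
    return model, scaler, encoder, cm, report


# Synthetic stand-in data; not the experiment's real records.
rng = np.random.default_rng(0)
n_rows = 200
demo_df = pd.DataFrame({
    "duration": pd.to_timedelta(rng.integers(30, 1800, size=n_rows), unit="s"),
    "variation": rng.choice(["Control", "Test"], size=n_rows),
    "confirm": rng.choice([True, False], size=n_rows),
})

model, scaler, encoder, cm, report = train_completion_model(demo_df)
print(cm)
print(report)
```

Returning the fitted scaler and encoder alongside the model lets an application apply the same transformations to user input before calling `model.predict`.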
We integrated the model into a Streamlit app to allow users to input session duration and variation group, and receive a prediction on whether the client will complete the process.
- Input: Enter the session duration in the format `HH:MM:SS` and select the variation group.
- Prediction: Click the "Predict" button to get the prediction (`Confirmed` or `Not Confirmed`).
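
As a rough sketch of the input handling described above, the `HH:MM:SS` text can be converted to seconds before it reaches the model, and the predicted class mapped back to a label. The helper names `duration_to_seconds` and `label_prediction` are hypothetical and are not taken from `app.py`.

```python
def duration_to_seconds(text):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError("Expected duration in the format HH:MM:SS")
    # int() raises ValueError on non-numeric parts.
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("Duration fields are out of range")
    return hours * 3600 + minutes * 60 + seconds


def label_prediction(predicted_class):
    """Map the binary model output to the label shown to the user."""
    return "Confirmed" if predicted_class == 1 else "Not Confirmed"


print(duration_to_seconds("00:05:30"))
print(label_prediction(1))
```

Validating the format up front means a malformed entry produces a clear error message instead of a failure inside the model call.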
This model helps in understanding the factors that influence the completion rate of the online process and provides insights for improving the user experience.
In this section, we present interactive visualizations created using PowerBI. These visualizations help in understanding the data and deriving insights to improve the user experience and completion rates.
| Visualization 1 | Visualization 2 |
|---|---|
| ![]() | ![]() |
| Name | Role | Special Characteristic | GitHub Profile |
|---|---|---|---|
| Silvia Alonso | Data Analyst | Expert in data wrangling | Silvia Alonso |
| Juan Duran | Data Analyst | Skilled in Streamlit | Juan Duran |
| Ana Pineda | Data Analyst | Spanish Excel Champion | Ana Pineda |
| Andrea Lafarga | Data Analyst | Expert in data management | Andrea Lafarga |
We welcome collaborations and suggestions! Feel free to open an issue or submit a pull request.
Thank you for taking the time to explore our project. We hope you find it useful and informative. Your feedback and contributions are invaluable to us, and we look forward to working together to improve and expand this project.
This project is licensed under the MIT License - see the LICENSE file for details.
Thank you for visiting our repository! If you have any questions or need further assistance, please don't hesitate to reach out. Happy coding!





