datasilvia/Statistics-project

📊 Statistics-project

🎯 Objectives

The objective of this project is to analyze the results of a digital experiment conducted by the Customer Experience (CX) team at Vanguard. The experiment tests whether a new user interface (UI) with in-context prompts improves the completion rate of the online process for clients.

⚙️ Functionality

This project includes the following functionalities:

  • 🔍 Data exploration and cleaning: Exploratory data analysis (EDA) and data cleaning
  • 📊 Client behavior analysis
  • 📈 Performance metrics evaluation
  • 🧪 Hypothesis testing: Assess the effectiveness of the redesign
  • 🔬 Experiment evaluation
  • 📉 Interactive data visualization: Using PowerBI

🛠️ Tools Used

The following tools and technologies were used to carry out this project:

  • 🐍 Python: For data exploration and analysis
  • 📊 Pandas: For data manipulation and cleaning
  • 📉 Matplotlib and Seaborn: For data visualization
  • 📊 PowerBI: For creating interactive visualizations
  • 📓 Jupyter Notebook: For documenting and presenting the analysis
  • 🐙 GitHub: For version control and collaboration
  • 📋 Trello: For project management
  • 🌐 Streamlit: For building and deploying the interactive web application
  • 🤖 Scikit-learn: For implementing the Machine Learning model

🚀 Development Process

🧹 Data Exploration and Cleaning:

  • Initial exploration of the datasets (df_final_demo, df_final_web_data, df_final_experiment_clients).
  • Data cleaning and resolving quality issues.

📊 Client Behavior Analysis:

  • Demographic analysis of clients.
  • Analysis of client behavior during the online process.

📈 Performance Metrics Evaluation:

  • Defining success indicators.
  • Evaluating the outcome of the redesign.
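
As a minimal sketch of one possible success indicator, the snippet below computes the completion rate per variation group: the share of clients in each group who reached the confirmation step at least once. The column names variation and confirm_binary follow the names used in the Machine Learning section of this README; client_id is a hypothetical column name, and the sample rows are made up for illustration only.

```python
import pandas as pd


def completion_rate_by_group(df: pd.DataFrame) -> pd.Series:
    """Share of clients in each variation group who completed the process."""
    # Collapse to one value per client: 1 if any of their rows is a completion
    per_client = df.groupby(["variation", "client_id"])["confirm_binary"].max()
    # Average the per-client flags within each variation group
    return per_client.groupby(level="variation").mean()


# Made-up sample data, not taken from the project datasets
sample = pd.DataFrame({
    "client_id": [1, 1, 2, 3, 4, 4, 5],
    "variation": ["Control", "Control", "Control", "Test", "Test", "Test", "Test"],
    "confirm_binary": [0, 1, 0, 1, 0, 0, 1],
})

rates = completion_rate_by_group(sample)
print(rates)
```

Aggregating per client first keeps clients with many visits from dominating the rate; whether that matches the indicator used in the project notebooks is an assumption.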

🧪 Hypothesis Testing:

  • Conducting hypothesis tests on the completion rate.
  • Evaluating the completion rate with a cost-effectiveness threshold.
  • Conducting other relevant hypothesis tests.
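
The completion-rate tests above could be sketched as a one-sided z-test for the difference of two proportions. This is an illustrative sketch, not necessarily the test used in the project notebooks. The function name, the min_lift parameter (standing in for a cost-effectiveness threshold, whose actual value is not stated here), and all counts are assumptions made up for the example.

```python
import math


def one_sided_lift_test(conv_control, n_control, conv_test, n_test, min_lift=0.0):
    """z-test of H0: p_test - p_control <= min_lift against H1: the lift is larger.

    Returns (observed_lift, z_statistic, p_value).
    """
    p_c = conv_control / n_control
    p_t = conv_test / n_test
    lift = p_t - p_c
    # Unpooled standard error: with min_lift > 0, H0 does not imply equal rates
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_test)
    z = (lift - min_lift) / se
    # Upper-tail probability of the standard normal distribution, 1 - Phi(z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return lift, z, p_value


# Made-up counts: 500 of 1000 control clients and 560 of 1000 test clients complete
lift, z, p = one_sided_lift_test(500, 1000, 560, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")

# Same counts, but requiring the lift to exceed an assumed 5-point threshold
lift_t, z_t, p_t = one_sided_lift_test(500, 1000, 560, 1000, min_lift=0.05)
print(f"lift={lift_t:.3f}, z={z_t:.2f}, p={p_t:.4f}")
```

With min_lift=0.0 this asks whether the test group completes more often at all; a positive min_lift asks the stricter question of whether the improvement clears the threshold.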

🔬 Experiment Evaluation:

  • Evaluating the design effectiveness.
  • Assessing the duration of the experiment.
  • Identifying additional data needs.

📉 Data Visualization with PowerBI:

  • Creating interactive visualizations in PowerBI.
  • Preparing dashboards for the presentation.

📈 Results

The project results include:

  • A detailed analysis of client behavior and the effectiveness of the redesign.
  • Hypothesis tests supporting conclusions about the completion rate.
  • Interactive visualizations in PowerBI presenting the findings clearly and comprehensively.
  • A final report and presentation summarizing the results and recommendations.

📋 Trello Board

Our Trello board is an essential tool for managing the project's workflow and ensuring that all tasks are organized and tracked efficiently. It helps us to:

  • Plan: Outline the project's objectives, milestones, and deliverables.
  • Organize: Break down the project into manageable tasks and assign them to team members.
  • Track Progress: Monitor the status of each task, from to-do to in-progress to completed.
  • Collaborate: Facilitate communication and collaboration among team members by providing a centralized platform for updates and feedback.
  • Adapt: Adjust plans and priorities as needed based on the project's progress and any new insights or challenges that arise.

Here is a snapshot of our Trello board:

[Image: Trello board snapshot]

🗂️ Project Structure

The project is organized as follows:

  • 📂 analysis_of_clients/: Contains scripts and notebooks for client behavior analysis.
  • 🧹 cleaning/: Contains scripts for data cleaning and preprocessing.
  • 🖼️ images/: Directory for storing image files used in the project.
  • 🤖 machine_learning/: Contains scripts and notebooks for machine learning models.
  • 📊 powerbi/: Contains PowerBI files and reports.
  • 🌐 streamlit_app/: Contains the main Streamlit application and related assets.
    • app.py: The main application script.
    • 🎥 videos/: Directory for storing video files used in the app.
    • 📂 data/: Directory for storing data files used in the app.
  • 📉 visualization/: Contains scripts and notebooks for data visualization.
  • 🚫 .gitignore: Specifies files and directories to be ignored by Git.
  • 📜 LICENSE: The project license file.
  • 📄 README.md: The project documentation file.
  • 📋 requirements.txt: Lists the Python dependencies required for the project.

🌐 Streamlit App

The Streamlit app provides an interactive interface for users to explore the project's results and make predictions using the Machine Learning model.

Features

  1. 📋 Navigation Menu:

    • 🎯 Objectives: Overview of the project's goals and functionalities.
    • 🚀 Development Process: Detailed description of the steps taken during the project, including data exploration, cleaning, and analysis.
    • 📊 Charts and Visualizations: Interactive visualizations created using PowerBI.
    • 📈 Results and Conclusions: Summary of the project's findings and recommendations.
    • 🤖 ML Prediction: Interface for making predictions using the Machine Learning model.
  2. 📉 Interactive Visualizations:

    • Users can explore various charts and graphs to understand the data and the impact of the redesign.
  3. 🔮 Machine Learning Prediction:

    • Users can input the session duration and select the variation group to get a prediction on whether the client will complete the process.

🤖 Machine Learning

In this section, we implemented a Machine Learning model to predict whether a client will complete the online process based on the duration of their session and the variation group they belong to.

Model Description

We used a RandomForestClassifier to build our predictive model. The steps involved in creating the model are as follows:

  1. 🔄 Data Preprocessing:

    • Converted the duration column to seconds.
    • Transformed the confirm column into a binary variable (1 if the process was completed, 0 otherwise).
    • Encoded the categorical variation column using LabelEncoder.
  2. 🔍 Feature Selection:

    • Selected duration_sec and variation_encoded as the features.
    • Used confirm_binary as the target variable.
  3. 📏 Data Scaling:

    • Scaled the features using StandardScaler to normalize the data.
  4. 🧠 Model Training:

    • Split the data into training and testing sets (80% training, 20% testing).
    • Trained the RandomForestClassifier on the training data.
  5. 📊 Model Evaluation:

    • Evaluated the model using a confusion matrix and classification report to assess its performance.
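
The steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the project's actual training script. The column names duration, confirm, variation, duration_sec, confirm_binary, and variation_encoded follow the description above. The synthetic values, the "yes"/"no" encoding of confirm, and the random_state settings are assumptions. The sketch also fits the scaler on the training split only, which differs slightly from the order listed above but avoids leaking test-set statistics into training.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in data (made up); the real datasets are not reproduced here
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "duration": pd.to_timedelta(rng.integers(30, 1800, size=n), unit="s"),
    "variation": rng.choice(["Control", "Test"], size=n),
    "confirm": rng.choice(["yes", "no"], size=n),  # assumed raw encoding
})

# 1. Preprocessing: duration in seconds, binary target, encoded group
df["duration_sec"] = df["duration"].dt.total_seconds()
df["confirm_binary"] = (df["confirm"] == "yes").astype(int)
encoder = LabelEncoder()
df["variation_encoded"] = encoder.fit_transform(df["variation"])

# 2. Feature selection
X = df[["duration_sec", "variation_encoded"]]
y = df["confirm_binary"]

# 4a. 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Scaling, fitted on the training split only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4b. Model training
model = RandomForestClassifier(random_state=42)
model.fit(X_train_scaled, y_train)

# 5. Evaluation
y_pred = model.predict(X_test_scaled)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```

Because the synthetic labels are random, the reported scores here are meaningless; the sketch only shows how the pieces fit together.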

🌐 Streamlit Integration

We integrated the model into a Streamlit app to allow users to input session duration and variation group, and receive a prediction on whether the client will complete the process.

🛠️ How to Use

  1. 📝 Input: Enter the session duration in the format HH:MM:SS and select the variation group.
  2. 🔮 Prediction: Click the "Predict" button to get the prediction (Confirmed or Not Confirmed).
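
Since the model takes the duration in seconds, the HH:MM:SS input has to be converted before prediction. A minimal sketch of such a conversion is shown below; the function name hhmmss_to_seconds and its validation rules are hypothetical and may differ from what app.py actually does.

```python
def hhmmss_to_seconds(text: str) -> int:
    """Convert a duration written as HH:MM:SS into a number of seconds."""
    parts = text.strip().split(":")
    # Require exactly three numeric fields: hours, minutes, seconds
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected HH:MM:SS, got {text!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes and seconds must be below 60, got {text!r}")
    return hours * 3600 + minutes * 60 + seconds


print(hhmmss_to_seconds("00:05:30"))
```

Rejecting malformed input early lets the app show a clear message instead of passing an invalid value to the model.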

This model helps in understanding the factors that influence the completion rate of the online process and provides insights for improving the user experience.

[Image: Streamlit app]

📊 PowerBI

In this section, we present interactive visualizations created using PowerBI. These visualizations help in understanding the data and deriving insights to improve the user experience and completion rates.

[Image: PowerBI Visualization 1] [Image: PowerBI Visualization 2]

👥 Project Members

| Name | Role | Special Characteristic | GitHub Profile |
|---|---|---|---|
| Silvia Alonso | 🧑‍💻 Data Analyst | 🥇 Expert in data wrangling | Silvia Alonso |
| Juan Duran | 🧑‍💻 Data Analyst | 🌐 Skilled in Streamlit | Juan Duran |
| Ana Pineda | 🧑‍💻 Data Analyst | 🏆 Spanish Excel Champion | Ana Pineda |
| Andrea Lafarga | 🧑‍💻 Data Analyst | 📊 Expert in data management | Andrea Lafarga |

🤝 Collaborations and Suggestions

We welcome collaborations and suggestions! Feel free to open an issue or submit a pull request. 🚀

Thank you for taking the time to explore our project. We hope you find it useful and informative. Your feedback and contributions are invaluable to us, and we look forward to working together to improve and expand this project. 🙌

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Thank you for visiting our repository! If you have any questions or need further assistance, please don't hesitate to reach out. Happy coding! 😊

