This project analyzes responses to an online intake form used by a company that runs therapy groups for women. The form collects information about potential clients' problems, desired changes, and expectations from therapy. The analysis compares respondents who attended informational sessions (a sign of higher commitment) with those who did not.
The therapy group is based in Spain, so all form inputs are in Spanish, and the variable names are in Spanish as well. At the end of `EDA+keyword_impact.ipynb` you will also find a detailed interpretation of the keyword impact scores, written in Spanish.
- `cleaning_typeform.ipynb`: Data cleaning and preprocessing notebook
- `EDA+keyword_impact.ipynb`: Exploratory Data Analysis and NLP-based keyword impact analysis
- `clean_typeform.csv`: Cleaned and anonymized dataset (not included in the repository for privacy reasons)
- `Calendly 15 sp 2024.xlsx`: Supplementary data for session attendance (not included in the repository)
- Load raw data from the online form responses
- Clean and preprocess the data
- Anonymize personal information (names, emails, phone numbers)
- Merge additional data to track session completion
- Create new categorical variables for analysis
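The cleaning steps above might look roughly like the following pandas sketch. It is an illustration under assumptions, not the notebook's actual code: the column names `nombre`, `email`, `telefono`, `id_respuesta`, and `asistio_sesion` are hypothetical, and joining the form to the session data through a salted hash of the email is just one way to keep respondents linkable after the raw emails are dropped.

```python
import hashlib

import pandas as pd

# Hypothetical personal-data columns; adjust to the real form export.
PII_COLUMNS = ["nombre", "email", "telefono"]


def pseudonymize(value: str, salt: str) -> str:
    """Salted SHA-256 of a normalized email, so the same email always maps to the same ID."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def clean_and_merge(form: pd.DataFrame, sessions: pd.DataFrame, salt: str) -> pd.DataFrame:
    form = form.copy()
    sessions = sessions.copy()

    # Build a join key from the email before removing personal data.
    form["id_respuesta"] = form["email"].astype(str).map(lambda e: pseudonymize(e, salt))
    sessions["id_respuesta"] = sessions["email"].astype(str).map(lambda e: pseudonymize(e, salt))

    # Drop direct identifiers (names, emails, phone numbers).
    form = form.drop(columns=[c for c in PII_COLUMNS if c in form.columns])

    # Left-merge against the (deduplicated) session list; indicator=True adds a
    # "_merge" column telling us whether each respondent was found there.
    merged = form.merge(
        sessions[["id_respuesta"]].drop_duplicates(),
        on="id_respuesta",
        how="left",
        indicator=True,
    )

    # New categorical variable: did the respondent attend an informational session?
    merged["asistio_sesion"] = pd.Categorical(
        (merged["_merge"] == "both").map({True: "sí", False: "no"}),
        categories=["no", "sí"],
    )
    return merged.drop(columns="_merge")
```

Deduplicating the session IDs before the merge keeps a respondent who booked several sessions from being duplicated in the output.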
- Use NLTK for text processing
- Analyze key input fields:
  - Main problem (`problema_principal`)
  - Desired changes (`cambios_deseados`)
  - Expectations from therapy (`busqueda_en_terapia`)
- Generate word clouds and frequency distributions of keywords
- Compare keyword usage between session attendees and non-attendees
- Calculate "impact scores" to identify significant differences in keyword usage between groups
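The keyword comparison and impact-score steps above can be sketched in plain Python. This README does not define the impact score, so the formula here is an assumption: a smoothed log-ratio of each word's relative frequency among attendees versus non-attendees. The helper names (`tokenize`, `keyword_counts`, `impact_scores`) and the tiny stopword set are hypothetical; the notebook uses NLTK, where `nltk.corpus.stopwords.words("spanish")` provides a full Spanish stopword list.

```python
import math
import re
from collections import Counter

# Small illustrative stopword set (assumption); NLTK's Spanish list is much larger.
STOPWORDS = {
    "de", "la", "el", "en", "y", "a", "que", "me", "mi", "con",
    "por", "para", "un", "una", "los", "las", "lo", "se", "del", "al",
}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens (including Spanish accents and ñ), drop stopwords."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def keyword_counts(responses: list[str]) -> Counter:
    """Keyword frequency distribution for one group of free-text answers."""
    counts = Counter()
    for text in responses:
        counts.update(tokenize(text))
    return counts


def impact_scores(attended: list[str], not_attended: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Hypothetical impact score: log of smoothed relative frequency (attended / not attended).

    Positive values mean the word is relatively more common among attendees,
    negative values among non-attendees. alpha must be > 0.
    """
    a, b = keyword_counts(attended), keyword_counts(not_attended)
    vocab = set(a) | set(b)
    total_a, total_b = sum(a.values()), sum(b.values())
    v = len(vocab)
    scores = {}
    for word in vocab:
        # Add-alpha smoothing so words seen in only one group get a finite score.
        p_a = (a[word] + alpha) / (total_a + alpha * v)
        p_b = (b[word] + alpha) / (total_b + alpha * v)
        scores[word] = math.log(p_a / p_b)
    return scores
```

Ranking with `sorted(scores.items(), key=lambda kv: kv[1], reverse=True)` puts the words most characteristic of attendees first and those of non-attendees last. Because the log-ratio is symmetric around zero, both groups can be read off the same ranking, and the smoothing term `alpha` prevents words that appear in only one group from producing infinite scores.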
The analysis reveals differences in language and focus between respondents who attended informational sessions and those who did not. Key findings are summarized in the EDA+keyword_impact.ipynb notebook, including:
- Most impactful keywords for each input field
- Differences in problem framing and goal setting between groups
- Insights into the motivations and expectations of potential clients
To reproduce the analysis:
- Ensure you have the required dependencies installed (see `requirements.txt`)
- Run `cleaning_typeform.ipynb` to clean and preprocess the data
- Run `EDA+keyword_impact.ipynb` to perform the NLP analysis and generate insights
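If you prefer to run the two notebooks non-interactively, a minimal sketch using `nbformat` and nbconvert's `ExecutePreprocessor` is shown below. It assumes the dependencies are installed (e.g. with `pip install -r requirements.txt`), that both notebooks sit in the same folder, and that a kernel named `python3` is available; `NOTEBOOK_ORDER` and `run_notebooks` are hypothetical names. The order matters because the EDA notebook presumably reads the cleaned dataset produced by the cleaning notebook.

```python
from pathlib import Path

# Cleaning must run before the EDA / keyword-impact notebook.
NOTEBOOK_ORDER = ["cleaning_typeform.ipynb", "EDA+keyword_impact.ipynb"]


def run_notebooks(folder: str = ".", timeout: int = 600) -> None:
    """Execute each notebook in order and save its outputs in place.

    ExecutePreprocessor raises CellExecutionError on the first failing cell,
    so a broken cleaning step stops the run before the analysis starts.
    """
    # Imported lazily so this module can be imported without Jupyter installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    for name in NOTEBOOK_ORDER:
        path = Path(folder) / name
        nb = nbformat.read(str(path), as_version=4)
        executor = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
        # Run with the notebook's own folder as working directory so relative
        # data paths inside the notebooks resolve the same way as in Jupyter.
        executor.preprocess(nb, {"metadata": {"path": str(path.parent)}})
        nbformat.write(nb, str(path))
```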
Note: The raw data files are not included in this repository to protect privacy. You'll need to supply your own data in a similar format to run the analysis.
Potential areas for expanding this analysis include:
- Sentiment analysis of responses
- Topic modeling to identify common themes
- Predictive modeling to forecast session attendance or client engagement