CORSA is a Python-based data analysis tool designed for processing and analyzing university course information. The project focuses on scraping, cleaning, and analyzing course data from various sources, with specific support for AUA (American University of Armenia) course catalogs and general education requirements.
- Course Data Scraping: Extract course information from HTML sources and web APIs
- Data Processing: Clean and structure course data using pandas for analysis
- AI Integration: Leverage the OpenAI API for intelligent data processing and analysis
- Multiple Data Sources: Support for different course data formats (Jenzabar, GenEds, AUA)
- Jupyter Notebook Analysis: Interactive data exploration and visualization
├── 1. Corsa_Jenza_F2025.ipynb # Jenzabar course data processing
├── 2. Corsa_Geneds_F2025.ipynb # General education courses analysis
├── 3. Corsa_AUA_Merged_F2025.ipynb # Combined AUA course data analysis
├── pyproject.toml # Project dependencies and configuration
├── .env.example # Environment variables template
├── .gitignore # Git ignore rules
└── README.md # This file
- Python 3.11+
- Dependencies managed via uv (see pyproject.toml):
  - beautifulsoup4: HTML parsing and web scraping
  - pandas: Data manipulation and analysis
  - requests: HTTP requests for data fetching
  - openai: OpenAI API integration
  - python-dotenv: Environment variable management
  - ipykernel: Jupyter notebook support
- Clone the repository:
  git clone <repository-url>
  cd corsa
- Install dependencies with uv, then activate the environment (uv sync creates the .venv virtual environment, so it must run before activation):
  uv sync
  source .venv/bin/activate
- Set up environment variables:
  cp .env.example .env
  # Edit .env and add your OpenAI API key
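After .env is filled in, a notebook can read the key as sketched below. This assumes the notebooks load it with python-dotenv (listed in the dependencies); the `get_openai_key` helper and its placeholder check are hypothetical, not code taken from the notebooks.

```python
import os

try:
    # python-dotenv is a project dependency; the fallback only lets this
    # snippet run in an environment where it is not installed.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Placeholder value used in this README's .env example (assumed to match .env.example).
PLACEHOLDER = "your_openai_api_key_here"


def get_openai_key() -> str:
    """Return OPENAI_API_KEY, loading ./.env first when python-dotenv is available."""
    if load_dotenv is not None:
        # Reads ./.env (the notebooks sit in the project root) into os.environ;
        # variables that are already set are not overridden.
        load_dotenv(".env")
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "OPENAI_API_KEY is missing or still set to the placeholder; "
            "edit .env in the project root."
        )
    return key
```

Recent versions of the openai package also read OPENAI_API_KEY from the environment on their own, so `OpenAI()` works once the variable is set; the explicit check mainly gives a clearer error when the placeholder was never replaced.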
Create a .env file in the project root with the following variable:
OPENAI_API_KEY=your_openai_api_key_here

The project consists of three main analysis notebooks:
- Jenzabar Course Processing (1. Corsa_Jenza_F2025.ipynb)
  - Processes course data from Jenzabar HTML files
  - Extracts course information, schedules, and metadata
  - Uses OpenAI for intelligent data enhancement

  Note: Before running this notebook, open AUA SONIS Jenzabar, use the course limit selector to make sure all courses are selected and visible, and save the HTML of that page into a folder in this project called .localdata. The file must be named raw__jenzabar.html, giving the path .localdata/raw__jenzabar.html.
- General Education Analysis (2. Corsa_Geneds_F2025.ipynb)
  - Fetches and processes general education course requirements from AUA's official website
  - Web scraping of course catalog data
  - Data cleaning and structure standardization
- AUA Merged Analysis (3. Corsa_AUA_Merged_F2025.ipynb)
  - Combines data from multiple sources
  - Comprehensive analysis of AUA course offerings
  - Cross-references course data across different systems
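The project lists beautifulsoup4 for HTML parsing; the sketch below uses the standard library's `html.parser` instead so it runs without extra installs. It assumes the saved Jenzabar page renders courses as rows of an HTML `<table>`, which this README does not confirm. `TableRowParser` and `extract_rows` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities like &nbsp; arrive decoded
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            # Collapse runs of whitespace (newlines, non-breaking spaces) into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):  # skip rows whose cells are all empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str) -> list[list[str]]:
    """Return the cleaned cell text of every table row in `html`."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `rows = extract_rows(Path(".localdata/raw__jenzabar.html").read_text(encoding="utf-8"))` followed by `pandas.DataFrame(rows[1:], columns=rows[0])` would give a DataFrame if the first row holds the headers. This simple parser does not handle nested tables or cells whose closing tags are omitted.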
You can run the notebooks from your preferred IDE or from the command line. I ran them with the Jupyter extension for VS Code.
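Because the Jenzabar notebook depends on a manually saved page, a short pre-flight cell at its top can fail early with a readable message instead of a parsing error further down. A minimal sketch; `check_raw_jenzabar` is a hypothetical helper, and the path is the one given in the Jenzabar note above.

```python
from pathlib import Path

# Location required by the Jenzabar notebook, relative to the project root.
RAW_JENZABAR = Path(".localdata") / "raw__jenzabar.html"


def check_raw_jenzabar(project_root: Path = Path(".")) -> Path:
    """Return the path to the saved Jenzabar page, or raise if it is missing or empty."""
    path = project_root / RAW_JENZABAR
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Save the AUA SONIS Jenzabar page "
            "(with all courses visible) there first."
        )
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty; save the page again.")
    return path
```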
The project processes course data from multiple sources:
- AUA SONIS Jenzabar System: HTML-based course catalog data
- AUA General Education Website: Web-based course requirement data
All source data files are stored in the .localdata/ directory, which is excluded from version control.
The analysis generates:
- Cleaned CSV files with structured course data
- Data visualizations and summary statistics
- AI-enhanced course descriptions and metadata
- Cross-referenced course information across systems
The final output file is named aua__all-courses-merged.csv.
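Producing the merged file involves cross-referencing course records across systems, which can be pictured as an outer join on a normalized course code. The stdlib-only sketch below rests on assumptions not stated in this README: both sources expose a `code` column, the GenEd data has one row per course while Jenzabar may list several sections per course, and codes differ only in spacing, hyphens, or case. `normalize_code` and `merge_courses` are hypothetical names, not the notebooks' code.

```python
from __future__ import annotations

import re
from typing import Iterable


def normalize_code(code: str) -> str:
    """Uppercase and drop spaces/hyphens, so 'cs 100' and 'CS-100' both become 'CS100'."""
    return re.sub(r"[\s\-]+", "", code).upper()


def merge_courses(sections: Iterable[dict], geneds: Iterable[dict], key: str = "code") -> list[dict]:
    """Outer-join section rows (many per course) to GenEd rows (one per course)."""
    gened_by_code: dict[str, dict] = {}
    for row in geneds:
        gened_by_code[normalize_code(row[key])] = row  # assumes one row per course

    merged: list[dict] = []
    matched: set[str] = set()
    for row in sections:
        code = normalize_code(row[key])
        gened = gened_by_code.get(code)
        # Section fields take precedence over GenEd fields with the same name.
        out = {**(gened or {}), **row, key: code}
        out["_merge"] = "both" if gened is not None else "left_only"
        if gened is not None:
            matched.add(code)
        merged.append(out)

    # GenEd courses that have no section in the schedule.
    for code, gened in gened_by_code.items():
        if code not in matched:
            merged.append({**gened, key: code, "_merge": "right_only"})
    return merged
```

The `_merge` values mirror pandas, where the equivalent is roughly `sections.merge(geneds, on="code", how="outer", indicator=True, validate="m:1")` after normalizing both code columns; the merged rows could then be saved with, for example, `pandas.DataFrame(rows).to_csv("aua__all-courses-merged.csv", index=False)`.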
- Notebooks: Interactive analysis and data processing workflows
- Data: Raw and processed course data (in .localdata/)
- Configuration: Environment-based settings and API keys
- Create a new Jupyter notebook following the existing naming convention (<number>. Corsa_<Source>_<Term>.ipynb)
- Implement data extraction and cleaning logic
- Standardize output format to match existing schemas
- Update this README with new data source information
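This README does not spell out the shared output schema, so a header check like the one below can catch mismatches before a new source is merged. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical placeholders; fill the tuple with the columns the existing notebooks actually produce.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column set: replace with the columns the existing notebooks emit.
REQUIRED_COLUMNS = ("code", "title", "credits")


def missing_columns(csv_file: TextIO, required: Iterable[str] = REQUIRED_COLUMNS) -> list[str]:
    """Return the required columns absent from the CSV header, in the order given."""
    header = next(csv.reader(csv_file), [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]
```

Usage: `with open("new_source.csv", newline="", encoding="utf-8") as f: print(missing_columns(f))` prints an empty list when the header already matches.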
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or contributions, please:
- Check existing Issues for similar problems
- Create a new issue with detailed information
- Include relevant notebook outputs and error messages
Note: This project handles educational data. Please ensure compliance with institutional data policies and privacy requirements when using or contributing to this project. This project is not endorsed by, sponsored by, or affiliated with the American University of Armenia.