
CORSA - Course Data Analysis & Processing

CORSA is a Python-based data analysis tool designed for processing and analyzing university course information. The project focuses on scraping, cleaning, and analyzing course data from various sources, with specific support for AUA (American University of Armenia) course catalogs and general education requirements.

Features

  • Course Data Scraping: Extract course information from HTML sources and web APIs
  • Data Processing: Clean and structure course data using pandas for analysis
  • AI Integration: Leverage OpenAI API for intelligent data processing and analysis
  • Multiple Data Sources: Support for different course data formats (Jenzabar, GenEds, AUA)
  • Jupyter Notebook Analysis: Interactive data exploration and visualization

Project Structure

├── 1. Corsa_Jenza_F2025.ipynb      # Jenzabar course data processing
├── 2. Corsa_Geneds_F2025.ipynb     # General education courses analysis
├── 3. Corsa_AUA_Merged_F2025.ipynb # Combined AUA course data analysis
├── pyproject.toml                   # Project dependencies and configuration
├── .env.example                     # Environment variables template
├── .gitignore                       # Git ignore rules
└── README.md                        # This file

Requirements

  • Python 3.11+
  • Dependencies managed via uv (see pyproject.toml)

Core Dependencies

  • beautifulsoup4 - HTML parsing and web scraping
  • pandas - Data manipulation and analysis
  • requests - HTTP requests for data fetching
  • openai - OpenAI API integration
  • python-dotenv - Environment variable management
  • ipykernel - Jupyter notebook support

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd corsa
  2. Install dependencies using uv:

    uv sync
    source .venv/bin/activate
  3. Set up environment variables:

    cp .env.example .env
    # Edit .env and add your OpenAI API key

Configuration

Create a .env file in the project root with the following variables:

OPENAI_API_KEY=your_openai_api_key_here
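
A minimal sketch of how a notebook cell might load this key, assuming only the python-dotenv and openai packages already listed above:

    import os

    from dotenv import load_dotenv
    from openai import OpenAI

    # Read OPENAI_API_KEY (and any other variables) from .env into the environment
    load_dotenv()

    # Passing the key explicitly makes a missing .env fail loudly inside a notebook
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])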

Usage

Running Jupyter Notebooks

The project consists of three main analysis notebooks:

  1. Jenzabar Course Processing (1. Corsa_Jenza_F2025.ipynb)

    • Processes course data from Jenzabar HTML files
    • Extracts course information, schedules, and metadata
    • Uses OpenAI for intelligent data enhancement

    Note: Before running this notebook, visit AUA SONIS (Jenzabar), use the course limit selector so that all courses are selected and visible, and save the HTML contents of that page to a folder named .localdata in this project. The file must be named raw__jenzabar.html. A parsing sketch follows this list.

  2. General Education Analysis (2. Corsa_Geneds_F2025.ipynb)

    • Fetches and processes general education course requirements from AUA's official website
    • Web scraping of course catalog data
    • Data cleaning and structure standardization
  3. AUA Merged Analysis (3. Corsa_AUA_Merged_F2025.ipynb)

    • Combines data from multiple sources
    • Comprehensive analysis of AUA course offerings
    • Cross-references course data across different systems
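
As a companion to the note under notebook 1, here is a minimal sketch of the initial parsing step, assuming the saved SONIS page contains the courses in an HTML table; the row and cell handling below is illustrative rather than the notebook's actual logic:

    from pathlib import Path

    import pandas as pd
    from bs4 import BeautifulSoup

    # Load the page saved manually from AUA SONIS (see the note for notebook 1)
    html = Path(".localdata/raw__jenzabar.html").read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")

    # Illustrative parsing only: collect the text of every table row's cells
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)

    courses = pd.DataFrame(rows)
    print(courses.head())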

Running the Notebooks

You can run the notebooks through your preferred IDE or from the command line. I used the Jupyter extension for VS Code.

Data Sources

The project processes course data from multiple sources:

  • AUA SONIS Jenzabar System: HTML-based course catalog data
  • AUA General Education Website: Web-based course requirement data

All source data files are stored in .localdata/ directory (excluded from version control).
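
The general education data is fetched live rather than saved by hand. A minimal sketch of that fetch-and-cache step, assuming a placeholder URL (substitute the actual AUA General Education page) and an illustrative cache filename:

    from pathlib import Path

    import requests
    from bs4 import BeautifulSoup

    # Placeholder URL: replace with the actual AUA General Education requirements page
    GENED_URL = "https://example.edu/gened-requirements"

    response = requests.get(GENED_URL, timeout=30)
    response.raise_for_status()

    # Cache the raw HTML under .localdata/ so reruns do not re-hit the website
    Path(".localdata").mkdir(exist_ok=True)
    Path(".localdata/raw__geneds.html").write_text(response.text, encoding="utf-8")

    soup = BeautifulSoup(response.text, "html.parser")
    print(soup.title.get_text(strip=True) if soup.title else "no <title> found")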

Output

The analysis generates:

  • Cleaned CSV files with structured course data
  • Data visualizations and summary statistics
  • AI-enhanced course descriptions and metadata
  • Cross-referenced course information across systems

The final generated output is named aua__all-courses-merged.csv.
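
A minimal sketch of what the final merge step could look like, assuming notebooks 1 and 2 have written intermediate CSVs; the intermediate filenames and the course_code join key are assumptions, not the notebooks' actual outputs:

    import pandas as pd

    # Assumed intermediate outputs of notebooks 1 and 2 (illustrative names)
    jenzabar = pd.read_csv(".localdata/jenzabar_courses.csv")
    geneds = pd.read_csv(".localdata/gened_courses.csv")

    # Cross-reference the two systems on an assumed shared course code column
    merged = jenzabar.merge(geneds, on="course_code", how="left", suffixes=("", "_gened"))

    # Final output named as described above
    merged.to_csv("aua__all-courses-merged.csv", index=False)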

Development

Architecture

  • Notebooks: Interactive analysis and data processing workflows
  • Data: Raw and processed course data (in .localdata/)
  • Configuration: Environment-based settings and API keys

Adding New Data Sources

  1. Create a new Jupyter notebook following the naming convention
  2. Implement data extraction and cleaning logic
  3. Standardize output format to match existing schemas (see the sketch after this list)
  4. Update this README with new data source information
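
For step 3, a minimal sketch of a standardization helper, assuming a shared target schema; the column names are illustrative, not the project's actual schema:

    import pandas as pd

    # Illustrative target schema shared by the existing notebooks
    TARGET_COLUMNS = ["course_code", "title", "credits", "instructor", "schedule"]

    def standardize(raw: pd.DataFrame, column_map: dict[str, str]) -> pd.DataFrame:
        """Rename a new source's columns to the shared schema and drop everything else."""
        df = raw.rename(columns=column_map)
        for col in TARGET_COLUMNS:
            if col not in df.columns:
                df[col] = pd.NA  # fill columns the new source does not provide
        return df[TARGET_COLUMNS]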

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions, issues, or contributions, please:

  1. Check existing Issues for similar problems
  2. Create a new issue with detailed information
  3. Include relevant notebook outputs and error messages

Note: This project handles educational data. Please ensure compliance with institutional data policies and privacy requirements when using or contributing to this project. This project is not endorsed or sponsored by, nor affiliated with, the American University of Armenia.
