📊 Yahoo Finance ETL to PostgreSQL

An automated, modular ETL pipeline that uses yahooquery to extract pricing, financial-statement, and fundamental data and load it into a PostgreSQL database for querying and analysis.

🚀 Features

✅ Clean, plug-and-play ETL pipeline (3 segments: Pricing, Financials, Fundamentals)
📥 Automatically scrapes S&P 500 tickers (or lets you configure your own universe)
🧱 Creates and manages PostgreSQL database schema + tables
🗃️ Organized output directories, archiving logic and file handling
⚙️ Fully modular: update or extend segments easily
🔒 Credentials kept in a local .env file (template provided as .env.example)

⚙️ Setup Instructions

1️⃣ Requirements

Python 3.9+
PostgreSQL (downloads at https://www.postgresql.org/download/)
Libraries: see requirements.txt

2️⃣ Clone the Repo

git clone https://github.com/NPStraight2ThePoint/yahooquery-etl-postgresql-prod.git
cd yahooquery-etl-postgresql-prod

3️⃣ Set Up Environment

Create your .env file using the provided template:

cp .env.example .env

Edit .env with your local PostgreSQL credentials:

DB_HOST=localhost
DB_PORT=5432
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=yahooquery_db

4️⃣ Install Python Dependencies

We recommend using a virtual environment:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

🧱 Initial Setup (One-time)

Run:

python _1_run_setup.py
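According to the folder structure in this README, `_1_run_setup.py` handles database creation, schema/table creation, and folder setup. The folder step can be sketched as follows. The directory names come from the README's folder structure and ticker path. `create_dirs` is a hypothetical stand-in, not the contents of `setup/create_dirs.py`.

```python
from pathlib import Path

# Folder names taken from this README (folder structure + ticker path);
# the real setup/create_dirs.py may create a different set.
REQUIRED_DIRS = (
    "output/_1_pricing",
    "output/_2_financials",
    "output/_3_fundamentals",
    "output/merged",
    "output/Static Data",
    "archive/data",
)


def create_dirs(project_root: Path) -> list[Path]:
    """Create each required folder under project_root; existing folders are left untouched."""
    paths = []
    for rel in REQUIRED_DIRS:
        path = project_root.joinpath(*rel.split("/"))
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `create_dirs(Path.cwd())` from the project root creates the folders. Because of `exist_ok=True`, running it again does no harm.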

🛠️ Get Tickers

You have three flexible options for defining your ticker universe:

Run the script to auto-fetch the S&P 500: python _2_get_sp500_tickers.py
Manually replace the default ticker list in output/Static Data/Tickers.csv with your own list of tickers.
Edit the scraping logic inside _2_get_sp500_tickers.py to adapt it to other universes — such as ASX 200, ETFs, or your own custom watchlist.
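If you maintain Tickers.csv by hand, a small helper keeps the file consistent. This sketch assumes a single header column named `Ticker`, so check the real file first. It also handles a common pitfall when scraping the S&P 500 from Wikipedia. Wikipedia lists share-class symbols with a dot (BRK.B), but Yahoo Finance expects a hyphen (BRK-B). This README doesn't say whether `_2_get_sp500_tickers.py` already does this conversion. All names here are hypothetical.

```python
import csv
from pathlib import Path

# Assumption: Tickers.csv has one header column named "Ticker".
TICKER_COLUMN = "Ticker"


def sp500_symbol_to_yahoo(symbol: str) -> str:
    """Map a Wikipedia-style US share-class symbol (BRK.B) to Yahoo style (BRK-B).

    Only for US listings: exchange-suffixed symbols such as BHP.AX keep their dot.
    """
    return symbol.strip().upper().replace(".", "-")


def write_tickers(symbols, path: Path) -> int:
    """Write upper-cased, de-duplicated tickers (first-seen order); return the count."""
    unique = list(dict.fromkeys(s.strip().upper() for s in symbols if s.strip()))
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow([TICKER_COLUMN])
        writer.writerows([s] for s in unique)
    return len(unique)


def read_tickers(path: Path) -> list[str]:
    """Read tickers back from the CSV, skipping blank entries."""
    with path.open(newline="", encoding="utf-8") as fh:
        return [
            row[TICKER_COLUMN].strip()
            for row in csv.DictReader(fh)
            if (row[TICKER_COLUMN] or "").strip()
        ]
```

For example, `write_tickers(["AAPL", "MSFT", "BHP.AX"], Path("output/Static Data/Tickers.csv"))` writes a custom watchlist. Don't apply `sp500_symbol_to_yahoo` to exchange-suffixed symbols such as BHP.AX, because Yahoo needs that dot.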

📈 Run the ETL Pipeline

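You can run the pipeline in either of two ways.

Option 1: use the global orchestrator (recommended), which runs all segments in order:

python _3_global_orchestrator.py

Option 2: run each segment's orchestrator (pricing, financial statements, fundamentals) individually: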

python etl/_1_pricing/pricing_orchestrator.py
python etl/_2_financial_statements/financials_statements_orchestrator.py
python etl/_3_fundamentals/fundamentals_orchestrator.py
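Running the three commands by hand works like a simple sequential runner. The sketch below shows one way such a runner could work. The actual `_3_global_orchestrator.py` may import the modules directly instead of spawning processes. The sketch runs each segment with the current interpreter and stops at the first non-zero exit code. `run_segments` and `SEGMENT_SCRIPTS` are hypothetical names.

```python
import subprocess
import sys

# Segment scripts as listed in this README, in the documented order.
SEGMENT_SCRIPTS = [
    "etl/_1_pricing/pricing_orchestrator.py",
    "etl/_2_financial_statements/financials_statements_orchestrator.py",
    "etl/_3_fundamentals/fundamentals_orchestrator.py",
]


def run_segments(commands: list[list[str]]) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("Running:", " ".join(cmd), flush=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Stopped: exit code {result.returncode} from {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0
```

From the project root, `run_segments([[sys.executable, s] for s in SEGMENT_SCRIPTS])` reproduces the manual sequence. Using `sys.executable` keeps the active virtual environment.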

📦 Archive Old Data (Optional)

After a run, clean up and archive the raw data:

python _4_archive_dir.py
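This README doesn't document what `_4_archive_dir.py` does internally. Here is a minimal sketch of one plausible approach. It moves each CSV from `output/` into a timestamped folder under `archive/data/`, preserving sub-folders. It assumes the ticker list in `output/Static Data/` should stay in place. `archive_outputs` and its `keep` parameter are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_outputs(
    output_dir: Path,
    archive_root: Path,
    stamp: Optional[str] = None,
    keep: tuple = ("Static Data",),
) -> list[Path]:
    """Move CSVs from output_dir to archive_root/<stamp>/, skipping top-level folders in `keep`."""
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target_root = archive_root / stamp
    moved = []
    # sorted() collects the file list before anything is moved
    for csv_path in sorted(output_dir.rglob("*.csv")):
        rel = csv_path.relative_to(output_dir)
        if rel.parts[0] in keep:
            continue  # e.g. leave output/Static Data/Tickers.csv where it is
        dest = target_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_path), str(dest))
        moved.append(dest)
    return moved
```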

📊 What's Included

📁 Historical pricing
📁 Option chains
📁 Technical insights
📁 Financial statements (income statement, balance sheet, cash flow; annual and quarterly)
📁 Company fundamentals
📁 Static profiles, summaries, and more

Visual Overview

📁 Folder Structure

yahooquery-etl-postgresql-prod/
├── archive/                    # Archived CSVs for version tracking
│   └── data/
├── _4_archive_dir.py           # Archive logic
├── etl/                        # ETL scripts for each data segment
│   ├── _1_pricing/
│   ├── _2_financial_statements/
│   └── _3_fundamentals/
├── _2_get_sp500_tickers.py     # Auto-download S&P 500 tickers
├── _3_global_orchestrator.py   # Runs all segments in order
├── output/                     # Fetched raw data
│   ├── _1_pricing/
│   ├── _2_financials/
│   ├── _3_fundamentals/
│   ├── Static Data/            # Ticker list (Tickers.csv)
│   └── merged/                 # Merged outputs
├── requirements.txt
├── _1_run_setup.py             # Runs DB creation, schema/tables & folder setup
├── setup/
│   ├── create_db.py
│   ├── init_schema_tables.py
│   └── create_dirs.py
├── sql_db_schema/              # CSV schema definition files
│   └── sql_schema.csv
├── utils.py                    # Helper functions + shared paths
├── .env.example                # Template for local credentials
├── .gitignore                  # Excludes sensitive files
└── README.md
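`sql_db_schema/sql_schema.csv` drives table creation, but this README doesn't document its columns. Suppose the file holds one row per column, with headers `schema`, `table`, `column`, and `data_type`. This is an assumption, not the repository's actual format. Under that assumption, the CSV could be turned into PostgreSQL DDL like this (all function names are hypothetical):

```python
import csv
from typing import TextIO


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def schema_csv_to_ddl(fh: TextIO) -> list[str]:
    """Turn rows of (schema, table, column, data_type) into CREATE statements.

    The header names are assumed. data_type is inserted verbatim,
    so the CSV must come from a trusted source.
    """
    tables: dict = {}  # (schema, table) -> column definitions, in file order
    for row in csv.DictReader(fh):
        key = (row["schema"].strip(), row["table"].strip())
        col = f'{quote_ident(row["column"].strip())} {row["data_type"].strip()}'
        tables.setdefault(key, []).append(col)

    # One CREATE SCHEMA per distinct schema, in first-seen order
    schemas = dict.fromkeys(schema for schema, _ in tables)
    statements = [f"CREATE SCHEMA IF NOT EXISTS {quote_ident(s)};" for s in schemas]
    for (schema, table), cols in tables.items():
        body = ",\n    ".join(cols)
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {quote_ident(schema)}.{quote_ident(table)} (\n    {body}\n);"
        )
    return statements
```

Identifiers are double-quoted so mixed-case or reserved names survive. The statements would then be executed through a database driver, such as a psycopg2 cursor.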

[Figures: Setup | ETL Process | Database Schema]

🧪 Project Status & Roadmap

✅ Tested on 200 tickers.

📌 Upcoming Enhancements:

Automated testing
GitHub Actions for CI/CD
Additional Yahoo data modules

🆔 Project Info

Author: Nicholas Papadimitris
Created On: 09 July 2025, 06:00 AM UTC
Project ID: YF_YQ_ETL_09_Jul2025

🐙 GitHub: @NPStraight2ThePoint
💼 LinkedIn: Nicholas Papadimitris
📧 Email: [email protected]

📄 License

This project is licensed under the MIT License — free to use, modify, and distribute.

🙏 Attribution Requirement

If you distribute or share this repository or its contents publicly, you must:

✅ Provide appropriate credit to the original author.
✅ Include a link to the original repository:
https://github.com/NPStraight2ThePoint/yahooquery-etl-postgresql-prod
✅ Clearly indicate if any changes were made.

You may do so in any reasonable manner, but not in any way that suggests the original author or this repository endorses you or your use.

📢 Third-Party Attributions

This project uses and builds upon the following external sources, which should be credited as per their own licenses:

yahooquery: Python library for Yahoo Finance API, used here for data extraction.
Data sourced from Wikipedia for S&P 500 constituents and related metadata.

Please refer to their respective licenses and terms when redistributing or modifying those components.
