📊 Yahoo Finance ETL to PostgreSQL

An automated, scalable, and modular ETL pipeline that uses yahooquery to extract pricing, financial statements, and fundamental data, all stored in a PostgreSQL database for easy querying and analysis.


🚀 Features

  • ✅ Clean, plug-and-play ETL pipeline (3 segments: Pricing, Financials, Fundamentals)
  • 📥 Automatically scrapes S&P 500 tickers (or lets you configure your own universe)
  • 🧱 Creates and manages the PostgreSQL database schema and tables
  • 🗃️ Organized output directories, archiving logic, and file handling
  • ⚙️ Fully modular: update or extend segments easily
  • 🔒 Secure .env config (example provided)

⚙️ Setup Instructions

1️⃣ Requirements

  • Python 3.9+
  • PostgreSQL (download from https://www.postgresql.org/download/)
  • Libraries: see requirements.txt

2️⃣ Clone the Repo

git clone https://github.com/NPStraight2ThePoint/yahooquery-etl-postgresql-prod.git
cd yahooquery-etl-postgresql-prod

3️⃣ Set Up Environment

Create your .env file using the provided template:

cp .env.example .env

Edit .env with your local PostgreSQL credentials:

DB_HOST=localhost
DB_PORT=5432
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=yahooquery_db
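
As a rough illustration of how these five values might be consumed, the sketch below assembles a PostgreSQL connection URL from them. The helper name `build_db_url` is hypothetical and not taken from the repository, and loading the .env file into the environment (for example with python-dotenv) is assumed to have happened before the call.

```python
import os
from urllib.parse import quote_plus

REQUIRED = ["DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME"]


def build_db_url(env=None):
    """Build a postgresql:// URL from the DB_* settings (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    # Percent-encode credentials so characters such as '@' do not break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )
```

Failing early on a missing variable tends to give a clearer error than a connection failure later in the run.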

4️⃣ Install Python Dependencies

We recommend using a virtual environment:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

🧱 Initial Setup (One-time)

Run the setup script once to create the database, the schema and tables, and the working folders:

python _1_run_setup.py

🛠️ Get Tickers

You have three flexible options for defining your ticker universe:

  1. Run the script to auto-fetch the S&P 500: python _2_get_sp500_tickers.py
  2. Manually replace the default ticker list in output/Static Data/Tickers.csv with your own list of tickers.
  3. Edit the scraping logic inside _2_get_sp500_tickers.py to adapt it to other universes, such as the ASX 200, ETFs, or your own custom watchlist.
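
For the manual option, a custom list can be cleaned and written out with a few lines of Python. This is only a sketch: the `Ticker` column header and the `write_tickers` helper are assumptions for illustration, so check the existing Tickers.csv for the header the pipeline actually expects.

```python
import csv
from pathlib import Path


def write_tickers(symbols, path):
    """Write a de-duplicated, upper-cased ticker list to a one-column CSV."""
    cleaned = [s.strip().upper() for s in symbols if s.strip()]
    unique = list(dict.fromkeys(cleaned))  # drop duplicates, keep first-seen order
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Ticker"])  # assumed header name
        writer.writerows([symbol] for symbol in unique)
    return unique
```

For example, `write_tickers(["aapl", "MSFT", "aapl"], "output/Static Data/Tickers.csv")` would write two rows under the header.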

📈 Run the ETL Pipeline

You can either:

  1. Use the Global Orchestrator (recommended):
python _3_global_orchestrator.py

Or:

  2. Run each module manually (pricing, financials, fundamentals):
python etl/_1_pricing/pricing_orchestrator.py
python etl/_2_financial_statements/financials_statements_orchestrator.py
python etl/_3_fundamentals/fundamentals_orchestrator.py
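
For readers curious what running the segments in order amounts to, here is a minimal sketch of a sequential runner. It is not the repository's orchestrator: `run_segments` and its stop-on-first-failure behaviour are assumptions, and only the three script paths come from the commands above.

```python
import subprocess
import sys

SEGMENTS = [
    "etl/_1_pricing/pricing_orchestrator.py",
    "etl/_2_financial_statements/financials_statements_orchestrator.py",
    "etl/_3_fundamentals/fundamentals_orchestrator.py",
]


def run_segments(scripts=SEGMENTS, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure."""
    completed = []
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Calling `run_segments()` from the repository root would launch the three scripts one after another. The `runner` parameter exists only so the function can be exercised without starting real processes.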

📦 Archive Old Data (Optional)

After a run, clean up and archive raw data:

python _4_archive_dir.py
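
The archiving step can be pictured roughly as follows. This sketch moves CSV files from an output folder into a timestamped subfolder of an archive folder. The function name, the timestamp format, and the CSV-only filter are assumptions, not the documented behaviour of _4_archive_dir.py.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_csvs(output_dir, archive_dir, stamp=None):
    """Move every CSV under output_dir into archive_dir/<stamp>/, keeping subfolders."""
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    source = Path(output_dir)
    destination_root = Path(archive_dir) / stamp
    moved = []
    # sorted() materializes the file list before anything is moved.
    for path in sorted(source.rglob("*.csv")):
        target = destination_root / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Keeping each run in its own timestamped folder means an earlier archive is never overwritten by a later one.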

📊 What's Included

📁 Historical pricing
📁 Option chains
📁 Technical insights
📁 Financial statements (IS, BS, CF), annual and quarterly
📁 Company fundamentals
📁 Static profiles, summaries, and more

Visual Overview

📁 Folder Structure

yahooquery-etl-postgresql-prod/
├── archive/                    # Archived CSVs for version tracking
│   └── data/
├── archive_dir.py              # Archive logic
├── etl/                        # ETL scripts for each data segment
│   ├── _1_pricing/
│   ├── _2_financial_statements/
│   └── _3_fundamentals/
├── get_sp500_tickers.py        # Auto-download S&P 500 tickers
├── global_orchestrator.py      # Runs all segments in order
├── output/                     # Fetched raw data
│   ├── _1_pricing/
│   ├── _2_financials/
│   ├── _3_fundamentals/
│   └── merged                  # Merged outputs
├── requirements.txt
├── run_setup.py                # Runs DB creation, schema/tables & folder setup
├── setup/
│   ├── create_db.py
│   ├── init_schema_tables.py
│   └── create_dirs.py
├── sql_db_schema/              # CSV schema definition files
│   └── sql_schema.csv
├── utils.py                    # Helper functions + shared paths
├── .env.example                # Template for local credentials
├── .gitignore                  # Excludes sensitive files
└── README.md
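
The tree lists sql_db_schema/sql_schema.csv as a CSV schema definition. As a hedged illustration of how such a file could drive table creation, the sketch below turns CSV rows into CREATE TABLE statements. The column names (`table_name`, `column_name`, `data_type`) and the `build_ddl` helper are hypothetical; the real file's layout may differ.

```python
import csv
import io
from collections import defaultdict


def build_ddl(csv_text, schema="public"):
    """Group schema rows by table and emit one CREATE TABLE statement per table."""
    columns = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Quote identifiers so mixed-case or reserved names survive.
        columns[row["table_name"]].append(
            f'"{row["column_name"]}" {row["data_type"]}'
        )
    return [
        f'CREATE TABLE IF NOT EXISTS {schema}."{table}" ({", ".join(cols)});'
        for table, cols in columns.items()
    ]
```

Keeping the schema in a CSV like this means a new column can be added by editing one row rather than touching Python code.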

Setup

[Image: Setup]

ETL Process

[Image: ETL]

Database Schema

[Image: Database]

🧪 Project Status & Roadmap

✅ Tested on 200 tickers.

📌 Upcoming Enhancements:

  • Automated testing
  • GitHub Actions for CI/CD
  • Additional Yahoo data modules

🆔 Project Info

Author: Nicholas Papadimitris
Created On: 09 July 2025, 06:00 AM UTC
Project ID: YF_YQ_ETL_09_Jul2025


📄 License

This project is licensed under the MIT License: free to use, modify, and distribute.


🙏 Attribution Requirement

If you distribute or share this repository or its contents publicly, you must:

You may do so in any reasonable manner, but not in any way that suggests the original author or this repository endorses you or your use.


📢 Third-Party Attributions

This project uses and builds upon the following external sources, which should be credited as per their own licenses:

  • yahooquery: Python library for Yahoo Finance API, used here for data extraction.
  • Data sourced from Wikipedia for S&P 500 constituents and related metadata.

Please refer to their respective licenses and terms when redistributing or modifying those components.
