CodeWizarz/kaira

kaira

High-performance data ingestion pipeline for Indian index options (NIFTY / BANKNIFTY), structured like the data stack of a small quant fund:

  • Pluggable providers (NSE official EOD + live option chain collector; vendor adapters are the intended path for full intraday history)
  • Canonical, query-friendly Parquet datasets (columnar, partitioned, compactable)
  • Explicit data-quality checks, quarantine paths, and snapshot logs (for gap detection / “missing ticks”)

Architecture (fund-style)

Bronze → Silver → Gold

  • Bronze (raw): raw provider payloads (JSON/CSV) for audit + reproducibility.
  • Silver (canonical): normalized option_quotes fact table in Parquet (fast scans, stable schema).
  • Gold (research-ready): surfaces, features, resampled bars, strategy-specific datasets (not built yet).

This repo implements the Silver layer plus a snapshot_log dataset; it also writes quarantined payloads and invalid rows to data/quarantine/.

Canonical dataset: option_quotes

Grain: one row per (ts, symbol, expiry, strike, right).

Time semantics

  • ts: timestamp in UTC (ms precision)
  • ts_date: trade date in IST (partition key)
  • ingest_ts: when your collector saw the snapshot (UTC)
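To make the UTC/IST split concrete, here is a minimal sketch of deriving ts_date from ts. It assumes ts is stored as UTC epoch milliseconds (the README only says UTC with ms precision) and uses a fixed +05:30 offset, which is valid because IST has no daylight saving. The helper names are hypothetical, not kaira's API.

```python
from datetime import date, datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving, so no tz database is needed.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ts_ms(dt: datetime) -> int:
    """Aware datetime -> UTC epoch milliseconds (assumed encoding of `ts`)."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    # Integer division of timedeltas is exact, unlike float timestamp() * 1000.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def ts_date_ist(ts_ms: int) -> date:
    """UTC epoch milliseconds -> IST calendar date (the `ts_date` partition key)."""
    return (EPOCH + timedelta(milliseconds=ts_ms)).astimezone(IST).date()
```

For example, a snapshot at 2026-02-04 20:00 UTC is 01:30 on 2026-02-05 in IST, so it belongs in the ts_date=2026-02-05 partition even though its UTC date is the 4th.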

Key columns

  • Contract: symbol, expiry, strike, right (C/P)
  • Market: bid, ask, bid_qty, ask_qty, last, iv (stored as decimal; 0.15 == 15%)
  • Activity: oi, volume
  • IDs: instrument_id (stable 64-bit), option_id (debug-friendly string)
  • Provenance: source
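The README does not say how instrument_id is derived. One common way to get a stable 64-bit ID is to hash a canonical contract key; the sketch below does that with BLAKE2b truncated to 8 bytes. The key format, the helper names, and the choice of hash are all hypothetical, for illustration only.

```python
import hashlib
from datetime import date


def option_id(symbol: str, expiry: date, strike: float, right: str) -> str:
    """Hypothetical debug-friendly key, e.g. 'NIFTY|2026-02-05|22000.00|C'."""
    # Fixed two-decimal strike formatting so 22000 and 22000.0 produce the same key.
    return f"{symbol}|{expiry.isoformat()}|{strike:.2f}|{right}"


def instrument_id(symbol: str, expiry: date, strike: float, right: str) -> int:
    """Hypothetical stable 64-bit ID: BLAKE2b-64 of the canonical option_id."""
    key = option_id(symbol, expiry, strike, right).encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    # Signed interpretation so the value fits a Parquet/Arrow int64 column.
    return int.from_bytes(digest, "big", signed=True)
```

A digest is used rather than Python's built-in hash(), because string hashing is salted per process and would not give the same ID across runs or machines.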

Partitioning (Hive)

symbol=.../expiry=YYYY-MM-DD/ts_date=YYYY-MM-DD/part-....parquet

This makes backtests fast because you can prune by symbol/expiry/date at the filesystem level, then rely on Parquet projection + row-group stats for the remaining filters.
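A minimal sketch of what filesystem-level pruning means for this layout: build the Hive directory for a row, then decide which files a query needs by parsing the key=value path segments, without opening any Parquet file. The function names are hypothetical; in practice DuckDB (with hive_partitioning) or pyarrow.dataset (with partitioning="hive") performs this pruning for you.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Iterable, Optional


def partition_dir(root: str, symbol: str, expiry: date, ts_date: date) -> PurePosixPath:
    """Hive-style directory: root/symbol=.../expiry=YYYY-MM-DD/ts_date=YYYY-MM-DD."""
    return (PurePosixPath(root) / f"symbol={symbol}"
            / f"expiry={expiry.isoformat()}" / f"ts_date={ts_date.isoformat()}")


def partition_keys(path: str) -> dict:
    """Collect the key=value segments of a path; the file name has no '=' and is skipped."""
    return dict(seg.split("=", 1) for seg in PurePosixPath(path).parts if "=" in seg)


def prune(paths: Iterable[str], symbol: str, start: date, end: date,
          expiries: Optional[set] = None) -> list:
    """Keep only files whose partition keys can match the query."""
    kept = []
    for p in paths:
        kv = partition_keys(p)
        if kv["symbol"] != symbol:
            continue
        if expiries is not None and date.fromisoformat(kv["expiry"]) not in expiries:
            continue
        if start <= date.fromisoformat(kv["ts_date"]) <= end:
            kept.append(p)
    return kept
```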

Data quality + “missing ticks”

Two mechanisms are used:

  1. Row-level validation (dq_flags bitmask) to catch corruption:

    • missing required values
    • invalid right
    • crossed markets (ask < bid)
    • negative OI/volume
    • NaNs
    • IV out of bounds

    Invalid rows go to data/quarantine/option_quotes_invalid/.

  2. Snapshot log: every polling attempt writes a row to data/silver/snapshot_log/ with status + latency + record count.

    • Your backtester can detect gaps by scanning snapshot_log and deciding how to handle them (skip, forward-fill, resample, etc.).
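Both mechanisms can be sketched in plain Python. The bit values, the IV upper bound, the required-column set, and the snapshot_log field names (ts_ms, status, n_records) below are assumptions for illustration; kaira's actual flag layout and log schema may differ. The gap rule treats failed or empty polls as missing and flags any interval between usable snapshots longer than 1.5 × the polling interval.

```python
import math
from enum import IntFlag


class DQ(IntFlag):
    """Hypothetical dq_flags bit layout; one bit per check listed above."""
    MISSING_REQUIRED = 1 << 0
    INVALID_RIGHT = 1 << 1
    CROSSED_MARKET = 1 << 2
    NEGATIVE_ACTIVITY = 1 << 3
    NAN_VALUE = 1 << 4
    IV_OUT_OF_BOUNDS = 1 << 5


REQUIRED = ("ts", "symbol", "expiry", "strike", "right")
IV_MAX = 5.0  # assumed sanity bound: 500% annualized vol, stored as a decimal


def dq_flags(row: dict) -> DQ:
    """Return the OR of every check the row fails; DQ(0) means the row is clean."""
    flags = DQ(0)
    if any(row.get(k) is None for k in REQUIRED):
        flags |= DQ.MISSING_REQUIRED
    if row.get("right") not in ("C", "P"):
        flags |= DQ.INVALID_RIGHT
    if any(isinstance(v, float) and math.isnan(v) for v in row.values()):
        flags |= DQ.NAN_VALUE
    bid, ask = row.get("bid"), row.get("ask")
    if bid is not None and ask is not None and ask < bid:  # NaN compares False
        flags |= DQ.CROSSED_MARKET
    if any(row.get(k) is not None and row[k] < 0 for k in ("oi", "volume")):
        flags |= DQ.NEGATIVE_ACTIVITY
    iv = row.get("iv")
    if iv is not None and not math.isnan(iv) and not 0.0 <= iv <= IV_MAX:
        flags |= DQ.IV_OUT_OF_BOUNDS
    return flags


def find_gaps(snapshot_log: list, interval_s: float, tolerance: float = 1.5) -> list:
    """(prev_ts_ms, next_ts_ms) pairs where usable snapshots are too far apart."""
    usable = sorted(r["ts_ms"] for r in snapshot_log
                    if r["status"] == "ok" and r["n_records"] > 0)
    max_gap_ms = tolerance * interval_s * 1000
    return [(a, b) for a, b in zip(usable, usable[1:]) if b - a > max_gap_ms]
```

Storing the result as an integer column (int(flags)) lets you both filter clean rows (dq_flags == 0) and count failures per check with a bitwise AND.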

Data sources for Indian index options

Free / official

  • NSE FO bhavcopy (EOD): reliable for historical OI/volume/close by contract, but it has no bid/ask and no IV.

Collector (build your own history going forward)

  • NSE option-chain endpoint: good for intraday snapshots, but it can be blocked or unstable; treat it as best-effort collection, not “institutional history”.

Commercial vendors (recommended for full historical intraday option chain)

You typically need a paid vendor for historical intraday option chain with bid/ask + IV. When evaluating vendors, confirm:

  • true tick vs 1s/1m sampling
  • full depth vs top-of-book
  • corporate action / symbol-change handling
  • exchange timestamp vs vendor timestamp
  • survivorship bias in instrument master
  • replays, corrections, and how they publish late/corrected data

Quickstart

Create an env and install:

py -m venv .venv
.\.venv\Scripts\Activate.ps1
py -m pip install -e .

Collect live option-chain snapshots (creates forward history):

py -m kaira collect nse-live --symbols NIFTY BANKNIFTY --interval-s 2 --duration-s 600
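With --interval-s 2 --duration-s 600 the collector makes about 300 polling rounds. Below is a minimal sketch of such a loop, assuming a fixed schedule (ticks at start + i × interval, so a slow fetch does not push every later poll back) and one snapshot_log row per symbol per attempt, even on failure. The fetch callable, the row fields, and the injectable clock/sleep are hypothetical, not kaira's internals.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SnapshotLogRow:
    """Hypothetical snapshot_log record: status + latency + record count."""
    ts_ms: int
    symbol: str
    status: str
    latency_ms: float
    n_records: int


def poll(symbols: Iterable[str], fetch: Callable[[str], list],
         interval_s: float, duration_s: float,
         clock: Callable[[], float] = time.monotonic,
         wall_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
         sleep: Callable[[float], None] = time.sleep) -> list:
    """Poll each symbol on a fixed schedule and log every attempt."""
    symbols = list(symbols)
    log = []
    start = clock()
    for i in range(math.ceil(duration_s / interval_s)):
        # Wait for the planned tick; computing it from `start` avoids cumulative drift.
        sleep(max(0.0, start + i * interval_s - clock()))
        for sym in symbols:
            t0 = clock()
            try:
                n_records, status = len(fetch(sym)), "ok"
            except Exception as exc:  # keep collecting; the failure is logged, not raised
                n_records, status = 0, f"error:{type(exc).__name__}"
            log.append(SnapshotLogRow(wall_ms(), sym, status,
                                      (clock() - t0) * 1000.0, n_records))
    return log
```

Because failures still produce a row, a blocked endpoint shows up in snapshot_log as an error status rather than as silence, which is what lets a later scan tell "no data" apart from "never polled".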

Backfill EOD FO bhavcopy (official, free):

py -m kaira backfill nse-bhavcopy --start 2024-01-01 --end 2024-03-31 --symbols NIFTY BANKNIFTY
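An EOD backfill over --start/--end amounts to iterating candidate trading sessions and fetching one bhavcopy per date. A minimal sketch, assuming weekends are skipped and exchange holidays are supplied by the caller; it ships no holiday calendar, so a weekday holiday with no published file should be treated as "no session" rather than as an error. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator


def candidate_trade_dates(start: date, end: date,
                          holidays: frozenset = frozenset()) -> Iterator[date]:
    """Yield Monday-Friday dates in [start, end] that are not listed holidays."""
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:  # 0=Mon ... 4=Fri
            yield d
        d += timedelta(days=1)
```

For the range above (2024-01-01 to 2024-03-31, exactly 13 weeks starting on a Monday) this yields 65 weekdays before any holidays are removed.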

Compact partitions (coalesce small files; de-dup by latest ingest_ts):

py -m kaira maint compact-option-quotes --dataset-dir data/silver/option_quotes --min-files 10
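De-duplicating "by latest ingest_ts" means: for each grain key (ts, symbol, expiry, strike, right), keep only the most recently ingested row, so a re-collected or corrected snapshot overrides the earlier one. This in-memory sketch over dict rows only illustrates the rule; it is not kaira's implementation, and the helper name is hypothetical.

```python
from typing import Iterable

GRAIN = ("ts", "symbol", "expiry", "strike", "right")


def dedup_latest(rows: Iterable[dict]) -> list:
    """Keep one row per grain key: the one with the greatest ingest_ts."""
    best = {}
    for row in rows:
        key = tuple(row[k] for k in GRAIN)
        cur = best.get(key)
        # Strict '>' means ties on ingest_ts keep the first row seen.
        if cur is None or row["ingest_ts"] > cur["ingest_ts"]:
            best[key] = row
    # Deterministic output order, with ts as the leading sort key.
    return sorted(best.values(), key=lambda r: tuple(r[k] for k in GRAIN))
```

Sorting by ts before writing keeps each compacted file time-ordered, which tends to make Parquet row-group min/max statistics on ts tighter for time-range filters.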

Backtest reads (DuckDB)

Use kaira.query.read_option_quotes_arrow() for fast predicate pushdown reads:

from datetime import date
from kaira.query import read_option_quotes_arrow
from kaira.query.duckdb_reader import OptionQuoteQuery

t = read_option_quotes_arrow(
    "data/silver/option_quotes",
    query=OptionQuoteQuery(
        symbol="NIFTY",
        expiries=[date(2026, 2, 5)],
        trade_date_start=date(2026, 2, 1),
        trade_date_end=date(2026, 2, 4),
    ),
    columns=["ts", "expiry", "strike", "right", "bid", "ask", "iv", "oi", "volume"],
)
