A Python project for loading, visualizing, and analyzing tick-level stock data from IEX using Polars, datashader, and modern visualization libraries. The tool also provides pairs trading analysis capabilities with cointegration testing and statistical analysis.
- High-performance data loading with Polars
- Scalable visualization using datashader for large datasets
- Multiple chart types: time series, volume bars, OHLC candlesticks
- Interactive dashboards with hvplot and bokeh
- Sample data generation for testing
- Flexible CSV input support
- Pairs trading analysis with cointegration testing and statistical analysis
- Multiple aggregation frequencies for pairs analysis (1s, 5s, 1min, 1hr)
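Sample data generation can be pictured with a small generator. The sketch below is a hypothetical stand-in, not the tool's actual generator. It assumes a Gaussian random walk for price, random round-lot sizes, and an arbitrary session start, and it emits rows keyed by IEX column names (Exchange Timestamp, Symbol, Price, Size, Trade ID). The defaults mirror the documented --symbol (AAPL) and --points (100,000) defaults.

```python
import random
from datetime import datetime, timedelta


def generate_sample_ticks(symbol="AAPL", points=100_000, start_price=150.0, seed=0):
    """Return synthetic tick dicts; a hypothetical helper, not the tool's own generator."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    ts = datetime(2024, 1, 2, 9, 30)  # arbitrary assumed session start
    price = start_price
    ticks = []
    for trade_id in range(points):
        ts += timedelta(milliseconds=rng.randint(1, 500))  # irregular gaps between ticks
        price = max(0.01, price + rng.gauss(0, 0.02))  # Gaussian random walk, floored at 1 cent
        ticks.append(
            {
                "Exchange Timestamp": ts,
                "Symbol": symbol,
                "Price": round(price, 2),
                "Size": rng.choice([100, 200, 300, 500, 1000]),  # round-lot sizes
                "Trade ID": trade_id,
            }
        )
    return ticks


sample = generate_sample_ticks(points=5)
for tick in sample:
    print(tick["Exchange Timestamp"], tick["Symbol"], tick["Price"], tick["Size"])
```

Seeding the generator makes test runs reproducible, which matters when sample data is used to verify a setup.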
- Python 3.8+
- uv package manager
This tool is fully compatible with Windows operating systems. When using Windows:
- Path Separators: The tool automatically handles path separators, so you can use either forward slashes (/) or backslashes (\) in file paths.
- Line Endings: CSV files with either Unix (LF) or Windows (CRLF) line endings are supported.
- File Permissions: Ensure that the user running the tool has read permissions for input files and write permissions for the output directory.
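One way to get the path and line-ending behavior described above, sketched with the standard library only. The helper names normalize_path and read_tick_rows are hypothetical and not taken from main.py. The sketch assumes mapping backslashes to forward slashes is acceptable; this is safe on Windows, which accepts both separators, but would break a Linux or macOS file name that really contains a backslash.

```python
import csv
from pathlib import Path
from typing import Dict, List


def normalize_path(raw: str) -> Path:
    """Accept either separator style (hypothetical helper, not the tool's own code)."""
    # Windows also accepts "/" as a separator, so converting "\" to "/" works on every OS.
    # Caveat: on Linux/macOS this breaks file names that genuinely contain a backslash.
    return Path(raw.replace("\\", "/"))


def read_tick_rows(raw_path: str) -> List[Dict[str, str]]:
    # newline="" hands line-ending detection to the csv module, so LF and CRLF both work.
    with open(normalize_path(raw_path), newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Opening the file with newline="" is the approach the Python csv documentation recommends, so a CRLF file exported on Windows parses the same as an LF file.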
For best results on Windows:
- Use Python 3.8 or higher, preferably the latest release
- Install dependencies using uv as described in the setup instructions
- When specifying file paths, you can use any of these formats:
  - Forward slashes: --file data/ticks.csv
  - Backslashes: --file data\ticks.csv (escape backslashes in some shells)
  - Raw strings: --file data\ticks.csv (in PowerShell)
Clone or navigate to the project directory:
git clone <repository-url> cd IEXScoper
-
Install dependencies using uv:
uv pip install -r requirements.txt
-
Run the visualization tool:
python main.py --file data/your_ticks.csv --output-dir plots
For sample data:
python main.py --sample --output-dir plotsFor Windows users, here are some example commands:
- Visualization:
  python main.py --file data\ticks.csv --output-dir plots
- Pairs analysis:
  python main.py --pairs-analysis --file1 data\stock1.csv --file2 data\stock2.csv --freq 1min
- Using PowerShell (note the backtick at the end of each line for line continuation):
  python main.py --pairs-analysis `
    --file1 data\stock1.csv `
    --file2 data\stock2.csv `
    --freq 1min
Visualization options:
- --file, -f: Path to CSV file containing tick data
- --symbol, -s: Stock symbol for sample data (default: AAPL)
- --sample: Use generated sample data instead of CSV
- --points, -p: Number of sample data points (default: 100,000)
- --output-dir, -o: Output directory for generated plots (default: output)
- --filter-symbol: Filter CSV data to only show specified symbol
- --top-symbols: Show only top N symbols by volume (0 = all symbols)
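The --filter-symbol and --top-symbols options can be approximated on plain row dicts as below. This is a hypothetical sketch, not the tool's implementation. It assumes "volume" means the summed Size per symbol and that the symbol filter is applied before the top-N cut.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_symbols(
    rows: List[Dict[str, str]],
    filter_symbol: Optional[str] = None,
    top_symbols: int = 0,
) -> List[Dict[str, str]]:
    """Approximate --filter-symbol and --top-symbols (hypothetical helper)."""
    if filter_symbol:
        rows = [r for r in rows if r["Symbol"] == filter_symbol]
    if top_symbols > 0:  # 0 keeps every symbol, as documented
        volume = Counter()
        for r in rows:
            volume[r["Symbol"]] += int(r["Size"])  # total traded size per symbol
        keep = {sym for sym, _ in volume.most_common(top_symbols)}
        rows = [r for r in rows if r["Symbol"] in keep]
    return rows


ticks = [
    {"Symbol": "AAPL", "Size": "100"},
    {"Symbol": "MSFT", "Size": "500"},
    {"Symbol": "AAPL", "Size": "300"},
    {"Symbol": "IBM", "Size": "50"},
]
print(select_symbols(ticks, top_symbols=2))
```

With top_symbols=2, AAPL (400 shares) and MSFT (500 shares) survive and IBM is dropped.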
Pairs analysis options:
- --pairs-analysis: Run pairs analysis instead of visualization
- --file1: Path to first CSV file for pairs analysis
- --file2: Path to second CSV file for pairs analysis
- --freq: Aggregation frequency for pairs analysis (1s, 5s, 1min, 1hr)
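The options above map naturally onto argparse. This is a reconstruction from the documented flags only; main.py's real parser may differ. The --top-symbols default of 0 and the --freq default of 1min are assumptions, because the README does not state them. The check that --pairs-analysis comes with both input files is likewise an assumed behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct the documented flags; main.py's real parser may differ in detail."""
    p = argparse.ArgumentParser(description="IEXScoper tick visualization and pairs analysis")
    # Visualization options
    p.add_argument("--file", "-f", help="CSV file containing tick data")
    p.add_argument("--symbol", "-s", default="AAPL", help="stock symbol for sample data")
    p.add_argument("--sample", action="store_true", help="use generated sample data")
    p.add_argument("--points", "-p", type=int, default=100_000, help="number of sample points")
    p.add_argument("--output-dir", "-o", default="output", help="directory for generated plots")
    p.add_argument("--filter-symbol", help="only show this symbol")
    p.add_argument("--top-symbols", type=int, default=0, help="top N symbols by volume (0 = all)")
    # Pairs analysis options (the --freq default is an assumption)
    p.add_argument("--pairs-analysis", action="store_true", help="run pairs analysis instead")
    p.add_argument("--file1", help="first CSV file for pairs analysis")
    p.add_argument("--file2", help="second CSV file for pairs analysis")
    p.add_argument("--freq", choices=["1s", "5s", "1min", "1hr"], default="1min",
                   help="aggregation frequency for pairs analysis")
    return p


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: pairs analysis needs both input files.
    if args.pairs_analysis and not (args.file1 and args.file2):
        parser.error("--pairs-analysis requires --file1 and --file2")
    return args


args = parse_args(["--pairs-analysis", "--file1", r"data\stock1.csv",
                   "--file2", r"data\stock2.csv", "--freq", "1min"])
print(args.file1, args.file2, args.freq)
```

Using choices for --freq makes argparse reject unsupported frequencies with a usage message instead of failing later during aggregation.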
The tool specifically supports the IEX TP1 DEEP1.0 format with columns:
- Exchange Timestamp: Primary timestamp (preferred)
- Packet Capture Time: Alternative timestamp
- Send Time: Alternative timestamp
- Symbol: Stock symbol
- Price: Tick price
- Size: Tick volume
- Tick Type: Type of tick
- Trade ID: Trade identifier
- Sale Condition: Sale condition flags
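Choosing among the three timestamp columns can be a simple preference search over the CSV header. The helper below is hypothetical. It assumes Packet Capture Time is tried before Send Time, because the column list names both only as "alternative" without ranking them.

```python
from typing import List

# Preference order from the column list above; the order between the two alternatives is assumed.
TIMESTAMP_PREFERENCE = ["Exchange Timestamp", "Packet Capture Time", "Send Time"]


def pick_timestamp_column(columns: List[str]) -> str:
    """Return the most preferred timestamp column present in a header (hypothetical helper)."""
    for name in TIMESTAMP_PREFERENCE:
        if name in columns:
            return name
    raise ValueError(f"no timestamp column found; expected one of {TIMESTAMP_PREFERENCE}")


header = ["Send Time", "Symbol", "Price", "Size", "Packet Capture Time"]
print(pick_timestamp_column(header))  # Packet Capture Time
```

The same function works on a Polars DataFrame's columns list or on the first row read by csv.reader.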
- Price Time Series: Interactive line chart of price movements
- Volume Bars: Bar chart of trading volume over time
- Datashader Scatter: High-performance scatter plot for large datasets
- OHLC Bars: Candlestick-style bars showing open/high/low/close prices
- Dashboard: Combined view of all visualizations
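OHLC bars, and the fixed frequencies used for pairs analysis, both rest on bucketing ticks into time windows. The sketch below does this with the standard library. It is a stand-in for whatever Polars aggregation the tool actually uses, and it assumes timezone-aware timestamps; naive datetimes are interpreted in local time by timestamp(), which can shift hourly boundaries.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Bucket widths in seconds for the documented frequencies.
FREQ_SECONDS = {"1s": 1, "5s": 5, "1min": 60, "1hr": 3600}


def ohlc_bars(ticks: List[Tuple[datetime, float, int]], freq: str = "1min") -> List[Dict]:
    """Aggregate (timestamp, price, size) ticks into OHLC + volume bars (sketch)."""
    width = FREQ_SECONDS[freq]
    bars: Dict[int, Dict] = {}
    # Stable sort on time only, so ticks sharing a timestamp keep their arrival order.
    for ts, price, size in sorted(ticks, key=lambda t: t[0]):
        start = int(ts.timestamp()) // width * width  # floor to the bucket's epoch second
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"start": start, "open": price, "high": price,
                           "low": price, "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return [bars[k] for k in sorted(bars)]


t0 = datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)  # aware UTC times give stable buckets
ticks = [
    (t0, 10.0, 100),
    (t0 + timedelta(seconds=20), 10.5, 50),
    (t0 + timedelta(seconds=40), 9.8, 25),
    (t0 + timedelta(seconds=70), 10.1, 10),
]
for bar in ohlc_bars(ticks, "1min"):
    print(bar)
```

Here the first three ticks fall in the 14:30 bar and the fourth opens the 14:31 bar.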
IEXScoper/
├── main.py              # Main application script
├── pairs_analysis.py    # Pairs trading analysis module
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── .gitignore           # Git ignore patterns
├── data/                # Directory for CSV files
├── output/              # Generated visualizations
└── old/                 # Legacy files
- polars: High-performance DataFrame library
- numpy: Numerical computing
- pyarrow: Apache Arrow library for Polars
- iex-parser: IEX data parsing library (from GitHub: https://github.com/hq-4/iex-parser.git)
- datashader: Large dataset visualization
- holoviews: Declarative data visualization
- hvplot: High-level plotting interface
- bokeh: Interactive web-based plotting
- plotly: Interactive plotting library
- panel: Dashboard creation
- scipy: Scientific computing and statistical tests
- statsmodels: Statistical models and tests (cointegration)
- pandas: Data manipulation (for compatibility)
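Pairs analysis typically starts by regressing one price series on the other and examining the residual spread. The sketch below covers only that first step of the Engle-Granger approach using the standard library. It is not itself a cointegration test: statsmodels provides one as statsmodels.tsa.stattools.coint, which tests whether such residuals are stationary. The two series are assumed to be bar closes already aligned on the same timestamps at one aggregation frequency, and the data are toy values.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def hedge_ratio(y: List[float], x: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def spread_zscores(y: List[float], x: List[float]) -> List[float]:
    """Standardize the regression residuals (the 'spread') to mean 0 and std 1."""
    alpha, beta = hedge_ratio(y, x)
    spread = [yi - alpha - beta * xi for yi, xi in zip(y, x)]
    mu, sigma = mean(spread), pstdev(spread)
    return [(s - mu) / sigma for s in spread]


# Toy bar closes for two hypothetical stocks at one aggregation frequency.
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
y = [2 * xi + 1 + e for xi, e in zip(x, [0.1, -0.1, 0.05, -0.05, 0.0, 0.0])]
alpha, beta = hedge_ratio(y, x)
print(f"beta={beta:.4f}", [round(z, 2) for z in spread_zscores(y, x)])
```

beta is the hedge ratio (units of x per unit of y), and large absolute z-scores mark moments when the spread is unusually wide relative to its own history.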
- Polars is used for fast data loading and processing
- Datashader enables visualization of millions of data points
- Memory usage is optimized for large tick datasets
- Interactive plots are saved as HTML files for easy sharing
- Import errors: Ensure all dependencies are installed with uv pip install -r requirements.txt
- Memory issues: Reduce the number of data points or use datashader for large datasets
- Date parsing errors: Check your CSV timestamp format
- Check the console output for detailed error messages
- Ensure your CSV file follows the expected format
- Try running with sample data first to verify the setup
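To narrow down date parsing errors or format mismatches before running the full tool, a quick pre-flight check can help. The checker below is hypothetical and not part of IEXScoper. It assumes ISO 8601 timestamps (parsed with datetime.fromisoformat), a Size column holding integers, and Symbol, Price and Size as the minimum required columns. Replace the parsing call with datetime.strptime if your export uses a different layout.

```python
import csv
from datetime import datetime
from typing import List

REQUIRED = ["Symbol", "Price", "Size"]


def check_tick_csv(path: str, timestamp_col: str = "Exchange Timestamp",
                   max_rows: int = 1000) -> List[str]:
    """List problems in a tick CSV's header and first max_rows rows (hypothetical checker)."""
    problems: List[str] = []
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        for col in REQUIRED + [timestamp_col]:
            if col not in header:
                problems.append(f"missing column: {col}")
        if problems:
            return problems  # row checks are meaningless without the columns
        for row_no, row in enumerate(reader, start=2):  # row 1 is the header
            if row_no > max_rows + 1:
                break
            try:
                # Assumes ISO 8601 timestamps; swap in strptime for other layouts.
                datetime.fromisoformat(row[timestamp_col])
            except (TypeError, ValueError):
                problems.append(f"row {row_no}: unparseable timestamp {row[timestamp_col]!r}")
            try:
                float(row["Price"])
                int(row["Size"])
            except (TypeError, ValueError):
                problems.append(f"row {row_no}: non-numeric Price or Size")
    return problems
```

For example, `for problem in check_tick_csv("data/ticks.csv"): print(problem)` lists every issue found in the first 1,000 rows.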
Feel free to submit issues and enhancement requests!
This project is open source. Please check individual dependency licenses for their terms.