Scan insider trades from secform4.com, openinsider.com, SEC EDGAR, and European regulators (FCA, BaFin, AMF, AFM). Includes congressional financial disclosure scanning (House and Senate), multi-source deduplication, committee-based sector filtering, and a desktop GUI with EDGAR filing links plus a European scan workspace.
git clone <repo-url>
cd insider-scanner
pip install -e ".[dev]"

Requires Python 3.11+. Dependencies: requests, beautifulsoup4, lxml, pandas, PySide6, pyyaml, pdfplumber, numpy.
insider-scanner
# or
python -m insider_scanner.main

The GUI tabs cover the core use cases:
- Search a ticker and run both secform4.com and openinsider.com scrapers in one click.
- Fetch the latest trades (configurable count) and run watchlist scans backed by data/tickers_watchlist.txt.
- Toggle sources, specify a date range, trade type, minimum value, or Congress-only filter.
- View sortable tables that display filing/trade dates, highlight congressional filings, show EDGAR links, and let you export CSV/JSON.
- Cancel long-running scans and resolve any ticker to its SEC CIK + filing page.
- Pick a legislator (House/Senate dropdown) or the whole committee list, select sources (House/Senate), and preview results in a threaded worker with progress + cancel.
- Use filters such as trade type, sector, and minimum value, then double-click any row to open the original PDF/PTR.
- Save filtered results to CSV/JSON, with exports reflecting the current filters.
- Choose All/UK/DE/FR/NL, type an ISIN, or scan the European watchlist at data/eu_watchlist.txt.
- Enable optional date bounds, filter by trade type and minimum value, and watch the progress bar while each ISIN is processed.
- Results are sortable, show normalized positions/currency, provide detail text on double-click, and allow opening the regulator source URL.
- Save filtered results to CSV/JSON (filename reflects the ISIN + country) or clear filters to adjust the view.
# Scan a specific ticker
insider-scanner-cli scan AAPL
insider-scanner-cli scan AAPL --type Buy --min-value 1000000 --save
# Scan with date range
insider-scanner-cli scan AAPL --since 2025-01-01 --until 2025-06-30
# Fetch latest insider trades
insider-scanner-cli latest --count 50 --save
insider-scanner-cli latest --since 2025-06-01
# Resolve SEC CIK
insider-scanner-cli cik AAPL
# Initialize default Congress member list
insider-scanner-cli init-congress
# Congress-only filter
insider-scanner-cli scan AAPL --congress-only

# Scan a single ISIN or run the built-in watchlist
insider-scanner-cli eu-scan GB0002875804
insider-scanner-cli eu-scan --watchlist --country UK --min-value 50000 --save

Pass --country to restrict to UK/DE/FR/NL (default: All), --type to filter Buy/Sell trades, --min-value for the total reported value, and --since/--until for date bounds. Use --watchlist to scan every ISIN listed in data/eu_watchlist.txt, and --save to persist the filtered CSV/JSON bundle.
src/insider_scanner/
├── core/
│ ├── models.py # InsiderTrade + CongressTrade dataclasses
│ ├── secform4.py # secform4.com scraper (compound-column parser, direct filing links)
│ ├── openinsider.py # openinsider.com HTML parser + scraper
│ ├── edgar.py # SEC EDGAR CIK resolver (JSON primary + HTML fallback) + filing URLs
│ ├── senate.py # Congress member list + trade flagging
│ ├── congress_house.py # House financial disclosures (ZIP index + PTR PDF parsing)
│ ├── congress_senate.py # Senate EFD scraper (session + search + PTR page parsing)
│ ├── merger.py # Multi-source dedup, filtering, export
│ ├── afm.py # Dutch AFM API client
│ ├── amf.py # French AMF BDIF API client
│ ├── bafin.py # German BaFin download+CSV parser
│ ├── eu_models.py # European trade dataclass + helpers
│ ├── eu_merger.py # European dedup/filter/export helpers
│ ├── eu_scan.py # Dispatcher that runs the selected European scrapers
│ └── rns_investegate.py # UK RNS announcements via Investegate
├── gui/
│ ├── main_window.py # Main window (default OS style + tab management)
│ ├── scan_tab.py # Insider scan workflow: search, filters, table, EDGAR links
│ ├── congress_tab.py # Congress tab with sector filtering and save/export helpers
│ ├── european_tab.py # European tab with ISIN/watchlist scans and detail panel
│ └── widgets.py # Pandas table model, sortable proxy, table helpers
├── utils/
│ ├── config.py # Paths, SEC/User-Agent constants, watchlists
│ ├── logging.py # Logging setup
│ ├── caching.py # File cache with TTL expiry
│ ├── http.py # Rate-limited HTTP helper
│ └── threading.py # Worker/Signal helpers for GUI
├── main.py # GUI entry point
└── cli.py # CLI entry point (US + European commands)
scripts/
└── update_congress.py # Fetch current federal + state legislators
- Resolve: edgar.py resolves ticker → CIK via SEC company_tickers.json (cached 24h, HTML fallback)
- Scrape: secform4.py fetches CIK-based pages with compound-column parsing (date+type, name+title split by <br>); openinsider.py fetches ticker-based pages; both produce InsiderTrade records
- Cache: HTTP responses are cached locally with configurable TTL (default 1h)
- Merge: merger.py deduplicates trades across sources (matching by ticker + name + date + share count)
- Flag: senate.py checks insider names against the Congress member list (fuzzy matching)
- Verify: secform4 trades include direct SEC filing links; others get generated EDGAR search URLs
- Export: Results saved as CSV + JSON to outputs/scans/
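
The Merge step above can be sketched as follows. This is a minimal illustration of deduplicating on ticker + name + date + share count, not the code in merger.py: the Trade dataclass, its field names, and the normalization rules are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trade:
    """Hypothetical stand-in for InsiderTrade; field names are assumptions."""
    ticker: str
    insider_name: str
    trade_date: date
    shares: int
    source: str


def dedup_key(trade: Trade) -> tuple:
    # Normalize case and whitespace so the same filing reported by two
    # sources ("COOK  TIMOTHY" vs "Cook Timothy") maps to one key.
    name = " ".join(trade.insider_name.lower().split())
    return (trade.ticker.upper(), name, trade.trade_date, trade.shares)


def deduplicate(trades: list[Trade]) -> list[Trade]:
    seen: dict[tuple, Trade] = {}
    for trade in trades:
        # Keep the first record for each key; later duplicates are dropped.
        seen.setdefault(dedup_key(trade), trade)
    return list(seen.values())
```

Keeping the first record means source order decides which copy survives, so listing the source with direct filing links first would preserve those links.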
- Index: congress_house.py downloads yearly ZIP archives from disclosures-clerk.house.gov containing XML indexes of all financial disclosure filings. Past years are cached permanently; current year can be refreshed on demand.
- Search: XML index is parsed to find PTR (Periodic Transaction Report) filings matching the selected official and date range. Multi-year ranges download multiple indexes as needed.
- Fetch: Individual PTR PDFs are downloaded and cached locally under data/house_disclosures/{year}/pdfs/.
- Parse: pdfplumber extracts transaction tables from electronically-filed PDFs. Scanned/handwritten PDFs are detected and skipped.
- Convert: Raw table rows are converted to CongressTrade records with parsed tickers (from asset descriptions), normalized amount ranges, owner codes, and transaction types.
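
The Search step above might look roughly like this. It is a sketch under stated assumptions, not the parser in congress_house.py: the element names (Member, Last, FilingType, FilingDate, DocID), the "P" code for PTR filings, and the month/day/year date format are assumptions about the index layout and should be checked against a real index file.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime


def find_ptr_filings(index_xml: str, last_name: str,
                     since: date, until: date) -> list[dict]:
    """Return PTR filings for one official within a filing-date range."""
    filings = []
    for member in ET.fromstring(index_xml).iter("Member"):
        # Assumed: FilingType "P" marks a Periodic Transaction Report.
        if member.findtext("FilingType", "").strip() != "P":
            continue
        if member.findtext("Last", "").strip().lower() != last_name.lower():
            continue
        try:
            # Assumed date format, e.g. "1/15/2025".
            filed = datetime.strptime(
                member.findtext("FilingDate", "").strip(), "%m/%d/%Y"
            ).date()
        except ValueError:
            continue  # missing or malformed date: skip the entry
        if since <= filed <= until:
            filings.append({
                "doc_id": member.findtext("DocID", "").strip(),
                "filed": filed,
            })
    return filings
```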
- Session: congress_senate.py establishes an authenticated session with efdsearch.senate.gov by accepting the prohibition agreement and obtaining a CSRF token.
- Search: POST to the EFD JSON API with senator name, report type (PTR), and date range. Results include links to individual PTR pages. Paper filings (scanned PDFs) are automatically filtered out.
- Parse: Each electronic PTR page contains an HTML table with columns for transaction date, owner, ticker, asset name, type, amount range, and comment. These are parsed via BeautifulSoup.
- Convert: Transactions are converted to CongressTrade records. Tickers are read directly from the "Ticker" column when available; when the ticker is "--", it's extracted from the asset description (e.g. "Vanguard ETF (BND)" → BND).
- Search: rns_investegate queries Investegate for UK Director/PDMR announcements; bafin, amf, and afm POST/GET the regulators' portals for Germany, France, and the Netherlands respectively, honoring optional date bounds and ISIN filters.
- Parse: Each scraper normalizes the (possibly localized) position text, currency formatting, and trade/filing dates before emitting EuropeanInsiderTrade records.
- Merge: eu_scan.scrape_eu_trades_for_isin runs the requested sources, eu_merger.merge_eu_trades deduplicates across regulators, and eu_merger.filter_eu_trades applies the GUI/CLI filters (country, trade type, min value, date range).
- Save: The European tab (and eu-scan) use eu_merger.save_eu_results to output labeled CSV/JSON bundles after the filters are applied.
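
The filter stage described above (country, trade type, min value, date range) could be sketched like this. The EuTrade dataclass and the filter_trades signature are hypothetical and are not the actual interface of eu_merger.filter_eu_trades.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EuTrade:
    """Hypothetical stand-in for EuropeanInsiderTrade."""
    isin: str
    country: str
    trade_type: str
    value: float
    trade_date: date


def filter_trades(trades, country="All", trade_type=None,
                  min_value=0.0, since=None, until=None):
    """Apply each filter only when it is set; unset filters pass everything."""
    result = []
    for t in trades:
        if country != "All" and t.country != country:
            continue
        if trade_type and t.trade_type.lower() != trade_type.lower():
            continue
        if t.value < min_value:
            continue
        if since and t.trade_date < since:
            continue
        if until and t.trade_date > until:
            continue
        result.append(t)
    return result
```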
The Congress Scan tab provides a full GUI workflow for scanning congressional financial disclosures:
- Official selection: searchable dropdown populated from congress_members.json, with an "All" option
- Source checkboxes: independently toggle House and Senate scrapers
- Date range: optional filing date filter
- Filters: trade type (Purchase/Sale/Exchange), minimum dollar amount, and committee-based sector filtering
- Background scanning: threaded execution with progress bar and cancellable stop button
- Results table: sortable columns for filing date, trade date, official, chamber, ticker, asset, type, owner, amount range, and source
- Detail panel: double-click a row to see full details including official's committee sectors
- Open Filing: launches the original disclosure page (House PDF or Senate PTR) in browser
- Save: exports filtered results to CSV + JSON
All EDGAR requests use a proper User-Agent header and are rate-limited to 10 requests/second as required by SEC policy. The User-Agent is configurable via the SEC_USER_AGENT environment variable.
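
As an illustration of the throttling described above, here is a minimal interval-based rate limiter plus a header helper. It is a sketch, not the code in utils/http.py: the RateLimiter class, the edgar_headers function, and the fallback User-Agent string are assumptions; only the SEC_USER_AGENT variable and the 10 requests/second limit come from this README.

```python
import os
import time


class RateLimiter:
    """Space calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock   # injectable so tests need not really sleep
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def edgar_headers() -> dict:
    # The fallback value is a placeholder, not the project's default.
    agent = os.environ.get("SEC_USER_AGENT", "insider-scanner (contact@example.com)")
    return {"User-Agent": agent}
```

A caller would invoke `limiter.wait()` immediately before each EDGAR request and pass `edgar_headers()` as the request headers.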
| File | Description |
|---|---|
| data/congress_members.json | Congress member list with committee assignments and sector mappings |
| data/tickers_watchlist.txt | Default ticker symbols |
| data/eu_watchlist.txt | Default ISINs for the European watchlist scan |
| data/house_disclosures/ | Cached House financial disclosure indexes (auto-populated) |
data/eu_watchlist.txt stores one 12-character ISIN per line (comments start with #) and is used by the European tab's watchlist scan and the CLI --watchlist mode.
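
A watchlist file in that format could be read with something like the sketch below. The function names, the treatment of trailing inline comments, and the choice to raise on malformed entries are assumptions; the project's loader in utils/config.py may behave differently.

```python
import re
from pathlib import Path

# Two-letter country prefix, nine alphanumerics, one check digit.
_ISIN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def parse_watchlist(text: str) -> list[str]:
    isins = []
    for line in text.splitlines():
        # Drop comments (assumed to be allowed after an entry as well).
        entry = line.split("#", 1)[0].strip().upper()
        if not entry:
            continue
        if not _ISIN.fullmatch(entry):
            raise ValueError(f"not a 12-character ISIN: {entry!r}")
        isins.append(entry)
    return isins


def load_watchlist(path: str = "data/eu_watchlist.txt") -> list[str]:
    return parse_watchlist(Path(path).read_text(encoding="utf-8"))
```

This only checks the shape of each entry; it does not verify the ISIN check digit.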
The Congress member list is populated by scripts/update_congress.py and includes committee assignments and sector mappings derived from the unitedstates/congress-legislators project.
Congress financial disclosures differ from standard insider trades. Instead of exact transaction values, they report dollar ranges (e.g. "$1,001 – $15,000"). The CongressTrade dataclass in models.py handles this with amount_range (original string), amount_low and amount_high (parsed floats), plus fields for owner (Self/Spouse/Dependent Child/Joint), asset_description, and comment.
Each federal legislator is assigned one or more sectors based on their committee assignments. Committees are mapped to sectors via keyword matching (e.g. "Armed Services" → Defense, "Financial Services" → Finance). The available sectors are: Defense, Energy, Finance, Technology, Healthcare, Industrials, and Other. The sector field is a list — for example, a member serving on both Armed Services and Financial Services is tagged as ["Defense", "Finance"]. "Other" is only included when no higher-priority sector applies.
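
The keyword mapping might be implemented along these lines. Only the "Armed Services" → Defense and "Financial Services" → Finance pairs come from the description above; every other keyword in the table is an illustrative guess, and the real mapping lives in scripts/update_congress.py.

```python
# Sector -> lower-case keywords searched for in committee names.
SECTOR_KEYWORDS = {
    "Defense": ("armed services", "homeland security"),
    "Energy": ("energy", "natural resources"),
    "Finance": ("financial services", "banking", "finance"),
    "Technology": ("science", "technology", "commerce"),
    "Healthcare": ("health",),
    "Industrials": ("transportation", "infrastructure"),
}


def sectors_for(committees: list[str]) -> list[str]:
    found = []
    for sector, keywords in SECTOR_KEYWORDS.items():
        if any(k in name.lower() for name in committees for k in keywords):
            found.append(sector)
    # "Other" is a fallback only, never combined with a real sector.
    return found or ["Other"]
```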
Limitation: Family member financial disclosures (spouses, children) are not publicly machine-readable and would require paid data services.
Standalone utility scripts live in scripts/.
update_congress.py fetches the current list of federal and (optionally) state legislators, enriches them with committee assignments and sector mappings, and writes them to data/congress_members.json.
# Federal only with committee enrichment (no API key needed)
python scripts/update_congress.py
# Federal + state legislators (requires free Open States API key)
OPENSTATES_API_KEY=your_key python scripts/update_congress.py --include-state
# Skip committee enrichment
python scripts/update_congress.py --no-committees
# Preview without saving
python scripts/update_congress.py --dry-run
# Custom output path
python scripts/update_congress.py --output /path/to/members.json

Federal data and committee assignments come from the unitedstates/congress-legislators project (public domain, community-maintained YAML). State data uses the Open States API (free key required).
# Run all offline tests (default)
pytest -m "not live" -v
# Run only live integration tests (requires internet)
pytest -m live -v
# Run everything
pytest -v
# With coverage
pytest -m "not live" --cov=insider_scanner -v

Tests are split into two categories:
- Offline (mocked): Use the responses library to mock HTTP calls. No internet needed. Run by default in CI.
- Live integration: Hit real websites. Marked with @pytest.mark.live. Excluded from CI. Run manually with -m live.
| Module | Tests | Description |
|---|---|---|
| test_models.py | 16 | InsiderTrade + CongressTrade dataclasses, amount range parsing |
| test_secform4.py | 19 | secform4.com compound-column HTML parser |
| test_openinsider.py | 13 | openinsider.com scraper |
| test_edgar.py | 14 | CIK resolution (JSON + HTML fallback), EDGAR URL builder |
| test_senate.py | 14 | Congress member flagging |
| test_merger.py | 19 | Deduplication, filtering, export |
| test_caching.py | 10 | File cache with TTL |
| test_config.py | 9 | Config paths, watchlist loading |
| test_update_congress.py | 34 | Committee enrichment, sector mapping |
| test_congress_house.py | 52 | House ZIP index, XML parsing, PDF extraction pipeline |
| test_congress_senate.py | 36 | Senate EFD session, search, PTR page parsing |
| test_congress_tab.py | 23 | Congress tab functions: filter, sector, save, dataframe |
| test_integration.py | 22 | End-to-end pipeline: scrapers → filter → save → reload |
| test_eu_models.py | 5 | EuropeanInsiderTrade dataclass, normalize_position |
| test_eu_merger.py | 4 | EU deduplication, filtering, dataframe export |
| test_eu_sources.py | 24 | AFM/AMF/BaFin/RNS parsing helpers and dispatcher |
| test_european.py | 8 | European tab GUI, scan dispatch, filter/save workflow |
| test_gui.py | 30 | Widget creation, controls, interactions (requires display) |
| test_cli.py | 6 | CLI entry point commands |
| test_threading.py | 2 | Worker/Signal threading helpers |
| test_main_entrypoint.py | 1 | GUI entry point smoke test |
| test_live.py | 6 | Live website tests (deselected in CI) |
GitHub Actions runs on push/PR:
- Test matrix: Python 3.11 + 3.12 + 3.13 + 3.14 on Ubuntu + Windows
- Offline tests only: Live tests excluded via -m "not live"
- GUI tests: Run under xvfb-run on Linux for headless display; skipped on Windows
- Lint: ruff check .
- Format: ruff format --check .
- Coverage: Uploaded as artifact for Python 3.12 Ubuntu
To add a new scraping source:
- Create src/insider_scanner/core/newsource.py with a scrape_ticker(ticker) -> list[InsiderTrade] function
- Have the parser return InsiderTrade records with source="newsource"
- Add it to the merger pipeline in scan_tab.py and cli.py
- Write mocked tests in tests/test_newsource.py
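
A skeleton for such a module might look like the sketch below. The InsiderTrade class here is a reduced stand-in (the real dataclass is in core/models.py and has more fields), the row keys are invented for illustration, and the fetch_rows parameter is an assumption added so the parser can be tested without network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InsiderTrade:
    """Reduced stand-in for the project's dataclass; fields are assumptions."""
    ticker: str
    insider_name: str
    trade_type: str
    shares: int
    source: str


def _fetch_rows(ticker: str) -> list[dict]:
    # Placeholder: a real module would download and parse the site here.
    raise NotImplementedError("HTTP fetching is not part of this sketch")


def parse_rows(ticker: str, rows: list[dict]) -> list[InsiderTrade]:
    trades = []
    for row in rows:
        try:
            shares = int(str(row["shares"]).replace(",", ""))
        except (KeyError, ValueError):
            continue  # skip rows without a usable share count
        trades.append(InsiderTrade(
            ticker=ticker.upper(),
            insider_name=row.get("name", "").strip(),
            trade_type=row.get("type", "").strip(),
            shares=shares,
            source="newsource",
        ))
    return trades


def scrape_ticker(ticker: str,
                  fetch_rows: Callable[[str], list[dict]] = _fetch_rows
                  ) -> list[InsiderTrade]:
    return parse_rows(ticker, fetch_rows(ticker))
```

Separating parse_rows from the fetch makes the mocked tests straightforward: they can feed canned rows to the parser and never touch the network.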
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT
Created with Claude AI