This repository houses a service and several related packages that power the Labor Leverage website.
Labor Leverage is a Go service in cmd/server that fetches data from:
- the SEC's EDGAR HTML files
- the IRS's 990 XML files
Both sources are crunched into a financial Fact data structure, which holds some data common to both reporting formats (e.g. number of employees) and some that's unique to one of them (e.g. stock buybacks for SEC data).
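For a rough feel, here's a minimal sketch of what such a structure might look like; the field names are illustrative, not the actual schema:

```go
// Package facts is hypothetical; the field names below are illustrative,
// not the service's actual schema.
package facts

// Fact holds financial data normalized from either an SEC iXBRL filing
// or an IRS 990 XML return.
type Fact struct {
	// Common to both reporting formats.
	EmployerName  string
	FiscalYear    int
	EmployeeCount int

	// SEC-only fields (zero for non-profits).
	StockBuybacksUSD int64

	// IRS-only fields (zero for public companies).
	HighestPaidOfficerUSD int64

	// Source records which filing the fact was extracted from: "SEC" or "IRS".
	Source string
}
```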
This is where a lot of the data munging happens: raw IRS and/or SEC data are transformed, mostly using elaborate traversal functions with regexes, to extract structured (or in some cases semi-structured or unstructured) data from the raw documents. The parsing is quite lossy and there are definitely corporations that will be missing one category of data or another! Yes, I considered using an LLM here, but the volume of data seemed big enough that the $$$ didn't seem worth it.
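As an illustration of the kind of extraction involved (the Node type and the regex here are made up, not the real code):

```go
// A hedged sketch of regex-driven extraction over a parsed document tree.
// Node and its fields are hypothetical stand-ins for whatever tree the real
// parser produces.
package extract

import (
	"regexp"
	"strconv"
	"strings"
)

type Node struct {
	Text     string
	Children []*Node
}

// Matches phrases like "approximately 12,300 full-time employees".
var employeeRe = regexp.MustCompile(`(?i)approximately\s+([\d,]+)\s+(?:full[- ]time\s+)?employees`)

// FindEmployeeCount walks the tree depth-first and returns the first
// employee count it can recognize; ok is false if nothing matched.
func FindEmployeeCount(n *Node) (count int, ok bool) {
	if m := employeeRe.FindStringSubmatch(n.Text); m != nil {
		v, err := strconv.Atoi(strings.ReplaceAll(m[1], ",", ""))
		if err == nil {
			return v, true
		}
	}
	for _, c := range n.Children {
		if v, ok := FindEmployeeCount(c); ok {
			return v, true
		}
	}
	return 0, false
}
```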
Part of the implementation of this service requires parsing iXBRL-flavored XHTML documents, the publication format used by the SEC's EDGAR system. This package provides utilities for parsing and traversing these documents; check out the godoc for more information.
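To give a flavor of what "parsing iXBRL" means in practice, here's a hedged sketch using golang.org/x/net/html to pull out inline-tagged facts. This is not this package's API, just the general shape of the traversal:

```go
// A hedged sketch of pulling tagged facts out of an iXBRL XHTML document
// using golang.org/x/net/html.
package main

import (
	"fmt"
	"os"
	"strings"

	"golang.org/x/net/html"
)

// fact is one inline-XBRL tagged value: its concept name and raw text.
type fact struct {
	Name  string
	Value string
}

func main() {
	in, err := os.Open("filing.htm") // hypothetical local copy of an EDGAR filing
	if err != nil {
		panic(err)
	}
	defer in.Close()

	doc, err := html.Parse(in)
	if err != nil {
		panic(err)
	}

	var facts []fact
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		// iXBRL wraps numeric facts in <ix:nonfraction name="..."> elements.
		if n.Type == html.ElementNode && strings.EqualFold(n.Data, "ix:nonfraction") {
			var name, text string
			for _, a := range n.Attr {
				if strings.EqualFold(a.Key, "name") {
					name = a.Val
				}
			}
			for c := n.FirstChild; c != nil; c = c.NextSibling {
				if c.Type == html.TextNode {
					text += c.Data
				}
			}
			facts = append(facts, fact{Name: name, Value: strings.TrimSpace(text)})
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)

	for _, fc := range facts {
		fmt.Printf("%s = %s\n", fc.Name, fc.Value)
	}
}
```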
Handles communication with the SEC's EDGAR API, which is really just a specific pattern of storing and retrieving HTML files encoded with iXBRL tags.
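The retrieval side boils down to plain HTTPS GETs against www.sec.gov/Archives with a descriptive User-Agent, which the SEC asks automated clients to send. A minimal sketch (the URL is an example filing, not necessarily one the service fetches):

```go
// A hedged sketch of fetching one iXBRL-tagged filing document from EDGAR.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Archive paths follow /Archives/edgar/data/<CIK>/<accession-no-dashes>/<document>.
	url := "https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htm"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// SEC guidance asks automated clients to identify themselves.
	req.Header.Set("User-Agent", "labor-leverage-example contact@example.com")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d bytes of iXBRL-tagged XHTML\n", len(body))
}
```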
Handles communication with the IRS's historical 990 XML filings for non-profits. These are stored in big collated ZIP files, but this package uses cloudzip and HTTP Range headers to fetch only those parts of the ZIP pertinent to the specific non-profit.
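The underlying trick is the HTTP Range header: you ask the server for a byte span instead of the whole archive. Here's a hedged sketch of just that piece (cloudzip handles the actual ZIP bookkeeping; the URL and offsets below are placeholders):

```go
// A hedged sketch of partial downloads with HTTP Range requests.
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchRange downloads bytes [start, end] (inclusive) of the resource at url.
func fetchRange(url string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// 206 Partial Content means the server honored the range.
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("server did not return a partial response: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Placeholder URL; a real collated 990 archive is several GB.
	const archive = "https://example.com/990_xml_2024.zip"
	b, err := fetchRange(archive, 0, 1023) // first 1 KiB only
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d bytes without downloading the whole archive\n", len(b))
}
```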
Right now some of this is hardcoded around tax year 2024, because I haven't fully followed how or when the IRS makes a full year's returns available and/or when non-profits incrementally report their data.
I used XML-schema-to-Go code generation to produce structs for each of the 990 XML files decoded by the service, but they all required hand-editing, so that's what's in here.
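For a sense of what the hand-edited output looks like, here's a heavily trimmed, hedged sketch; the element names are illustrative and may not match the exact paths the service decodes:

```go
// A trimmed, hypothetical view of a generated-then-hand-edited 990 struct.
package irs990

import "encoding/xml"

// Return is a heavily reduced view of a Form 990 XML return.
type Return struct {
	XMLName    xml.Name   `xml:"Return"`
	ReturnData ReturnData `xml:"ReturnData"`
}

type ReturnData struct {
	IRS990 IRS990 `xml:"IRS990"`
}

type IRS990 struct {
	// Total number of employees reported by the filer.
	TotalEmployeeCnt int `xml:"TotalEmployeeCnt"`
	// Total revenue for the current year.
	CYTotalRevenueAmt int64 `xml:"CYTotalRevenueAmt"`
}
```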
I'd initially built the service on the assumption that all of the data would be backfilled out of band, but it turns out many of the EDGAR documents are large and the total set is big (29GB); the service now lazily caches SEC/IRS data in a SQLite database as it goes.
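A hedged sketch of what a lazy cache like that can look like, using database/sql with the modernc.org/sqlite driver (the service's actual schema and driver may differ):

```go
// A hypothetical lazy document cache backed by SQLite.
package cache

import (
	"database/sql"
	"errors"

	_ "modernc.org/sqlite" // CGo-free SQLite driver
)

type Cache struct {
	db *sql.DB
}

func Open(path string) (*Cache, error) {
	db, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS documents (
		key  TEXT PRIMARY KEY,
		body BLOB NOT NULL
	)`)
	if err != nil {
		return nil, err
	}
	return &Cache{db: db}, nil
}

// Get returns the cached document body, or ok=false on a miss.
func (c *Cache) Get(key string) (body []byte, ok bool, err error) {
	err = c.db.QueryRow(`SELECT body FROM documents WHERE key = ?`, key).Scan(&body)
	if errors.Is(err, sql.ErrNoRows) {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return body, true, nil
}

// Put stores (or refreshes) a document body under key.
func (c *Cache) Put(key string, body []byte) error {
	_, err := c.db.Exec(`INSERT INTO documents(key, body) VALUES(?, ?)
		ON CONFLICT(key) DO UPDATE SET body = excluded.body`, key, body)
	return err
}
```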
You'll need the Go toolchain and Git LFS. Run the server:

```sh
go run cmd/server/main.go
```

Then open http://localhost:8080/ in a browser.
Many of the tests here are kind of worthless; they're mostly a record of trying to suss out specific IRS/SEC data edge cases.
I manually deploy this to my server for now. Maybe I'll revisit this if other people want to contribute.