This repo is forked from bulik/ldsc to better suit the needs of CELLECT. ldsc is a command-line tool for estimating heritability and genetic correlation from GWAS summary statistics. We have made the following modifications.
Edits
- ldsc.py: no longer computes the 'Annotation Correlation Matrix' or writes it to the log file. This step can take a long time if you have many annotations.
- ldsc.py: no longer computes the 'correlation matrix including all LD Scores and sample MAF' or its condition number. Again, this step can take a long time.
- sumstats.py: modified the cell_type_specific() function:
  - 'result caching': writes a ".cell_type_results.tmp.txt" file after each regression, so we don't lose all computations if ldsc fails during one of the regressions (or the server terminates the job mid-run). This is especially important when running ldsc with many CTS annotations.
  - displays and logs the progress of the CTS regressions ("running regression no. ...").
  - wraps the 'CTS mode loop' in a try/except block for better error monitoring.
  - adds sys.stdout.flush() calls to enable 'online monitoring' of jobs, even without running Python in unbuffered mode (python -u).
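The caching, progress-logging, try/except and flushing changes to cell_type_specific() combine into one simple loop pattern. The sketch below is a minimal, hypothetical Python illustration of that pattern, not the actual code in sumstats.py: `run_cts_loop` and the `regress` callable are made-up names, and the tab-separated layout and column header of the cache file are assumptions. It avoids version-specific syntax so it runs under both Python 2.7 and Python 3.

```python
import os
import sys
import traceback


def run_cts_loop(names, regress, out_prefix):
    """Hypothetical sketch of a CTS regression loop with result caching.

    names      -- cell-type-specific (CTS) annotation names
    regress    -- hypothetical callable: name -> (coef, coef_se, p_value)
    out_prefix -- output prefix; cache file is <out_prefix>.cell_type_results.tmp.txt
    """
    tmp_path = out_prefix + ".cell_type_results.tmp.txt"
    results = []
    i = 0  # regression counter, defined up front so the except block can report it
    with open(tmp_path, "w") as cache:
        # Assumed header, mirroring the columns of a final results table
        cache.write("Name\tCoefficient\tCoefficient_std_error\tCoefficient_P_value\n")
        try:
            for i, name in enumerate(names, 1):
                # Progress message; flush so it reaches the log right away,
                # even when Python is not run unbuffered (python -u)
                sys.stdout.write("running regression no. %d of %d (%s)\n"
                                 % (i, len(names), name))
                sys.stdout.flush()
                coef, se, p = regress(name)
                results.append((name, coef, se, p))
                # Cache this result immediately so it survives a later crash
                cache.write("%s\t%g\t%g\t%g\n" % (name, coef, se, p))
                cache.flush()
                os.fsync(cache.fileno())
        except Exception:
            # Report which regression failed, then re-raise; rows already
            # written to the cache file stay on disk
            sys.stdout.write("CTS loop failed at regression no. %d\n" % i)
            traceback.print_exc(file=sys.stdout)
            sys.stdout.flush()
            raise
    return results
```

Because each row is flushed and fsync'ed as soon as its regression finishes, a job that crashes or is killed partway through still leaves every completed result in the .tmp.txt file. Re-raising after logging keeps the failure visible rather than silently skipping the remaining annotations.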
New scripts
- quantile_M_fixed_non_zero_quantiles.pl: a modified version of quantile_M.pl that supports h2 calculations for fixed intervals.
- mtag_munge.py: an improved version of munge_sumstats.py created by the MTAG developers. We have made a few small convenience adjustments to mtag_munge.py (see the git history).
Environments
- Added environment_munge.yml with numpy and pandas versions that work with munge_sumstats.py (and mtag_munge.py). All of LDSC (including munge_sumstats.py and mtag_munge.py) runs only on Python 2.7.