Run a `git log` via `git clone` on every repository in a GitHub enterprise

Benchmark: 18 minutes to read every commit in 2k+ repos across 400+ orgs, including three copies of the Linux kernel, using 120 cloning processes.

What is this?

This script queries the GitHub GraphQL API for the organizations belonging to a specific enterprise and builds a list of every organization and its repositories. Once the list is generated, the script clones each repository and writes a git log of all commits and their associated data in CSV format.
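The enterprise-to-orgs-to-repos walk described above relies on cursor pagination. The sketch below shows the general pattern under a few assumptions: `run_query` is a hypothetical callable that sends one GraphQL request and returns its `data` dict, the query text is illustrative (it uses the public `enterprise(slug:)`, `organization(login:)`, `organizations` and `repositories` connections), and pagination is written as a loop rather than the script's recursion. It is not the script's own code.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical query runner: (query text, variables) -> GraphQL "data" dict.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Illustrative queries; both connections are paginated 100 nodes at a time.
ORGS_QUERY = """
query($slug: String!, $cursor: String) {
  enterprise(slug: $slug) {
    organizations(first: 100, after: $cursor) {
      nodes { login }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""

REPOS_QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      nodes { name }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""


def paginate(run_query: RunQuery, query: str, variables: Dict[str, Any],
             path: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield every node of a cursor-paginated connection, page by page."""
    cursor = None  # None asks for the first page
    while True:
        connection = run_query(query, {**variables, "cursor": cursor})
        for key in path:  # walk down to the connection, e.g. enterprise -> organizations
            connection = connection[key]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


def list_org_repos(run_query: RunQuery, enterprise: str) -> Iterator[Tuple[str, str]]:
    """Yield (org login, repo name) for every repository in every org of the enterprise."""
    orgs = paginate(run_query, ORGS_QUERY, {"slug": enterprise},
                    ["enterprise", "organizations"])
    for org in orgs:
        login = org["login"]
        repos = paginate(run_query, REPOS_QUERY, {"org": login},
                         ["organization", "repositories"])
        for repo in repos:
            yield login, repo["name"]
```

With the gql library, `run_query` could, for example, wrap `client.execute(gql(query), variable_values=variables)`. Keeping the runner as a parameter makes the pagination logic easy to test against canned pages.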

You can modify the print format:

--pretty=format:'{org},{repo},%H,%ct,%an,%ae,%S,%s'

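See the git-log documentation (PRETTY FORMATS) for the available placeholders. NOTE: commit messages are likely to contain commas, so this comma-separated output may not parse cleanly as CSV. Be careful!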
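One way to sidestep the comma problem, sketched here as an alternative rather than what the script does, is to have git separate fields with the ASCII unit separator (`%x1f`) and let Python's `csv` module quote fields when writing. The helper names are hypothetical, and `--all` is an assumption so that `%S` reports which ref reached each commit.

```python
import csv
import io
import subprocess
from typing import List

FIELD_SEP = "\x1f"  # ASCII unit separator; practically never appears in commit metadata

# Same fields as the format string above, joined by %x1f instead of commas.
LOG_FORMAT = "%x1f".join(["%H", "%ct", "%an", "%ae", "%S", "%s"])


def git_log_raw(repo_dir: str) -> str:
    """Run git log in a cloned repository and return its raw output."""
    return subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout


def log_to_rows(raw: str, org: str, repo: str) -> List[List[str]]:
    """Split raw git log output into [org, repo, hash, time, name, email, ref, subject] rows."""
    rows = []
    for line in raw.split("\n"):  # one commit per line; %s never contains newlines
        if line:
            # maxsplit=5 keeps the subject (last field) intact even if it held a separator
            rows.append([org, repo] + line.split(FIELD_SEP, 5))
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows as CSV; fields containing commas are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

A subject such as `fix: a, b and c` then comes out as `"fix: a, b and c"`, so any standard CSV reader recovers the original columns.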

Why?

This script was created for a customer with 2,000+ organizations and over 200,000 repositories who wanted a list of every repository and its commits. Cloning each repository and running git log produces this data in a timely manner, whereas collecting the same data through the API would take days.

Requirements and how it works

You need an enterprise owner account with access to the GraphQL API, and a personal access token for that account.
You need the name (slug) of the enterprise you want to query. You can find it in the enterprise's URL.
It's recommended that you run this on a Linux machine with fast disks (high IOPS) and ample CPU cores, so you can take advantage of the script's multiprocessing.
The script clones every repository and generates a git log for each one. This can take a long time depending on the size of the enterprise.
Initial data collection via the API is serial, with recursive pagination. This step also supports multiprocessing, but you will likely hit secondary rate limits past 2 processes, so it's recommended you leave it at 1.
Because the script supports multiprocessing, it's important not to overburden the GitHub instance it runs against. Start with 4 concurrent clones and scale up from there, and run it during low-usage periods if possible.
Once a repository's git log CSV file is generated, the script removes the cloned repository to save disk space.
See requirements.txt for the Python dependencies (gql, tqdm).
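The clone, log, and delete cycle described above maps naturally onto a process pool. The sketch below illustrates that cycle under stated assumptions: authentication is omitted (credentials would need to be supplied separately), `--bare` is a hypothetical choice to skip the working-tree checkout, failed clones are skipped rather than aborting the run, and `clone_url`, `clone_and_log` and `run_all` are illustrative names, not the script's functions. Output lines keep the README's comma format, so the comma caveat still applies.

```python
import os
import shutil
import subprocess
import tempfile
from multiprocessing import Pool
from typing import Iterable, Optional, Tuple

# Same fields as the README's format string; {org},{repo} are prefixed when writing.
PRETTY = "--pretty=format:%H,%ct,%an,%ae,%S,%s"


def clone_url(host: str, org: str, repo: str) -> str:
    """HTTPS clone URL (no SSH support); credentials are deliberately not embedded."""
    return f"https://{host}/{org}/{repo}.git"


def clone_and_log(job: Tuple[str, str, str]) -> Tuple[str, str, Optional[str]]:
    """Clone one repo into a temp dir, capture its git log, then delete the clone."""
    host, org, repo = job
    workdir = tempfile.mkdtemp(prefix="repo-log-")
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # fail instead of prompting for credentials
    try:
        subprocess.run(["git", "clone", "--quiet", "--bare",
                        clone_url(host, org, repo), workdir], env=env, check=True)
        result = subprocess.run(["git", "-C", workdir, "log", "--all", PRETTY],
                                env=env, capture_output=True, text=True,
                                errors="replace", check=True)
        return org, repo, result.stdout
    except subprocess.CalledProcessError:
        # e.g. locked or empty repositories: skip them instead of aborting the whole run
        return org, repo, None
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # save disk space once the log is captured


def run_all(jobs: Iterable[Tuple[str, str, str]], out_path: str, processes: int = 4) -> int:
    """Run clone_and_log across a pool of processes; return the number of commit lines written."""
    written = 0
    with Pool(processes=processes) as pool, open(out_path, "w", encoding="utf-8") as out:
        for org, repo, log in pool.imap_unordered(clone_and_log, jobs):
            if log is None:
                continue
            for line in log.split("\n"):
                if line:
                    out.write(f"{org},{repo},{line}\n")
                    written += 1
    return written
```

Call `run_all` from under an `if __name__ == "__main__":` guard, for example with hypothetical jobs such as `[("github.example.com", "my-org", "my-repo")]`, so the pool also works on platforms that spawn rather than fork worker processes. `imap_unordered` lets fast repositories finish without waiting behind a large one such as the Linux kernel.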

Install

pip install -r requirements.txt

Other notes

Repos using an SSH certificate authority (SSH CA) cannot be cloned.
Locked repos cannot be cloned.
This script will not tolerate being IP blocked; it will simply fail. Make sure the organization you are querying is not blocking your IP.
Everything uses HTTPS; there is no SSH support.
Clones use tokens, which can easily be exposed through ps. Make sure you run this in an isolated environment such as a container.
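A possible mitigation, not something the script is documented to do, is to pass the token through git's environment instead of the clone URL or command line, so it does not appear in the argument list that ps shows. The sketch assumes Git 2.31+ (which reads `GIT_CONFIG_COUNT`/`GIT_CONFIG_KEY_<n>`/`GIT_CONFIG_VALUE_<n>`) and that the server accepts HTTP Basic auth with the PAT as the password; the `x-access-token` username and the helper name `git_auth_env` are assumptions.

```python
import base64
import os
from typing import Dict, Mapping, Optional


def git_auth_env(token: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an environment that makes git send the token as an HTTP header.

    Assumes Git 2.31+ and HTTP Basic auth with the PAT as password; the
    username "x-access-token" is a placeholder assumption. Overwrites any
    existing GIT_CONFIG_COUNT entries in the base environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    basic = base64.b64encode(f"x-access-token:{token}".encode()).decode()
    env["GIT_CONFIG_COUNT"] = "1"
    env["GIT_CONFIG_KEY_0"] = "http.extraHeader"
    env["GIT_CONFIG_VALUE_0"] = f"Authorization: Basic {basic}"
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail fast instead of prompting for credentials
    return env
```

Usage: `subprocess.run(["git", "clone", url, dest], env=git_auth_env(token), check=True)`. The token is still readable by the same user or root through the process environment, so running in an isolated container remains advisable.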

Usage

usage: git-repo-logs.py [-h] -e ENTERPRISE [-n [HOST]] [-t [TOKEN]] [-c [CLONING_PROCESSES]]
                        [-a [API_PROCESSES]]

Obtain git logs for all org repos in a single enterprise

required arguments:
  -e ENTERPRISE, --enterprise ENTERPRISE
                        GitHub Enterprise name/slug

options:
  -h, --help            show this help message and exit
  -n [HOST], --host [HOST]
                        GitHub Enterprise hostname (domain.tld)
                        gets/sets GITHUB_HOST env var
  -t [TOKEN], --token [TOKEN]
                        GitHub Enterprise admin PAT token
                        gets/sets GITHUB_TOKEN env var
  -c [CLONING_PROCESSES], --cloning_processes [CLONING_PROCESSES]
                        number of processes to use when cloning
  -a [API_PROCESSES], --api_processes [API_PROCESSES]
                        number of processes to use when traversing repos via the api
                        NOTE: keep this low to avoid rate limiting
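The help text above implies two behaviors worth spelling out: `[-n [HOST]]` means the value after the flag is optional, and the host and token fall back to (and are written back to) the `GITHUB_HOST` and `GITHUB_TOKEN` environment variables. The sketch below reconstructs that interface with `argparse` as an illustration only; the defaults of 4 cloning processes and 1 API process are assumptions taken from the recommendations earlier in this README, and `build_parser`/`parse_args` are hypothetical names.

```python
import argparse
import os
from typing import List, MutableMapping, Optional


def build_parser(env: MutableMapping[str, str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Obtain git logs for all org repos in a single enterprise")
    required = parser.add_argument_group("required arguments")
    required.add_argument("-e", "--enterprise", required=True,
                          help="GitHub Enterprise name/slug")
    # nargs="?" renders as [-n [HOST]]; default covers "flag absent",
    # const covers "flag given without a value". Both fall back to the env var.
    parser.add_argument("-n", "--host", nargs="?", default=env.get("GITHUB_HOST"),
                        const=env.get("GITHUB_HOST"),
                        help="GitHub Enterprise hostname (domain.tld)")
    parser.add_argument("-t", "--token", nargs="?", default=env.get("GITHUB_TOKEN"),
                        const=env.get("GITHUB_TOKEN"),
                        help="GitHub Enterprise admin PAT token")
    # Assumed defaults, following the README's advice (4 clones, 1 API process).
    parser.add_argument("-c", "--cloning_processes", nargs="?", type=int, default=4, const=4)
    parser.add_argument("-a", "--api_processes", nargs="?", type=int, default=1, const=1)
    return parser


def parse_args(argv: Optional[List[str]] = None,
               env: Optional[MutableMapping[str, str]] = None) -> argparse.Namespace:
    env = os.environ if env is None else env
    args = build_parser(env).parse_args(argv)
    # "gets/sets": write explicit values back so child processes inherit them.
    if args.host:
        env["GITHUB_HOST"] = args.host
    if args.token:
        env["GITHUB_TOKEN"] = args.token
    return args
```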

cvega/gh-enterprise-log-everything

Folders and files

Latest commit

History

Repository files navigation

Run a git log via git clone on every repository in a GitHub enterprise

What is this?

Why?

Requirements and how it works

Install

Other notes

Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Run a `git log` via `git clone` on every repository in a GitHub enterprise

Packages