Local cache for git repositories to speed up working with large repositories and multiple clones.
The basic idea of gitcache is to use a local bare mirror that is updated when needed and used as the source repository for multiple local repositories.
- Wrapper for the
gitcommand for easy integration. - git-lfs support.
- Custom update interval of git mirrors including the possibility to perform updates only on explicit request.
- Statistics available using the
gitcachecommand. - Timeouts on all mirror-related operations using a total execution timeout and a timeout on the stdout/stderr output of the commands.
- Configuration via environment variables, a global configuration file and a per-configuration configuration file.
gitcache is designed to be used as a wrapper to git, so in the following we show how gitcache translates the git commands for the individual operations.
When the user issues a
git clone https://github.com/seeraven/gitcache.git
for the first time, the repository https://github.com/seeraven/gitcache.git is
cloned into a bare mirror $GITCACHE_DIR/mirrors/github.com/seeraven/gitcache/git
and then the git command is rewritten to
git clone $GITCACHE_DIR/mirrors/github.com/seeraven/gitcache/git gitcache
to create the clone. In addition, the push URL of the clone is adjusted to the upstream URL.
Whenever the user issues another git clone command of that repository, the
mirror is updated (if the update strategy permits it) and the local clone is
created as before.
Whenever the user performs a git pull or git fetch on that local clone,
gitcache checks whether the repository is handled by gitcache (that is the pull
URL is pointing to the mirror, the push URL is pointing to the upstream URL).
If it is, it updates the mirror first (according to the update strategy) and
executes the original command afterwards.
In addition to the git repositories, gitcache supports git-lfs as well and updates of the mirror include updates of the git-lfs part. You can configure gitcache to either use a global git-lfs storage directory or to use per mirror storage directories (the default).
All update operations on a mirror use a lock to ensure that only one modifies the mirror. This is crucial as simultaneous clones would easily lead to inconsistent behaviours and ugly race conditions.
The mirror update strategy is controlled using the so called update interval. It gives the time between two updates of a mirror in seconds and allows you to save network bandwidth by avoiding multiple updates at almost the same time.
In addition, updates from the git pull and git fetch commands can be
completely disabled by setting it to a negative value. This means that updates
of the mirrors are only performed if explicitly requested by a
git update-mirrors command. This can be useful on CI servers to control
network usage even further.
gitcache is distributed as a single executable packaged using pyInstaller.
So all you have to do is to download the latest executable and copy it to a
location of your choice, for example ~/bin:
wget https://github.com/seeraven/gitcache/releases/download/v1.0.28/gitcache_v1.0.28_Ubuntu22.04_x86_64
mv gitcache_v1.0.28_Ubuntu22.04_x86_64 ~/bin/gitcache
chmod +x ~/bin/gitcache
gitcache can be used as a stand-alone command, but it is much easier to use
it as a wrapper to git. All you have to do is to create a symlink and to
adjust the PATH variable so that the wrapper is found before the real
git command:
ln -s gitcache ~/bin/git
export PATH=$HOME/bin:$PATH
The export statement should be added to your ~/.bashrc file to set
it permanently.
Download the latest executable for Windows from the release page
https://github.com/seeraven/gitcache/releases. Rename the executable to
gitcache.exe and put it into a directory in your PATH, e.g., into
C:\Windows. Then create a symlink to git.exe by opening a console and
executing:
cd C:\Windows
mklink git.exe gitcache.exe
Please note that the directory you are putting the symlink into should be stated before the real git command directory in your PATH variable!
A single pyInstaller executable has a huge startup delay on MacOS, therefore
gitcache is distributed as a tar-ball (*.tgz file). Download the archive and
extract it at your desired target location (the archive contains a subfolder):
cd /my/target/destination
tar xfz gitcache_v1.0.28_Darwin_arm64.tgz
xattr -cr gitcache_v1.0.28_Darwin_arm64
ls gitcache_v1.0.28_Darwin_arm64
To use the gitcache command, the final installation directory should be put
into your PATH variable. To use it as a wrapper to the git command, you
have to create the symlink and adjust the PATH variable so that the wrapper
is found bfore the real git command as described on the installation on Linux
section.
You can also install gitcache using a wheel, but please note that this distribution method is not very well tested. If you have problems with gitcache, please test first using one of the pyInstaller executables before opening a bug ticket.
The latest wheel is found on the release page https://github.com/seeraven/gitcache/releases and is installed using pip:
pip install gitcache-1.0.28-py3-none-any.whl
As with the other installation methods, you have to make sure to create a
symlink named git to the location of the gitcache script and ensure it
overlays the real git installation, e.g.:
ln -s $(which gitcache) ~/bin/git
export PATH=$HOME/bin:$PATH
gitcache stores all files under in the directory ~/.gitcache. This base
directory can be changed by setting the GITCACHE_DIR environment variable.
When the GITCACHE_DIR is created, the default configuration file
GITCACHE_DIR/config is created and populated with the default values.
The current configuration can be shown by calling
gitcache
For every item, you'll see a corresponding environment variable that can be used to overwrite the setting of the configuration file.
The configuration options are:
| Category | Config Item | Default Value | Environment Variable |
|---|---|---|---|
| System | realgit | /usr/bin/git |
GITCACHE_REAL_GIT |
| MirrorHandling | updateinterval | 0 s |
GITCACHE_UPDATE_INTERVAL |
| MirrorHandling | cleanupafter | 14 days |
GITCACHE_CLEANUP_AFTER |
| Command | checkinterval | 2 s |
GITCACHE_COMMAND_CHECK_INTERVAL |
| Command | locktimeout | 1 h |
GITCACHE_COMMAND_LOCK_TIMEOUT |
| Command | warniflockedfor | 10 s |
GITCACHE_COMMAND_WARN_IF_LOCKED_FOR |
| GC | commandtimeout | 1 h |
GITCACHE_GC_COMMAND_TIMEOUT |
| GC | outputtimeout | 5 m |
GITCACHE_GC_OUTPUT_TIMEOUT |
| GC | retries | 3 |
GITCACHE_GC_RETRIES |
| LFS | commandtimeout | 1 h |
GITCACHE_LFS_COMMAND_TIMEOUT |
| LFS | outputtimeout | 5 m |
GITCACHE_LFS_OUTPUT_TIMEOUT |
| LFS | permirrorstorage | True |
GITCACHE_LFS_PER_MIRROR_STORAGE |
| LFS | retries | 3 |
GITCACHE_LFS_RETRIES |
| Clone | commandtimeout | 1 h |
GITCACHE_CLONE_COMMAND_TIMEOUT |
| Clone | outputtimeout | 5 m |
GITCACHE_CLONE_OUTPUT_TIMEOUT |
| Clone | retries | 3 |
GITCACHE_CLONE_RETRIES |
| Clone | clonestyle | Full |
GITCACHE_CLONE_STYLE |
| Update | commandtimeout | 1 h |
GITCACHE_UPDATE_COMMAND_TIMEOUT |
| Update | outputtimeout | 5 m |
GITCACHE_UPDATE_OUTPUT_TIMEOUT |
| Update | retries | 3 |
GITCACHE_UPDATE_RETRIES |
| UrlPatterns | includeregex | .* |
GITCACHE_URLPATTERNS_INCLUDE_REGEX |
| UrlPatterns | excluderegex | (empty) | GITCACHE_URLPATTERNS_EXCLUDE_REGEX |
Configuration items that expect a time support the following values:
- Suffix
w,wksorweeksto give the time in weeks. - Suffix
d,dysordaysto give the time in days. - Suffix
h,hrsorhoursto give the time in hours. - Suffix
m,minsorminutesto give the time in minutes. - Suffix
s,secsorsecondsto give the time in seconds. - Numbers can be integer or float, e.g,
1.5 weeks.
The following list gives a description of the configuration options:
-
System/realgit (
GITCACHE_REAL_GIT) specifies the real git command. This is usually/usr/bin/gitbut can be changed as you like. -
MirrorHandling/updateinterval (
GITCACHE_UPDATE_INTERVAL) gives the minimum time between two mirror updates. If this is set to 0, the mirror is updated always when needed. If you set this to something like10 minutesthen the mirror is updated only if the last update was at least 10 minutes ago. -
MirrorHandling/cleanupafter (
GITCACHE_CLEANUP_AFTER) specifies how old mirrors are detected. This is relevant for thegitcache -cresp.git cleanupcommand which removes all old mirrors. The time given here specifies the time since the last update of the mirror. -
To ensure only one command acts on the mirror, a locking mechanism is used that is finetuned by the settings of the Command category. The Command/checkinterval (
GITCACHE_COMMAND_CHECK_INTERVAL) option specifies at what time interval a locked mirror is checked again. The option Command/locktimeout specifies the total timeout after which to give up. Finally, the Command/warniflockedfor gives the time after which the user is warned when the mirror is locked. -
git commands initiated by gitcache that might take a long time are monitored to detect stalled executions. The monitoring is implemented by looking at the stdout/stderr output and the command is assumed to be stalled when there was no output received within a certain time. This timeout is given in the configuration options GC/outputtimeout (
GITCACHE_GC_COMMAND_TIMEOUT), LFS/outputtimeout (outputtimeout), Clone/outputtimeout (GITCACHE_CLONE_OUTPUT_TIMEOUT) and Update/outputtimeout (GITCACHE_UPDATE_OUTPUT_TIMEOUT) for the corresponding git operations garbage collection, lfs file retrieval, clone and update.In addition, a total timeout for each of these groups is given by the options GC/commandtimeout (
GITCACHE_GC_COMMAND_TIMEOUT), LFS/commandtimeout (GITCACHE_LFS_COMMAND_TIMEOUT), Clone/commandtimeout (GITCACHE_CLONE_COMMAND_TIMEOUT) and Update/commandtimeout (GITCACHE_UPDATE_COMMAND_TIMEOUT).If an operation fails, it is retried before finally giving up. This is configured by the GC/retries (
GITCACHE_GC_RETRIES), LFS/retries (GITCACHE_LFS_RETRIES), Clone/retries (GITCACHE_CLONE_RETRIES) and Update/retries (GITCACHE_UPDATE_RETRIES) options. -
Using the Clone/clonestyle (
GITCACHE_CLONE_STYLE) setting you can adjust the method used when cloning a remote repository into the initial bare mirror. The default setting isFullthat uses a normalgit clonecommand. When you are dealing with large repositories and experience problems cloning then, you can switch the method toPartialFirst. This will perform a shallow clone first, followed by agit fetch -unshallow. -
LFS/permirrorstorage (
GITCACHE_LFS_PER_MIRROR_STORAGE) is a boolean flag that determines whether each mirror will have its own lfs storage directory (True) or whether a shared directory is used (False). -
UrlPatterns/includeregex (
GITCACHE_URLPATTERNS_INCLUDE_REGEX) and UrlPatterns/excluderegex (GITCACHE_URLPATTERNS_EXCLUDE_REGEX) are used to identify repositories to mirror. The patterns are checked against the remote URL of a repository and it is only mirrored if the include pattern matches and the exclude pattern does not. If the exclude pattern is empty, it is internally converted into a regex that matches nothing (as an empty string would actually match always which would exclude all URLs).
The gitcache command provides the following options:
-h,--helpto show the command help.-c,--cleanupto remove all outdated mirrors.-u,--update-allto update all mirrors ignoring the update interval.-d MIRROR,--delete MIRRORto delete a mirror identified by its upstream URL or its path in the cache. This option can be specified multiple times.-s,--show-statisticsto show the statistics of gitcache.-z,--zero-statisticsto clear the statistics.
Without any options the gitcache command shows the current configuration.
When called as gitcache git ... it wraps the given git command as described in
the next section.
The following git commands are handled specially. All other commands are forwarded to the real git command.
git cleanupto remove all outdated mirrors.git update-mirrorsto update all mirrors ignoring the update interval.git delete-mirrorto delete a mirror identified by its upstream URL or its path in the cache.git ls-remoteto update the mirror and using it for the remote source of the ls-remote command.git checkoutto perform an lfs fetch for specified refs.git cloneto create or update the mirror and clone from the mirror.git lfs fetchto fetch the lfs handled files for the mirror.git lfs pullto fetch the lfs handled files for the mirror.git pullto update the mirror before updating the clone.git fetchto update the mirror before updating the clone.git submodule initto allow correct initialization of the submodules.git submodule updateto call the gitcache for every submodule.git remote add originfor a delayed initialization of a git repository.
For debugging, set the environment variable GITCACHE_LOGLEVEL to Debug:
GITCACHE_LOGLEVEL=Debug gitcache
The main idea behind gitcache is to perform the caching of the git repositories only for the current user. This means that you should not share the mirrored git repositories with other users, as you do not know if another user would have the permission to access the remote repository.
To start development on gitcache, you have to clone this repository first including all submodules:
git clone https://github.com/seeraven/gitcache.git
cd gitcache
git submodule update --init
Using the make4py framework, we access all major steps using the good old make
command. The main make targets of interest are:
make formatto format the source codemake check-styleto perform a static code analysismake teststo perform unit and functional tests
The actions are executed in a virtual environment per default. You can also use dedicated suffixes on the targets to specify the environment to use:
<target>.venvspecifies explicitly to use a virtual env.<target>.ubuntu24.04specifies to use a Ubuntu 24.04 docker container<target>.alpine3.20specifies to use a Alpine 3.20 docker container<target>.windowsspecifies to use a vagrant machine running Windows<target>.allspecifies to run the target on all variants.
For example, if you want to execute the unit tests on Ubuntu 22.04, you can call
make unittests.ubuntu22.04
Releases are now automatically built if a new tag v<major>.<minor>.<revision>
is pushed to the repository. This changes the release process a little bit:
-
Ensure the upcoming release is fully tested. A look on the commits on github should be enough.
-
Modify the
CHANGELOG.mdfile and insert the new version number. -
Commit the modified
CHANGELOG.mdfile and tag the commit with the new version number. -
As soon as the new tag is pushed to github, the release is built. When it is finished, it is found as a draft on the releases page.
-
Now edit the release draft, insert the changes from the
CHANGELOG.mdfile. Then the release can be saved as a regular release. -
Now prepare the next version. Edit the files
Makefile,pyproject.toml,src/git_cache/git_cache_command.pyanddoc/source/installation.rstand replace the version number:sed -i 's/1.0.28/1.0.29/g' Makefile pyproject.toml src/git_cache/git_cache_command.py doc/source/installation.rst
To create a wheel suitable for the installation via pip, call the following
command:
make build-wheel
You will then find the wheel in the dist directory.
To test the wheel in a separate virtual environment, call
make functional-tests-wheel