Giftless a Python implementation of a Git LFS Server. It is designed with flexibility in mind, to allow pluggable storage backends, transfer methods and authentication methods.
Giftless supports the basic Git LFS transfer mode with the following storage backends:
- Local storage
- Google Cloud Storage
- Azure Blob Storage with direct-to-cloud or streamed transfers
Additional transfer modes and storage backends could easily be added and configured.
Giftless is available as a Docker image. You can simply use:
$ docker run --rm -p 5000:5000 datopian/giftless
To pull and run Giftless on a system that supports Docker.
This will run the server in WSGI mode, which will require an HTTP server such as nginx to proxy HTTP requests to it.
Alternatively, you can specify the following command line arguments to have uWSGI run in HTTP mode, if no complex HTTP setup is required:
$ docker run --rm -p 8080:8080 datopian/giftless \
-M -T --threads 2 -p 2 --manage-script-name --callable app \
--http 0.0.0.0:8080
If you need to, you can also build the Docker image locally as described below.
You can install Giftless into your Python environment of choice (3.7+) using pip:
(venv) $ pip install giftless
To run it, you most likely are going to need a WSGI server installed such as uWSGI or Gunicorn. Here is an example of how to run Giftless locally with uWSGI:
# Install uWSGI or any other WSGI server
$ (.venv) pip install uwsgi
# Run uWSGI (see uWSGI's manual for help on all arguments)
$ (.venv) uwsgi -M -T --threads 2 -p 2 --manage-script-name \
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080
You can install and run giftless from source:
$ git clone https://github.com/datopian/giftless.git
# Initialize a virtual environment
$ cd giftless
$ python3 -m venv .venv
$ . .venv/bin/activate
$ (.venv) pip install -r requirements.txt
You can then proceed to run Giftless with a WSGI server as described above.
Note that for non-production use you may avoid using a WSGI server and rely on Flask's built in development server. This should never be done in a production environment:
$ (.venv) ./flask-develop.sh
The default generated endpoint is http://127.0.0.1:5000/. Note: If you access this endpoint, you should receive an error message (invalid route).
-
Create a new project on Github or any other platform. Here, we create a project named
example-proj-datahub-io. -
Add any data file to it. The goal is to track this possible large file with git-lfs and use Giftless as the local server. In our example, we create a CSV named
research_data_factors.csv. -
Create a file named
giftless.yamlin your project root directory with the following content in order to have a local server:
TRANSFER_ADAPTERS:
basic:
factory: giftless.transfer.basic_streaming:factory
options:
storage_class: giftless.storage.local_storage:LocalStorage
AUTH_PROVIDERS:
- giftless.auth.allow_anon:read_write- Export it:
$ export GIFTLESS_CONFIG_FILE=giftless.yaml-
Start the Giftless server (by docker or Python).
-
Initialize your git repo and connect it with the remote project:
git init
git remote add origin YOUR_REMOTE_REPO- Track files with git-lfs:
git lfs track 'research_data_factors.csv'
git lfs track
git add .gitattributes #you should have a .gitattributes file at this point
git add "research_data_factors.csv"
git commit -m "Tracking data files"- You can see a list of tracked files with
git lfs ls-files
- Configure
lfs.urlto point to your local Giftless server instance:
git config -f .lfsconfig lfs.url http://127.0.0.1:5000/<user_or_org>/<repo>/
# in our case, we used http://127.0.0.1:5000/datopian/example-proj-datahub-io/;
# make sure to end your lfs.url with /- The previous configuration will produce changes into
.lfsconfigfile. Add it to git:
git add .lfsconfig
git commit -m "New git-lfs server endpoint"
# if you don't see any changes, run git rm --cached *.csv and then re-add your files, then commit it
git lfs push origin masterIt is also possible to configure Giftless' YAML file to use an external storage.
Modify your giftless.yaml file according to the following config:
$ cat giftless.yaml
TRANSFER_ADAPTERS:
basic:
factory: giftless.transfer.basic_external:factory
options:
storage_class: ..storage.azure:AzureBlobsStorage
storage_options:
connection_string: GetYourAzureConnectionStringAndPutItHere==
container_name: lfs-storage
path_prefix: large-filesTo use Google Cloud Storage as a backend, you'll first need:
- A Google Cloud Storage bucket to store objects in
- an account key JSON file (see here).
The key must be associated with either a user or a service account, and should have read / write permissions on objects in the bucket.
If you plan to access objects from a browser, your bucket needs to have CORS enabled.
You can deploy the account key JSON file and provide the path to it as
the account_key_file storage option:
TRANSFER_ADAPTERS:
basic:
factory: giftless.transfer.basic_streaming:factory
options:
storage_class: giftless.storage.google_cloud:GoogleCloudStorage
storage_options:
project_name: my-gcp-project
bucket_name: git-lfs
account_key_file: /path/to/credentials.jsonAlternatively, you can base64-encode the contents of the JSON file and provide
it inline as account_key_base64:
TRANSFER_ADAPTERS:
basic:
factory: giftless.transfer.basic_streaming:factory
options:
storage_class: giftless.storage.google_cloud:GoogleCloudStorage
storage_options:
project_name: my-gcp-project
bucket_name: git-lfs
account_key_base64: S0m3B4se64RandomStuff.....ThatI5Redac7edHeReF0rRead4b1lity==After configuring your giftless.yaml file, export it:
$ export GIFTLESS_CONFIG_FILE=giftless.yamlYou will need uWSGI running. Install it with your preferred package manager. Here is an example of how to run it:
# Run uWSGI in HTTP mode on port 8080
$ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080See giftless/config.py for some default configuration options.
[WIP] It is possible to use an .env file instead of a YAML file in case you
need to deploy the project in a platform which does not support deploying
configuration in files, such as Heroku.
At this time, we only support a raw format where we dump the content of
giftless.yaml into an env var anmed YAML_CONTENT:
GIFTLESS_CONFIG_STR="TRANSFER_ADAPTERS:
basic:
factory: giftless.transfer.basic_streaming:factory
options:
storage_class: ..storage.google_cloud:GoogleCloudStorage
storage_options:
bucket_name: datahub-bbb
api_key: API_KEY
AUTH_PROVIDERS:
- giftless.auth.allow_anon:read_write
"Note #1: As YAML is a superset of JSON, you can also provide a more compact JSON string instead.
Note #2:: If you provide both a YAML file (as GIFTLESS_CONFIG_FILE) and a
literal YAML string (as GIFTLESS_CONFIG_STR), the two will be merged, with values
from the YAML string taking precedence over values from the YAML file.
Git LFS servers and clients can implement and negotiate different [transfer adapters]
(https://github.com/git-lfs/git-lfs/blob/master/docs/api/basic-transfers.md). Typically,
Git LFS will only define a basic transfer mode and support that. basic is simple
and efficient for direct-to-storage uploads for backends that support uploading using
a single PUT request.
To support more complex, and especially multi-part uploads (uploads done using more
than one HTTP request, each with a different part of a large file) directly to backends
that support that, Giftless adds support for a non-standard multipart-basic transfer
mode. Note that this can only work with specific backends that support this type of
functionality.
You can enable multipart transfers by adding the following lines to your Giftless config file:
TRANSFER_ADAPTERS:
# Add the following lines:
multipart-basic:
factory: giftless.transfer.multipart:factory
options:
storage_class: giftless.storage.azure:AzureBlobsStorage
storage_options:
connection_string: "somesecretconnectionstringhere"
container_name: my-multipart-storageYou must specify a storage_class that supports multipart transfers (implements the MultipartStorage
interface). Currently, these are:
giftless.storage.azure:AzureBlobsStorage- Azure Blob Storage
The following additional options are available for multipart-basic transfer adapter:
action_lifetime- The maximal lifetime in seconds for signed multipart actions; Because multipart uploads tend to be of very large files and can easily take hours to complete, we recommend setting this to a few hours; The default is 6 hours.max_part_size- Maximal length in bytes of a single part upload. The default is 10MB.
See the specific storage adapter for additional backend-specific configuration options to be added under
storage_options.
TBD
TBD
TBD
You can use the ProxyFix Werkzeug middleware to fix issues caused when
Giftless runs behind a reverse proxy, causing generated URLs to not match
the URLs expected by clients:
MIDDLEWARE:
- class: werkzeug.middleware.proxy_fix:ProxyFix
kwargs:
x_host: 1
x_port: 1
x_prefix: 1In order for this to work, you must ensure your reverse proxy (e.g. nginx)
sets the right X-Forwarded-* headers when passing requests.
For example, if you have deployed giftless in an endpoint that is available to
clients at https://example.com/lfs, the following nginx configuration is
expected, in addition to the Giftless configuration set in the MIDDLEWARE
section:
location /lfs/ {
proxy_pass http://giftless.internal.host:5000/;
proxy_set_header X-Forwarded-Prefix /lfs;
}
This example assumes Giftless is available to the reverse proxy at
giftless.internal.host port 5000. In addition, X-Forwarded-Host,
X-Forwarded-Port, X-Forwarded-Proto are automatically set by nginx by
default.
There are a number of CORS WSGI middleware implementations available on PyPI, and you can use any of them to add CORS headers control support to Giftless.
For example, you can enable CORS support using wsgi-cors-middleware:
(.venv) $ pip install wsgi_cors_middlewareAnd then add the following to your config file:
MIDDLEWARE:
- class: wsgi_cors_middleware:CorsMiddleware
kwargs:
origin: https://www.example.com
headers: ['Content-type', 'Accept', 'Authorization']
methods: ['GET', 'POST', 'PUT']giftless is based on Flask, with the following additional libraries:
- Flask Classful for simplifying API endpoint implementation with Flask
- Marshmallow for input / output serialization and validation
- figcan for configuration handling
You must have Python 3.7 and newer set up to run or develop giftless.
We use the following tools and standards to write giftless code:
flake8to check your Python code for PEP8 complianceimportstatements are checked byisortand should be organized accordingly- Type checking is done using
mypy
Maximum line length is set to 120 characters.
You should develop giftless in a virtual environment. We use pip-tools
to manage both development and runtime dependencies.
The following snippet is an example of how to set up your virtual environment for development:
$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip install -r dev-requirements.txt
(.venv) $ pip-sync dev-requirements.txt requirements.txt
Once in a virtual environment, you can simply run make test to run all tests
and code style checks:
$ make test
We use pytest for Python unit testsing.
In addition, simple functions can specify some doctest style tests in the
function docstring. These tests will be tested automatically when unit tests
are executed.
Simply run make docker to build a uWSGI wrapped Docker image for giftless.
The image will be named datopian/giftless:latest by default. You can change
it, for example:
$ make docker DOCKER_REPO=mycompany DOCKER_IMAGE_TAG=1.2.3
Will build a Docekr image tagged mycompany/giftless:1.2.3.
Copyright (C) 2020, Viderum, Inc.
Giftless is free / open source software and is distributed under the terms of the MIT license. See LICENSE for details.