space efficient storage for a million EDG binaries #200

@milahu

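the "million EDG binaries" (each about 30MB zipped, 140MB raw) would compress well in a git repo, because consecutive releases differ only slightly and git can store them as deltas

transfer size per release would stay the same, but total storage size would be much smaller, so there would be no need for an Amazon S3 server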

migrate tarballs to git:

#!/bin/sh

if [ -d gitrepo ]; then
  echo "error: folder exists: gitrepo. to run test again, run: rm -rf gitrepo"
  exit 1
fi

mkdir gitrepo

git -C gitrepo init

# https://github.com/rose-compiler/rose/blob/weekly/src/frontend/CxxFrontend/EDG_VERSION
release_list="$(cat <<EOF
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.77.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.78.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.79.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.80.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.81.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.2
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3
EOF
)"

for release in $release_list
do
  echo "adding $release"
  # download and unpack only if not already present
  [ -e "$release.tar.gz" ] || wget "http://edg-binaries.rosecompiler.org/$release.tar.gz"
  [ -d "$release" ] || tar -xf "$release.tar.gz"
  # the glob skips dotfiles, so copy .libs explicitly
  cp -r "$release"/* "$release/.libs" gitrepo/

  # TODO use release date for commit + tag
  git -C gitrepo add .
  git -C gitrepo commit -m "$release"
  git -C gitrepo tag "$release"

  rm -rf "$release"
done

echo raw size
du -sh gitrepo/.git
echo
echo compressing ...
time git -C gitrepo gc
echo
echo compressed size
du -sh gitrepo/.git
echo
echo total size of tarballs
du -shc roseBinaryEDG-*.tar.gz | tail -n1

output:

raw size
247M	gitrepo/.git

compressing ...
Enumerating objects: 35, done.
Counting objects: 100% (35/35), done.
Delta compression using up to 4 threads
Compressing objects: 100% (34/34), done.
Writing objects: 100% (35/35), done.
Total 35 (delta 15), reused 0 (delta 0), pack-reused 0

real	0m57.203s
user	0m52.688s
sys	0m3.180s

compressed size
34M	gitrepo/.git

total size of tarballs
213M	total
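
the TODO in the loop (use release date for commit + tag) could be handled roughly like this. this is only a sketch: it assumes GNU date, and it assumes the mtime of the downloaded tarball approximates the release date, which may not hold. `release_date` and `commit_release` are hypothetical helper names, not part of the script above.

```shell
#!/bin/sh
# Sketch: date each commit and tag with the tarball's modification time.
# Assumptions (not from the original script): GNU date is available, and the
# tarball mtime is close enough to the real release date.

# print the mtime of a file in git's internal date format: "<unix time> <offset>"
release_date() {
  printf '%s +0000\n' "$(date -r "$1" +%s)"
}

# commit the staged state of gitrepo and tag it, both dated like the tarball
commit_release() {
  release="$1"
  when="$(release_date "$release.tar.gz")"
  GIT_AUTHOR_DATE="$when" GIT_COMMITTER_DATE="$when" \
    git -C gitrepo commit -m "$release"
  # an annotated tag stores its own date, taken from GIT_COMMITTER_DATE
  GIT_COMMITTER_DATE="$when" git -C gitrepo tag -a -m "$release" "$release"
}
```

in the loop, `commit_release "$release"` would replace the plain `git commit` and `git tag` lines, after `git -C gitrepo add .`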

once the repo is hosted on github, fetching the tarball for a tagged release would be as simple as:

wget https://github.com/rose-compiler/edg-binaries/archive/roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3.tar.gz
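
a build script could wrap the download like this. again only a sketch: the repo rose-compiler/edg-binaries is the one proposed in the wget line above and may not exist, and `archive_url`, `fetch_release` and `clone_release` are hypothetical helper names. the archive's top-level folder is stripped so the files land in a folder named after the release, whatever github calls the folder inside the archive.

```shell
#!/bin/sh
# Sketch: fetch one release as a GitHub tag archive, or as a shallow clone.
# Assumption: the repo is published as rose-compiler/edg-binaries (proposed, not confirmed).
repo_url="https://github.com/rose-compiler/edg-binaries"

# print the archive URL for a tagged release
archive_url() {
  echo "$repo_url/archive/$1.tar.gz"
}

# download and unpack a release into a folder named after it
fetch_release() {
  release="$1"
  wget -O "$release.tar.gz" "$(archive_url "$release")"
  mkdir -p "$release"
  # drop the archive's top-level folder, whatever it is called
  tar -xzf "$release.tar.gz" --strip-components=1 -C "$release"
}

# alternative: shallow clone of a single tag, which keeps the git metadata
clone_release() {
  git clone --depth 1 --branch "$1" "$repo_url.git" "$1"
}
```

usage would be `fetch_release roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3`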

compression can be optimized by

compiling object code with the -ffunction-sections and -fdata-sections compiler flags. This has the effect that if you 'insert' a function into a translation unit, the insertion does not cause all of the addresses to change across the whole object file.

https://github.com/elfshaker/elfshaker#applicability
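
to check whether a given object file was built with these flags, one could count its per-function sections. with -ffunction-sections, GCC and Clang place each function in its own `.text.<name>` section on ELF targets, so a count above zero suggests the flag was used. this is a rough sketch, `count_function_sections` is a hypothetical helper, and it only looks at `objdump -h` style output.

```shell
#!/bin/sh
# Sketch: count per-function .text.<name> sections in `objdump -h` output.
# Reads the section listing from stdin and prints the number of matching lines.
count_function_sections() {
  # grep -c exits non-zero when the count is 0, so keep the exit status clean
  grep -c '[[:space:]]\.text\.' || true
}

# usage, assuming a C compiler and binutils are installed:
#   cc -c -ffunction-sections -fdata-sections demo.c -o demo.o
#   objdump -h demo.o | count_function_sections
```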
