
Fast Intro To Git Internals

This document provides an overview of Git internals, explaining how Git functions as a database that tracks files (blobs), directories (trees), and commits. It emphasizes the importance of understanding these underlying concepts rather than just using commands, and discusses the roles of refs, the staging area, and the differences between merging and rebasing. Additionally, it includes commands for exploring Git and references to external documentation for further learning.

Uploaded by

Deepak D
Copyright
© © All Rights Reserved
A Fast Intro to Git Internals

Many git tutorials focus on a set of commands and instructions to “get you up to speed” in git,
without addressing the underlying concept of “how git works”. While the commands are
important, I feel it’s more important for you to understand what’s going on behind the scenes.
(Full reference material is included in the appendix at the end of this document. Most diagrams
are taken from Pro Git.)

At a high level, git can be thought of as a database/filesystem/backing store that remembers a
truckload of details about your code base. This information is called the git repository, and
contains three types of content: blobs, trees and commits.

Blobs¹ are essentially “files” in git. A blob holds only a file’s contents (not its name), and
each blob is indexed by the SHA-1 hash of those contents, so if the same file appears twice in
your directory, both copies resolve to the same blob in git. And for the record, git assumes that
there will never be a hash collision. Relax, because in practice it’s true. Seriously. If you
don’t believe that, please read up on it until you do. Blobs are normally referenced by their
hash, although you will rarely need to type these hashes in.
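
To make the "same content, same blob" point concrete, here is a minimal Python sketch of how a blob id is derived in a default (SHA-1) repository: git hashes a short header followed by the raw bytes. The function and variable names are made up for this illustration.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Return the SHA-1 id git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical content map to one blob, whatever their names are.
copy_a = b"hello world\n"
copy_b = b"hello world\n"
same_blob = blob_hash(copy_a) == blob_hash(copy_b)

# Any change to the content, however small, yields a different blob id.
edited = blob_hash(b"hello world!\n")

print(blob_hash(b""))  # the well-known id of the empty blob
print(same_blob)
```

If git is installed, the result should agree with what "git hash-object <file>" prints for a file with the same bytes.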

Git uses trees to store directories. Each tree contains a list of entries, which are either blobs
(files) or other trees (sub-directories). Like blobs, trees are identified by their hashes. This
is a significant detail, because a single change to a single file (say in root/sub/myfile.txt)
will cause myfile.txt’s hash to change, which will cause sub’s content list (and so sub’s hash)
to change, which will cause root’s content list to change, which will cause root’s hash to
change. Thus, the hash of a tree represents the entire state of every single file in that tree.
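
The ripple effect described above can be sketched in a few lines of Python. This is a simplified model of git's tree format (regular files and sub-directories only, no executables or symlinks); the nested-dict layout and the function names are assumptions made for the illustration.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a git object: '<kind> <size>\\0' plus the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(entries: dict) -> bytes:
    """entries maps a name to bytes (file content) or a dict (sub-directory)."""
    records = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Sub-directories sort as if their name ended in "/".
            records.append((name + "/", b"40000", name, tree_id(value)))
        else:
            records.append((name, b"100644", name, object_id(b"blob", value)))
    body = b""
    for _, mode, name, oid in sorted(records):
        body += mode + b" " + name.encode() + b"\0" + oid
    return object_id(b"tree", body)


root = {"notes.txt": b"unchanged\n", "sub": {"myfile.txt": b"version 1\n"}}
before = tree_id(root).hex()
sub_before = tree_id(root["sub"]).hex()

root["sub"]["myfile.txt"] = b"version 2\n"  # edit one file deep in the tree
after = tree_id(root).hex()
sub_after = tree_id(root["sub"]).hex()

print(before != after, sub_before != sub_after)
```

Editing root/sub/myfile.txt changes the id of sub and of root, while the blob for notes.txt keeps its id and is simply referenced again.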

The next object is the commit, which is a snapshot of a tree along with some additional metadata.
As discussed above, the tree reference represents the entire state of every single file in the
tree. The metadata provides more context for the commit, including the author, the commit
message, and the commit’s parents (none for the very first commit, one for an ordinary commit,
two or more for a merge). Like everything else we’ve seen, commits are referenced by their hashes.
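
As a rough sketch of why a commit's hash covers its tree, its parents and its metadata, the following Python builds the text of a commit object and hashes it the way git does. The author string and timestamp are placeholder values, and real commits can carry extra headers (for example signatures) that are left out here.

```python
import hashlib

# Id of the empty tree, used so the example needs no other objects.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree: str, parents: list, author: str, message: str,
              timestamp: int = 0) -> str:
    """Hash a minimal commit object: tree, parents, author, committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    stamp = f"{author} {timestamp} +0000"
    lines += [f"author {stamp}", f"committer {stamp}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


who = "A Dev <dev@example.com>"  # hypothetical author
first = commit_id(EMPTY_TREE, [], who, "initial commit")
second = commit_id(EMPTY_TREE, [first], who, "second commit")

# Same tree and message, different parent: a different commit.
other = commit_id(EMPTY_TREE, [], who, "second commit")
print(first, second, other, sep="\n")
```

Because the parent ids are part of the hashed text, a commit id pins down the commit's whole ancestry as well as its own snapshot.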

1 The diagrams on this page are taken from Pro Git, which is highly recommended for further reading.

Git maintains a set of refs, which are human-readable names that resolve to specific commits.
For example, the HEAD ref normally points, by way of your currently checked out branch, to the
most recent commit on that branch. When you make another commit, HEAD gets “auto-promoted” to
point to your new commit. Most git operations use HEAD as their default target. Refs are also
used to identify your own branches, and the current branch is updated to the latest commit each
time you commit, too.
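
A toy model may help here. The Python below treats refs as a simple name-to-commit table with HEAD naming the current branch; the class, its methods and the commit ids ("c1", "c2") are all invented for the illustration and are not git's actual storage format.

```python
class Refs:
    """Toy ref store: branch names map to commit ids, HEAD names a branch."""

    def __init__(self, branch: str, commit_id: str):
        self.branches = {branch: commit_id}
        self.head = branch  # symbolic: HEAD points at a branch, not a commit

    def resolve(self, name: str) -> str:
        # "HEAD" is followed to the current branch, then to its commit.
        if name == "HEAD":
            name = self.head
        return self.branches[name]

    def commit(self, new_commit_id: str) -> None:
        # A new commit moves the current branch; HEAD follows automatically.
        self.branches[self.head] = new_commit_id


refs = Refs("master", "c1")
refs.branches["mine"] = refs.resolve("HEAD")  # new branch at the same commit
refs.head = "mine"                            # like checking out "mine"
refs.commit("c2")

print(refs.resolve("HEAD"), refs.resolve("mine"), refs.resolve("master"))
# prints: c2 c2 c1
```

Only the checked-out branch moves when you commit; "master" stays where it was until something updates it.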

As you work, git maintains three different “views” of your filesystem. Simultaneously. For some,
this is a source of confusion ;) As you work, a single file (readme.txt) might be in all three
locations, and might be different in each location.

In this picture, the “repository” view is the commit (and corresponding tree) that you last
checked out. Any changes you make to files on your disk are reflected in the “working directory”.
You can promote these local changes into the staging area (using “git add”) as often as you like.
Then you commit all of your staged files to the repository.
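
The three views can be modeled as three plain dictionaries. This Python sketch is only an analogy (the function names mimic the commands but are ordinary functions written for this example); it shows readme.txt holding three different versions at once, then converging as changes are staged and committed.

```python
working_dir = {"readme.txt": "third draft"}  # files on disk
staging = {"readme.txt": "second draft"}     # what the next commit will hold
repository = {"readme.txt": "first draft"}   # tree of the last commit


def git_add(path: str) -> None:
    """Copy the on-disk version of one file into the staging area."""
    staging[path] = working_dir[path]


def git_commit() -> None:
    """Snapshot everything that is staged as the new committed tree."""
    repository.clear()
    repository.update(staging)


versions_before = {working_dir["readme.txt"], staging["readme.txt"],
                   repository["readme.txt"]}

git_add("readme.txt")                     # staging now matches the disk
working_dir["readme.txt"] = "fourth draft"  # keep editing after staging
git_commit()                              # commits the staged version only

print(len(versions_before))       # 3 different versions at the start
print(repository["readme.txt"])   # third draft
print(working_dir["readme.txt"])  # fourth draft
```

Note that the commit records what was staged, not what is currently on disk, which is why edits made after "git add" are not included.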

The staging area is also called the index or the cache. The repository is sometimes called the
tree or the database. Sigh.

When you’re using git, you don’t normally think of blobs or trees². Instead, user-facing
commands deal with commits and refs, and most of the work you do in git involves traversing
the DAG (directed acyclic graph) of commits in your repository, or adding new nodes into that
DAG. The DAG always starts from a commit that has no parent (conceptually, the “empty” tree
comes before it), so if you want to think of this in terms of pointers, that parent would be
the null pointer.

2 If you ever want to look at the blobs or trees in your git repository, you can use “git
rev-list --objects --all” to see a huge list of objects that git is tracking. To see a single
object, you can use “git show <object>”. And if you don’t feel like typing in the whole hash,
you can just type the first few characters (at least four, and enough to be unambiguous).
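
The short-hash convenience amounts to a prefix lookup. Here is a minimal Python sketch of that idea, assuming a four-character minimum and treating an ambiguous prefix as an error; the function name and the two-object "repository" are made up for the example.

```python
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def resolve_prefix(object_ids: list, prefix: str) -> str:
    """Expand an abbreviated hash to the single full id that starts with it."""
    if len(prefix) < 4:
        raise ValueError("prefix too short: need at least 4 characters")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous")
    return matches[0]


object_ids = [EMPTY_BLOB, EMPTY_TREE]
full = resolve_prefix(object_ids, "4b825d")
print(full == EMPTY_TREE)  # True
```

In a real repository the set of candidate ids is much larger, so longer prefixes are needed as the project grows.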
Branching, Rebasing, Merging

[Diagram: the “master” and “mine” refs pointing at the same commit]

The first diagram shows a typical git scenario, where a dev has created a branch called “mine”,
currently pointing at the same commit as “master”.

[Diagram: “mine” ahead by C6, C7, C8; “master” ahead by C3, C4, C5]

The dev does some work and commits C6, C7, and C8. In the meantime, other users have updated
“master” with C3, C4, and C5, and those commits have been pulled down into the local master
branch. In order to resolve this situation, the dev can either perform a merge or a rebase.

[Diagram: merge commit C9 on “mine”, joining C8 and C5]

If the dev uses “git merge master” from the “mine” branch, the result will look like this. Note
that “C9” is a commit that contains all of the merged source, which may include “new” code that
was introduced to resolve merge conflicts.
git checkout m

[Diagram: C6’, C7’, C8’ replayed on top of C5, with “mine” pointing at C8’]

The alternate approach is to use a rebase, which creates a new commit for each rebased commit.
Thus, C6’ will contain (mostly) the same changes that were in C6, C7’ will match C7, and so on.

git checkout mine; git rebase master

For most operations, a rebase is preferred to a merge:
● It keeps each of your commits as a separate commit, in one straight line of history.
● Your commits will always show up as the last in the list (“the cream rises to the top”)
● Note that the old commits (C6..C8) are no longer referenced by any branch, so once the
reflog forgets them they become available for “garbage collection”.

The notable exception is that you should not do a rebase if other repositories have seen
your commits. This might happen if others were basing their repositories on yours, or if you
had pushed your own commits “upstream”.
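
The difference between the two outcomes can be checked with a small graph model. In this Python sketch the history is a table of commit to parents, following the commit names used above; the shared commits C0..C2 are an assumption read off the first diagram, and the helper name is invented for the example.

```python
def reachable(parents: dict, tips) -> set:
    """Return every commit reachable from the given tips via parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Commit -> parents. C0..C2 are shared; master added C3..C5, mine added C6..C8.
history = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C4": ["C3"], "C5": ["C4"],
    "C6": ["C2"], "C7": ["C6"], "C8": ["C7"],
}

# Merge: one new commit with two parents; C6..C8 stay in mine's history.
merged = dict(history, C9=["C8", "C5"])
merge_refs = {"master": "C5", "mine": "C9"}

# Rebase: copies of C6..C8 are replayed on top of C5; the originals are left behind.
rebased = dict(history)
rebased.update({"C6'": ["C5"], "C7'": ["C6'"], "C8'": ["C7'"]})
rebase_refs = {"master": "C5", "mine": "C8'"}

orphans_after_merge = set(merged) - reachable(merged, merge_refs.values())
orphans_after_rebase = set(rebased) - reachable(rebased, rebase_refs.values())

print(sorted(orphans_after_merge))   # []
print(sorted(orphans_after_rebase))  # ['C6', 'C7', 'C8']
```

This also shows why rebasing published work causes trouble: anyone who built on C6..C8 is still pointing at commits that your branch no longer contains.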
Final Notes...
Here are some extra commands to get you in trouble (er, to help you explore git in all its
glory)...
● “git rev-list --objects --all” will display all of the objects in your repository.
● “git show <object>” will let you see one of those objects in detail.
● “git fsck --unreachable <sha1>” will show you all the “orphans” that are waiting for
garbage collection.
● “git reflog” will show you a list of everywhere HEAD has pointed in your local repository
(entries eventually expire, after 90 days by default). It’s a cool tool that can help you
“undo” your recent activity, and find that code you thought you had lost.

External Git documentation


● This thread contains a good overview of the “fourth” object type in git: the ref. (Note
that refs are intentionally “glossed over” in this discussion.)
● A tour of git: the basics: General, easy to read tutorial for getting started with git.
    This one is good for basic "commands to get up and running", but none of that content is
    in the doc we're writing, so it's good non-overlapping information.
● git ready: General introduction and "cookbook" reference for git.
    http://gitready.com/beginner/2009/02/17/how-git-stores-your-data.html comes pretty close
    to the content I want but really doesn't get deep enough into it...
● Git for Computer Scientists: Great explanation of the architecture on which Git is based.
    This. I believe this doc is the one I used to finally understand the key concepts in git.
● Git Magic: Yet another Git tutorial.
    I like this one but need to spend more time reading it.
● GitCasts: Screencasts on Git.
    For the “video-inclined”.
● Pro Git: Freely available book on Git.
    This was looking really good in the 'what is a branch' section but then it goes on to
    encourage the user to merge without explaining why rebase is better. If the user bailed
    early on this doc they'd have some of the foundation they need but then do the wrong
    thing (merge) repeatedly.
● Git Reference: Quick reference that links to the Pro Git book.
    ○ Very command-line oriented (fewer pictures of the tree, more command-line examples)
● Visual Git Reference: A visual Git Reference, explaining quite a few commands visually.
