
Memory leak in git_remote_connect #1673

Closed

arthurschreiber
Member

While fixing some memory leaks in Rugged, I think I found a memory leak in libgit2 itself!

The following Rugged code does not leak:

repo = Rugged::Repository.new(".")

10000000.times do
  remote = Rugged::Remote.lookup(repo, 'origin')
  remote.connect(:push)
  remote.disconnect
end

This leaks like crazy:

repo = Rugged::Repository.new(".")
remote = Rugged::Remote.lookup(repo, 'origin')

10000000.times do
  remote.connect(:push)
  remote.disconnect
end

#connect and #disconnect simply forward to their libgit2 counterparts, so I'm very sure the leak does not come from a problem in Rugged itself.

/cc @arrbee

nulltoken and others added 30 commits May 10, 2013 19:22
Fix diff output for renames and copies
Improve ignore handling in git_status_file
This allows us to get a list of reference names in a loop instead of callbacks.
Selecting whether to list loose or packed references is not something
we want to support anymore, so remove a test for this.
Nobody should ever be using anything other than ALL at this level, so
remove the option altogether.

As part of this, git_reference_foreach_glob is now implemented in the
frontend using an iterator. Backends will later regain the ability of
doing the glob filtering in the backend.
If the backend doesn't provide support for it, the matching is done in
refdb on top of a normal iterator.
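Glob matching layered on top of a plain iterator, as described above, can be sketched as follows. This is a minimal illustration, not libgit2's refdb iterator API: the `toy_ref_iter` type, its in-memory array of names, and the return convention (0 for an item, 1 when exhausted) are all hypothetical. The filtering itself uses POSIX `fnmatch()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical "normal" iterator over reference names (not libgit2's type). */
typedef struct {
    const char **names;
    size_t count;
    size_t pos;
} toy_ref_iter;

/* Yields the next name: returns 0 and sets *out, or 1 once exhausted. */
static int toy_iter_next(const char **out, toy_ref_iter *it)
{
    assert(out != NULL && it != NULL);
    if (it->pos >= it->count)
        return 1;
    *out = it->names[it->pos++];
    return 0;
}

/* Glob filtering done on top of the plain iterator: names that do not
 * match `glob` are skipped, as a frontend would do when the backend
 * offers no filtering of its own. */
static int toy_iter_next_glob(const char **out, toy_ref_iter *it, const char *glob)
{
    const char *name;
    int error;

    while ((error = toy_iter_next(&name, it)) == 0) {
        if (fnmatch(glob, name, 0) == 0) {
            *out = name;
            return 0;
        }
    }
    return error; /* 1: no more matching names */
}
```

Because `fnmatch()` is called here without `FNM_PATHNAME`, `*` also matches `/`; whether that matches libgit2's own glob semantics for references is not established by the text above.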
Fix broken build when MSVC SDL checks is enabled
There was a problem found in the Rugged test suite where the
refdb_fs_backend__next function could exit too early in some
very specific hashing patterns for packed refs.  This ports
the Rugged test to libgit2 and then fixes the bug.
Fix refdb iteration early termination bug
… cache

The code surrounding zlib bundling did not take into consideration
that ZLIB_LIBRARY gets cached, and assumed that FIND(ZLIB) would
always set ZLIB_FOUND, which does not hold true, as this variable
signifies that we have found the package and had to look at the
system, as its location was not cached.

Only use the bundled sources if the external zlib is neither
newly-found nor cached.
This helps us install multiple versions of the library side-by-side.
Create directory for symlink before creating symlink
arrbee and others added 19 commits June 20, 2013 11:39
Files in status will, by default, be sorted according to the case
insensitivity of the filesystem that we're running on.  However,
in some cases, this is not desirable.  Even on case insensitive
file systems, 'git status' at the command line will generally use
a case sensitive sort (like 'ls').  Some GUIs prefer to display a
list of files case insensitively even on case-sensitive platforms.

This adds two new flags: GIT_STATUS_OPT_SORT_CASE_SENSITIVELY
and GIT_STATUS_OPT_SORT_CASE_INSENSITIVELY that will override the
default sort order of the status output and give the user control.
This includes tests for exercising these new options and makes
the examples/status.c program emulate core Git and always use a
case sensitive sort.
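The override logic described above can be sketched with `qsort()` and two comparators. The flag names `TOY_SORT_CASE_SENSITIVELY` and `TOY_SORT_CASE_INSENSITIVELY` and the helper `toy_sort_status_paths()` are hypothetical stand-ins for the real `GIT_STATUS_OPT_SORT_CASE_*` options; the case-insensitive comparison uses POSIX `strcasecmp()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp() */

/* Hypothetical flags standing in for
 * GIT_STATUS_OPT_SORT_CASE_SENSITIVELY / _INSENSITIVELY. */
enum {
    TOY_SORT_CASE_SENSITIVELY   = 1 << 0,
    TOY_SORT_CASE_INSENSITIVELY = 1 << 1
};

/* qsort() comparators; each element is a const char * path. */
static int cmp_case_sensitive(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int cmp_case_insensitive(const void *a, const void *b)
{
    return strcasecmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts status paths. With neither flag set, the order follows the
 * filesystem's case sensitivity; either flag overrides that default. */
static void toy_sort_status_paths(const char **paths, size_t n,
                                  unsigned int flags, int fs_ignore_case)
{
    int ignore_case = fs_ignore_case;

    assert(paths != NULL || n == 0);
    if (flags & TOY_SORT_CASE_SENSITIVELY)
        ignore_case = 0;
    else if (flags & TOY_SORT_CASE_INSENSITIVELY)
        ignore_case = 1;

    qsort(paths, n, sizeof(*paths),
          ignore_case ? cmp_case_insensitive : cmp_case_sensitive);
}
```

Passing the case-sensitive flag gives an `ls`-style order (uppercase before lowercase in ASCII) even on a case-insensitive filesystem, which is the behaviour the commit describes for examples/status.c.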
Add test for bug fixed in 852ded9
Sorry, I wrote that bug fix and forgot to check in a test at the
same time.  Here is one that fails on the old version of the code
and now works.
Command line status example (with bug fixes)
This fixes the checkout case when a file is modified between the
baseline and the target and yet missing in the working directory.
The logic for that case appears to have been wrong.

This also adds a useful checkout notify callback to the checkout
test helpers that will count notifications and also has a debug
mode to visualize what checkout thinks that it's doing.
Checkout should not recreate deleted files - with fix
This adds the ability for checkout to write to a target directory
instead of having to use the working directory of the repository.
This makes it easier to do exports of repository data and the like.

This is similar to, but not quite the same as, the --prefix option
to `git checkout-index` (this will always be treated as a directory
name, not just as a simple text prefix).

As part of this, the workdir iterator was extended to take the
path to the working directory as a parameter and fallback on the
git_repository_workdir result only if it's not specified.

Fixes libgit2#1332
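The fallback described above (an explicit target directory wins, otherwise the repository's working directory is used) can be sketched as below. The `toy_repo` type, the `toy_checkout_dir()` helper, and the parameter name `target_directory` are hypothetical. The one grounded detail is that `git_repository_workdir()` returns NULL for a bare repository, so a bare repo only has somewhere to write when a target directory is supplied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical repository handle; workdir is NULL for a bare repository,
 * mirroring what git_repository_workdir() returns. */
typedef struct {
    const char *workdir;
} toy_repo;

/* Hypothetical helper: chooses the directory checkout writes into.
 * An explicit target directory wins; otherwise fall back to the
 * repository's working directory. Returns NULL when neither exists,
 * i.e. a bare repository checked out without a target directory. */
static const char *toy_checkout_dir(const toy_repo *repo,
                                    const char *target_directory)
{
    assert(repo != NULL);
    if (target_directory != NULL)
        return target_directory;
    return repo->workdir;
}
```

A caller would report the NULL result as the "bare repository" error, which is why that check can only happen after the target directory option has been considered.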
With the new target directory option to checkout, the non-bareness
of the repository should be checked much later in the parameter
validation process - actually that check was already in place, but
I was doing it redundantly in the checkout APIs.

This removes the now unnecessary early check for bare repos.  It
also adds some other parameter validation and makes it so that
implied parameters can actually be passed as NULL (i.e. if you
pass a git_index, you don't have to pass the git_repository - we
can get it from index).
This adds additional tests of the checkout target directory option
including using it to dump data from bare repos.
Add target directory to checkout options
WC_ERR_INVALID_CHARS might be already defined by the Windows SDK.

Signed-off-by: Sven Strickroth
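The usual shape of such a fix is a preprocessor guard that supplies a fallback only when the SDK headers have not already defined the flag, since redefining a macro with different replacement text draws a compiler diagnostic. This is a sketch of the pattern, not the actual libgit2 change; the value `0x80` is an assumption, matching the constant as documented for `WideCharToMultiByte` in the Windows SDK.

```c
/* Provide a fallback only if <windows.h> (or another SDK header) has not
 * already defined the flag; an unconditional #define could collide with
 * the SDK's own definition. The value here is assumed. */
#ifndef WC_ERR_INVALID_CHARS
#define WC_ERR_INVALID_CHARS 0x80
#endif
```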
This updates the calls that make the subdirectories for objects
to use a base directory above which git_futils_mkdir won't walk
any higher.  This prevents attempts to mkdir all the way up to
the root of the filesystem.

Also, this moves the objects_dir into the loose backend structure
and removes the separate allocation, plus does some preformatting
of the objects_dir value to guarantee a trailing slash, etc.
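The idea of a base directory that the directory-creation walk never climbs above can be sketched with plain POSIX calls. This is not `git_futils_mkdir()`; the helper `mkdir_under_base()` and its signature are hypothetical. It assumes `base` already exists and only passes paths strictly below it to `mkdir()`, so nothing is ever attempted on `base` or its parents.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not libgit2's git_futils_mkdir): creates each
 * component of `rel` beneath the existing directory `base`, like
 * `mkdir -p`, but never tries to create `base` or anything above it.
 * Returns 0 on success, -1 on failure. */
static int mkdir_under_base(const char *base, const char *rel, mode_t mode)
{
    char path[4096];
    size_t base_len = strlen(base);
    int n;

    assert(rel != NULL && rel[0] != '\0' && rel[0] != '/');

    n = snprintf(path, sizeof(path), "%s/%s", base, rel);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path too long */

    /* Start scanning just past "base/", so only components of rel are
     * ever passed to mkdir(). */
    for (char *p = path + base_len + 1; ; p++) {
        if (*p == '/' || *p == '\0') {
            char saved = *p;

            *p = '\0';
            if (mkdir(path, mode) < 0 && errno != EEXIST)
                return -1;
            *p = saved;
            if (saved == '\0')
                break;
        }
    }
    return 0;
}
```

For brevity this treats any existing component as acceptable; a fuller version would also check that an existing component really is a directory.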
Fixed a few header @param and @return typos with the help of -Wdocumentation in Xcode.

The following warnings have not been fixed:
common.h:213 - Not sure what the documentation format is for '...'
notes.h:102 - Correct @param name but empty text
notes.h:111 - Correct @param name but empty text
pack.h:140 - @return missing text
pack.h:148 - @return missing text
In theory, p_stat should never return an S_ISLNK result, but due
to the current implementation on Windows with mount points it is
possible that it will.  For now, work around that by allowing a
link in the path to a directory being created.  If it is really a
problem, then the issue will be caught on the next iteration of
the loop, but typically this will be the right thing to do.
@arthurschreiber
Member Author

Here is another example in C: modify use_remote in examples/network/ls-remote.c as follows:

static int use_remote(git_repository *repo, char *name)
{
    git_remote *remote = NULL;
    int error, i;

    // Find the remote by name
    error = git_remote_load(&remote, repo, name);
    if (error < 0)
        goto cleanup;

    for (i = 0; !error && i < 100000; i++) {
        // When connecting, the underlying code needs to know whether we
        // want to push or fetch
        error = git_remote_connect(remote, GIT_DIRECTION_FETCH);
        if (error < 0)
            goto cleanup;

        // With git_remote_ls we can retrieve the advertised heads
        error = git_remote_ls(remote, &show_ref__cb, NULL);

        git_remote_disconnect(remote);
    }

cleanup:
    git_remote_free(remote);
    return error;
}

This will make it leak.

@vmg
Member

vmg commented Jun 24, 2013

I recall @carlosmn doing this on purpose, but I'm not sure why. The way I see it, free should always disconnect the remote if it's currently connected.
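The behaviour suggested here, that freeing a remote should also tear down a live connection, can be sketched like this. `toy_remote` and its functions are hypothetical stand-ins rather than libgit2's opaque `git_remote`; the point is that disconnect releases per-connection state and is safe to call twice, so free can call it unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical remote; transport_state stands in for whatever
 * per-connection memory a transport allocates. */
typedef struct {
    int connected;
    char *transport_state;
} toy_remote;

static int toy_remote_connect(toy_remote *r)
{
    assert(r != NULL);
    if (r->connected)
        return 0; /* already connected: allocate nothing new */
    r->transport_state = malloc(64);
    if (r->transport_state == NULL)
        return -1;
    r->connected = 1;
    return 0;
}

/* Releases per-connection state; a no-op when not connected. */
static void toy_remote_disconnect(toy_remote *r)
{
    if (r == NULL || !r->connected)
        return;
    free(r->transport_state);
    r->transport_state = NULL;
    r->connected = 0;
}

/* Always disconnects first, so an open connection is never leaked. */
static void toy_remote_free(toy_remote *r)
{
    if (r == NULL)
        return;
    toy_remote_disconnect(r);
    free(r);
}
```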

@arthurschreiber
Member Author

OK, I guess I was too quick to jump to a conclusion here: git_remote_free correctly frees all the memory used by the remote, but git_remote_disconnect doesn't free all the memory that was allocated for the transport. So if you keep the remote around for a longer time and call git_remote_connect multiple times, memory usage just keeps growing.

Vicent Martí added 3 commits June 24, 2013 11:20
In loose objects backend, constrain mkdir calls to avoid extra mkdirs
@arthurschreiber
Member Author

I might be totally off here, but I think the issue is the following:

git_remote_connect creates the transport object (in my case a local transport) and then calls local_connect. That function calls store_refs, which calls git_vector_init on &t->refs.

Calling git_remote_disconnect does not actually free the vector (that only happens in git_remote_free). Calling git_remote_connect again calls git_vector_init on the same &t->refs, overwriting the existing vector, which can then no longer be freed. That's where we leak: git_remote_free only frees the newest vector.

If the remote is immediately freed after the first git_remote_connect, everything is fine as the initial t->refs vector gets freed correctly.
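The sequence described above can be reproduced in isolation. The sketch below uses hypothetical `ref_vector` and `toy_transport` types (not libgit2's `git_vector` or its local transport) and counts live buffers so the leak is visible: re-initializing a vector that still owns a buffer simply overwrites the pointer. Releasing the vector in the disconnect path is shown as one possible fix; having connect free any existing vector before re-initializing it would work too.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for git_vector. */
typedef struct {
    void **contents;
    size_t length;
    size_t alloc;
} ref_vector;

static int live_buffers = 0; /* buffers allocated but not yet freed */

/* Like an init function, this overwrites whatever *v held before. */
static int ref_vector_init(ref_vector *v, size_t initial)
{
    assert(v != NULL && initial > 0);
    v->contents = malloc(initial * sizeof(*v->contents));
    if (v->contents == NULL)
        return -1;
    v->length = 0;
    v->alloc = initial;
    live_buffers++;
    return 0;
}

static void ref_vector_free(ref_vector *v)
{
    if (v->contents != NULL) {
        free(v->contents);
        live_buffers--;
    }
    v->contents = NULL;
    v->length = v->alloc = 0;
}

/* Hypothetical transport holding the refs gathered at connect time. */
typedef struct {
    ref_vector refs;
} toy_transport;

/* connect: stores the advertised refs (cf. store_refs above). */
static int toy_connect(toy_transport *t)
{
    return ref_vector_init(&t->refs, 8);
}

/* Leaky disconnect: leaves t->refs allocated, so the next connect
 * re-initializes it and the old buffer becomes unreachable. */
static void toy_disconnect_leaky(toy_transport *t)
{
    (void)t;
}

/* Fixed disconnect: releases what connect allocated. */
static void toy_disconnect_fixed(toy_transport *t)
{
    ref_vector_free(&t->refs);
}

/* free: can only release the most recent vector. */
static void toy_transport_free(toy_transport *t)
{
    ref_vector_free(&t->refs);
}
```

The last observation in the comment falls out of this model: if the transport is freed right after the first connect, there is only one vector and it is released.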

@arthurschreiber
Member Author

Great, this did not work the way I wanted it to... :/ hub converted the issue into a PR, but opened it against master. I don't see a way to change the target branch, so I'll close this and open it as a new PR.
