Memory leak in LogisticRegression #8499


Closed
bertholdl opened this issue Mar 3, 2017 · 4 comments · Fixed by #9024

@bertholdl

Dear all,

while running many logistic regressions, I encountered a continuous memory increase on several (Debian) machines. I isolated the problem in this code:

```python
import os
import time

import numpy as np
import psutil
import sklearn
from sklearn.linear_model import LogisticRegression

if __name__ == "__main__":
    print("Sklearn version: %s" % sklearn.__version__)
    n_samples = 2
    n_features = 2
    data = np.arange(n_samples * n_features).reshape((n_samples, n_features))
    labels = np.arange(n_samples)
    last_output_time = 0
    process = psutil.Process(os.getpid())
    for i in range(10000000):
        # liblinear was the default solver in scikit-learn 0.18.1; naming it
        # explicitly keeps the same code path on newer versions.
        clf = LogisticRegression(solver="liblinear")
        clf.fit(X=data, y=labels)
        del clf
        # Every 5 seconds, print the resident set size (RSS) in MiB.
        # memory_info() replaces the deprecated get_memory_info() name.
        if time.time() - last_output_time >= 5:
            print(process.memory_info().rss / float(2 ** 20))
            last_output_time = time.time()
```

This was Python 2.7 under Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u1 (2017-02-22) x86_64 GNU/Linux, with scikit-learn 0.18.1. Is this reproducible?

@TomDLT
Member

TomDLT commented Mar 3, 2017

Doesn't adding `import gc` and calling `gc.collect()` after `del clf` solve the leak?

@bertholdl
Author

Thank you, indeed it does. However, the position of `gc.collect()` seems to matter: placed right after `del clf`, the leak vanishes, but run only every 5 seconds (inside the `if time.time()...` block), the leak persists. Unfortunately, garbage collection takes so long that running it after every fit is impractical: it slows my code down considerably (by a factor of about 35 in my "real" case, and about 500 in this example). Is this to be expected?

@bertholdl
Author

bertholdl commented Mar 3, 2017

P.S.: I measured performance (as the number of estimations per second) with this modified code:

```python
import gc
import os
import time

import numpy as np
import psutil
import sklearn
from sklearn.linear_model import LogisticRegression

if __name__ == "__main__":
    print("Sklearn version: %s" % sklearn.__version__)
    n_samples = 2
    n_features = 2
    data = np.arange(n_samples * n_features).reshape((n_samples, n_features))
    labels = np.arange(n_samples)
    last_output_time = 0
    process = psutil.Process(os.getpid())
    for i in range(10000000):
        clf = LogisticRegression(solver="liblinear")  # the 0.18.1 default solver
        clf.fit(X=data, y=labels)
        #clf.predict(X=data)
        #clf.predict_proba(X=data)
        del clf
        #gc.collect()  # variant 1: collect after every fit
        if time.time() - last_output_time >= 5:
            gc.collect()  # variant 2: collect only every 5 seconds
            # memory_info() replaces the deprecated get_memory_info() name.
            print("%d iterations, %f MB" % (i, process.memory_info().rss / float(2 ** 20)))
            last_output_time = time.time()
```

@TomDLT
Member

TomDLT commented Mar 3, 2017

OK, I didn't realize you were running so many iterations.
Garbage collection does not fix the memory leak; it only slows down the iterations.

Interestingly, the memory leak seems to be present only with solver='liblinear', and not with the other solvers.
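The observation that only solver='liblinear' leaks can be checked with a bounded variant of the reproduction script. The sketch below is an illustration and not part of the original thread: it fits a fresh estimator a fixed number of times per solver and reports the RSS growth measured with psutil. The helper names `rss_mb` and `rss_growth_mb` are hypothetical, and the solver list assumes a scikit-learn release that ships all five solvers ('saga' is not available in 0.18.1). On a version that includes the fix from #9024, one would expect no solver to show steady growth.

```python
import os
import warnings

import numpy as np
import psutil
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression


def rss_mb():
    """Current resident set size (RSS) of this process, in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / float(2 ** 20)


def rss_growth_mb(solver, n_fits=20000):
    """Fit a fresh LogisticRegression n_fits times; return the RSS growth in MiB."""
    X = np.arange(4, dtype=float).reshape((2, 2))  # same toy values as the report
    y = np.arange(2)
    with warnings.catch_warnings():
        # Such a tiny dataset may trigger ConvergenceWarning for iterative solvers.
        warnings.simplefilter("ignore", ConvergenceWarning)
        # Warm-up fit, so one-off allocations are not counted as growth.
        LogisticRegression(solver=solver).fit(X, y)
        start = rss_mb()
        for _ in range(n_fits):
            clf = LogisticRegression(solver=solver)
            clf.fit(X, y)
            del clf
        return rss_mb() - start


if __name__ == "__main__":
    for solver in ("liblinear", "lbfgs", "newton-cg", "sag", "saga"):
        print("%-10s %+.1f MiB" % (solver, rss_growth_mb(solver)))
```

RSS is noisy, so a single small number means little; only growth that keeps increasing as `n_fits` is raised points to a real leak.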

@lesteve lesteve added the Bug label Mar 10, 2017
@lesteve lesteve added this to the 0.19 milestone Mar 22, 2017
superbobry pushed a commit to criteo-forks/scikit-learn that referenced this issue Jun 6, 2017
The leak resulted from two issues:
- not freeing the problem struct
- not freeing the number of iterations

The former was present in the initial version of ``liblinear_helper.c``,
while the latter appeared after c8c72fd,
which introduced ``n_iter``.

Closes scikit-learn#8499
jnothman pushed a commit that referenced this issue Jun 7, 2017
Closes #8499
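The commit message describes an ownership bug: each fit allocated a problem struct and an n_iter buffer in C, and neither was released afterwards. Because that memory lives outside Python's object system, `gc.collect()` has no way to reclaim it. The toy Python model below is not scikit-learn's actual Cython/C code; it only mimics the pattern with a hypothetical `ToyHeap` that counts outstanding allocations, and `set_problem`, `train`, `train_leaky` and `train_fixed` are likewise invented names. The fixed wrapper releases both blocks, using try/finally so that the problem struct is freed even if training raises.

```python
class ToyHeap:
    """Hypothetical stand-in for the C heap: tracks blocks allocated but not yet freed."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def malloc(self, what):
        self._next_id += 1
        block = (self._next_id, what)
        self.live.add(block)
        return block

    def free(self, block):
        self.live.remove(block)


def set_problem(heap):
    # Mimics the helper that builds the problem struct handed to the solver.
    return heap.malloc("problem")


def train(heap, problem):
    # Mimics the solver: returns a model plus a separately allocated n_iter buffer.
    return heap.malloc("model"), heap.malloc("n_iter")


def train_leaky(heap):
    problem = set_problem(heap)
    model, n_iter = train(heap, problem)
    # In this toy, only the model is released: problem and n_iter leak.
    heap.free(model)


def train_fixed(heap):
    problem = set_problem(heap)
    try:
        model, n_iter = train(heap, problem)
        # ... copy coefficients and iteration counts into Python objects here ...
        heap.free(model)
        heap.free(n_iter)  # fix 2: release the n_iter buffer
    finally:
        heap.free(problem)  # fix 1: release the problem struct, even on error
```

Calling `train_leaky` 100 times leaves 200 live blocks (one problem struct and one n_iter buffer per call), while `train_fixed` leaves none, mirroring how the leak grew linearly with the number of fits.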