Dear all,
While running many logistic regressions, I encountered a continuous memory increase on several (Debian) machines. The problem can be isolated to the following snippet:
```python
import os
import time

import numpy as np
import psutil
import sklearn
from sklearn.linear_model import LogisticRegression

if __name__ == "__main__":
    print("Sklearn version: %s" % sklearn.__version__)

    # A tiny toy problem is enough to trigger the growth.
    n_samples = 2
    n_features = 2
    data = np.arange(n_samples * n_features).reshape((n_samples, n_features))
    labels = np.arange(n_samples)

    last_output_time = 0
    process = psutil.Process(os.getpid())

    for i in range(10000000):
        clf = LogisticRegression()
        clf.fit(X=data, y=labels)
        del clf

        # Print the resident set size in MiB every five seconds.
        # (get_memory_info() was renamed to memory_info() in psutil 2.0.)
        if time.time() - last_output_time >= 5:
            print(process.memory_info().rss / float(2 ** 20))
            last_output_time = time.time()
```
This was Python 2.7 under Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u1 (2017-02-22) x86_64 GNU/Linux, with scikit-learn 0.18.1. Is this reproducible?
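
In case the growth is suspected to be an artifact of psutil's reporting, here is a minimal sketch of the same loop measured with only the standard library's `resource` module. Note the assumptions: Linux, where `ru_maxrss` is reported in KiB, and the fact that it is a peak value, so a leak-free loop should plateau while a genuine leak keeps it climbing:

```python
import resource
import time

import numpy as np
from sklearn.linear_model import LogisticRegression

# Cross-check without psutil: resource.getrusage reports the peak RSS
# (ru_maxrss, in KiB on Linux; macOS reports bytes). A leak-free loop
# should plateau; unbounded growth points to a real leak rather than
# a measurement artifact.
data = np.arange(4).reshape((2, 2))
labels = np.arange(2)

last_output_time = 0.0
for i in range(10000000):
    clf = LogisticRegression()
    clf.fit(X=data, y=labels)
    del clf
    if time.time() - last_output_time >= 5:
        # Convert KiB to MiB before printing.
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0)
        last_output_time = time.time()
```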