Conversation

@jnothman (Member)

Resolved #9393. See #9734.

Ping @ogrisel.

@jnothman jnothman changed the title FIX Avoid accumulating forest predictions in non-threadsafe manner [MRG] FIX Avoid accumulating forest predictions in non-threadsafe manner Sep 26, 2017
@jnothman (Member, Author)

The problem with this is that it makes RF prediction more memory intensive by a factor of n_estimators / n_jobs (assuming n_samples for prediction is much larger than tree size).

I could do this summation over an iterator instead, which with joblib will only be consumed lazily in the n_jobs=1 case... WDYT?
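
A rough sketch of the iterator idea (hypothetical helper names, not code from this PR): sum the per-tree predictions as they are produced, so that only one prediction matrix is alive at a time.

import numpy as np

def iter_tree_predictions(estimators, X):
    # Hypothetical helper: yield one prediction array per fitted tree.
    for tree in estimators:
        yield tree.predict(X)

def averaged_prediction(estimators, X):
    # Consume the iterator so that only one per-tree prediction is alive
    # at a time, rather than n_estimators matrices at once.
    predictions = iter_tree_predictions(estimators, X)
    total = next(predictions).copy()
    for pred in predictions:
        total += pred
    return total / len(estimators)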

@jnothman jnothman added this to the 0.19.1 milestone Sep 26, 2017
Review thread on the following diff context:

# ForestClassifier or ForestRegressor, because joblib complains that it cannot
# pickle it when placed there.

def accumulate_prediction(predict, X, out):

(Member) The problem is that "out" is shared among threads, right?

(Member) If this is the case I don't understand the memory increase. Shouldn't out be n_samples x n_classes, and the memory increase therefore n_samples x n_classes x n_jobs?
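
For context, the helper under discussion adds each tree's prediction directly into the shared out buffer, roughly like this (a paraphrase of the diff, details approximate):

def accumulate_prediction(predict, X, out):
    # Runs inside a joblib worker; `out` is shared by all workers.
    prediction = predict(X, check_input=False)
    if len(out) == 1:
        out[0] += prediction          # in-place add into the shared buffer
    else:
        for i in range(len(out)):     # multi-output case
            out[i] += prediction[i]

Because the in-place adds from different threads are not guaranteed to be atomic, concurrent workers can overwrite each other's contributions.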

@jnothman (Member, Author)

On master, up to n_jobs prediction matrices are in memory at once; the accumulation is done, albeit in a non-threadsafe way, in each thread.

In this PR, all prediction matrices are returned from Parallel, i.e. n_estimators separate matrices, and the accumulation is done afterwards.

A mutex lock on the accumulation would be sufficient to solve this, but we've tended to avoid them. Alternatively, a map that returns an iterator rather than a list (or a list of promises) would suffice to give roughly the current memory consumption with correctness assured.

@lesteve (Member) commented Sep 28, 2017

Before I forget: I am guessing this is partly reverting #8672. We should take a look at that PR to try to understand the motivations behind it.

@jnothman (Member, Author) commented Sep 28, 2017 via email

@lesteve (Member) commented Sep 28, 2017

Chatting with @ogrisel about this one, he thinks adding a threading.Lock is probably both the best and the simplest option. The summation of probabilities should not be a bottleneck, so the lock will not impact performance.

Just curious, can we actually reproduce the failure outside the Debian testing framework (mips is the failing architecture, I think)?

@jnothman (Member, Author)

I tried to reproduce it the other day and failed. Our best chance? Lots of random features, max_features=1, very few samples, many estimators, large n_jobs. Let's use threading.Lock then.
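
A sketch of the lock-based accumulation being discussed; here `lock` is assumed to be a single threading.Lock() created in predict and passed to every delayed call (approximate, not necessarily the exact merged code):

import threading

def accumulate_prediction(predict, X, out, lock):
    # Compute the per-tree prediction outside the lock: this is the
    # expensive part and can safely run in parallel.
    prediction = predict(X, check_input=False)
    # Serialize only the cheap in-place accumulation into the shared buffer.
    with lock:
        if len(out) == 1:
            out[0] += prediction
        else:
            for i in range(len(out)):
                out[i] += prediction[i]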

@jnothman (Member, Author)

Done

@lesteve (Member) commented Oct 3, 2017

The changes look fine, but I am a bit uncomfortable merging this kind of blind, without having managed to reproduce the failure either in the scikit-learn tests or in a simpler snippet where you update a single numpy array with parallel summations.

Also, maybe it would be a good idea to run a quick and dirty benchmark to make sure that the lock is not impacting performance?

> Alternatively, a map that returns an iterator rather than a list (or a list of promises) would suffice to give roughly the current memory consumption with correctness assured.

FYI, I tried a little while ago to have Parallel return a generator. It kind of worked, except when closing the pool before consuming all the results, in which case it hung and I never figured out why. I may have another go at it at some point. My branch is here if you want more details.
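
A quick-and-dirty benchmark along the lines suggested above might look like this (sizes and repetition counts are arbitrary); run it once on master and once on this branch and compare the timings:

import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(1000, 20)
y = np.random.rand(1000)
rfr = RandomForestRegressor(n_estimators=500, n_jobs=4).fit(X, y)

X_test = np.random.rand(100000, 20)
tic = time.time()
for _ in range(5):
    rfr.predict(X_test)
print('mean predict time: %.3fs' % ((time.time() - tic) / 5))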

@jmschrei (Member) commented Oct 3, 2017

My understanding is that the issue arises when multiple trees try to add their individual predictions to the single out array at the same time, so that some updates are overwritten. Is this correct? It seems weird to me that the GIL is not preventing this from happening; do you know why that is the case?

@ogrisel (Member) left a review comment:

It seems like a reasonable fix. GIL contention should not be too visible, as adding arrays is probably orders of magnitude faster than computing the predictions themselves.

@ogrisel (Member) commented Oct 3, 2017

@jmschrei I don't know whether the GIL protects the += operation on numpy arrays.
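
A minimal sketch of the race being discussed, outside scikit-learn: several threads doing `out += ones` on a shared array. Whether updates are actually lost depends on whether NumPy releases the GIL during the add and on scheduling, which may be why the failure is so hard to reproduce outside specific platforms:

import threading
import numpy as np

out = np.zeros(10**6)
ones = np.ones(10**6)

def worker(out, ones, n_iter=100):
    for _ in range(n_iter):
        # Read-modify-write on a shared buffer; not guaranteed to be atomic.
        out += ones

threads = [threading.Thread(target=worker, args=(out, ones)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 4 threads x 100 iterations every entry should be 400.0;
# anything smaller indicates a lost update.
print(out.min(), out.max())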

@jnothman (Member, Author) commented Oct 3, 2017 via email

@jnothman (Member, Author) commented Oct 16, 2017

This reproduces the contention, showing that on master n_jobs=4 is inconsistent and n_jobs=1 is consistent. Unfortunately, it seems to show that the current PR is also inconsistent, and I cannot fathom why:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
import sklearn.ensemble
print(sklearn.ensemble.__path__)

# Tiny dataset, many trees built on nearly random single features, so that
# per-tree predictions differ wildly and lost updates would be visible.
X = np.random.rand(10, 100)
y = np.random.rand(10) * 100
rfr = RandomForestRegressor(n_estimators=1000, max_features=1, n_jobs=4).fit(X, y)

# Predict repeatedly with n_jobs=4 and check that every run agrees exactly.
ys = []
for i in range(100):
    if i % 10 == 0:
        print(i)
    ys.append(rfr.set_params(n_jobs=4).predict(X))

# Count how many consecutive runs differ in at least one sample.
n_failures = sum(np.any(np.diff(ys, axis=0), axis=1))
if n_failures:
    print('Broke up to %d times!' % n_failures)
else:
    print('Consistent!')

@jnothman (Member, Author) commented Oct 16, 2017

The answer is that the test is finding instability due to summation order, not threading contention. The differences are minuscule, both on master and in this PR, so the issue remains without a reliable test.

What I can say is that the effect of locking on performance is negligible.
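
One way to separate the two effects in the snippet above (an illustrative tweak, not part of the PR): compare consecutive runs both exactly and with a small tolerance; summation-order noise only shows up in the exact comparison, while lost updates would exceed any reasonable tolerance.

import numpy as np

ys_arr = np.asarray(ys)  # shape (n_runs, n_samples), from the snippet above
exact = np.any(np.diff(ys_arr, axis=0), axis=1).sum()
beyond_tol = (~np.isclose(ys_arr[1:], ys_arr[:-1],
                          rtol=1e-7, atol=1e-12)).any(axis=1).sum()
print('exact mismatches: %d, mismatches beyond tolerance: %d'
      % (exact, beyond_tol))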

@lesteve (Member) commented Oct 16, 2017

Thanks a lot @jnothman, let's merge this one!
