Fix stopping criterion of _graph_connected_components #5713
Could you check the difference with master in runtime when you test
@@ -47,17 +47,18 @@ def _graph_connected_component(graph, node_id):
    nodes_to_explore = np.zeros(shape=(graph.shape[0]), dtype=np.bool)
    nodes_to_explore[node_id] = True
    n_node = graph.shape[0]
Can this be moved to the first line, so that graph.shape does not have to be read again?
Fixed.
Before any optimization:
After first PR
After fixing the stopping criterion
It is slightly slower, but it uses a fixed amount of memory, as opposed to the old version. I have tried using indices instead of boolean vectors; that is slightly slower still (about 70ms), but I can switch to it if needed.
@giorgiop this one should be good to merge.
Did you try to see if we can completely avoid the allocation of the array
@@ -42,23 +42,23 @@ def _graph_connected_component(graph, node_id):
    belonging to the largest connected components of the given query
connected_components
in Returns
I am running this script to measure gain in runtime:
Here is the final code. I am still not sure if we can completely avoid
@ogrisel this one to close as well?
Closing as #6268 was merged.
The function didn't stop in the case of a cyclic graph. I restored the previous stopping criterion and kept the optimization, which only brings a small overhead on my box (50ms over 10 tries).
Related to #5639
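For readers following the thread, here is a minimal sketch of the kind of stopping criterion discussed here: the search ends as soon as a sweep adds no new node, so a cycle cannot keep it running. This is not the code from this PR. The function name `connected_component_mask` is hypothetical, and plain Python lists stand in for the NumPy boolean vectors and the `graph` matrix mentioned above.

```python
def connected_component_mask(adjacency, node_id):
    """Return a list of booleans marking the nodes reachable from node_id.

    adjacency is assumed to be a square list of lists; a truthy entry
    adjacency[i][j] means there is an edge between node i and node j.
    """
    n_node = len(adjacency)
    connected = [False] * n_node
    to_explore = [False] * n_node
    to_explore[node_id] = True

    # A component has at most n_node nodes, so n_node sweeps always suffice.
    for _ in range(n_node):
        previous_count = sum(connected)
        # Merge the current frontier into the set of connected nodes.
        connected = [c or e for c, e in zip(connected, to_explore)]
        # Stopping criterion: the last sweep discovered nothing new.
        # In a cyclic graph the frontier keeps revisiting known nodes,
        # so an "empty frontier" test alone would never trigger.
        if sum(connected) <= previous_count:
            break
        frontier = [i for i, flag in enumerate(to_explore) if flag]
        to_explore = [False] * n_node
        for i in frontier:
            for j, edge in enumerate(adjacency[i]):
                if edge:
                    to_explore[j] = True
    return connected


# A 3-cycle (0-1-2) plus a separate edge (3-4).
example_graph = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(connected_component_mask(example_graph, 0))
```

Counting the connected nodes before and after each sweep costs one extra pass per iteration, which fits the small overhead mentioned above, although this sketch makes no claim about the actual timings.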