[MRG+1] Optimize sklearn.manifold._graph_is_connected #5443

Merged
merged 3 commits into from
Oct 19, 2015

AlexandreAbraham
Contributor

Fix #5024.

This is a naive fix where I use a temporary array to store the nodes to be explored and another one with the nodes to add in the current loop. This can probably be optimized by using a single integer array but the code would be less intuitive and the memory saved would be very small.
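A minimal sketch of the two-array idea described above, assuming a dense, square adjacency matrix. The function name `connected_component_sketch` and its variable names are hypothetical; this is an illustration, not the code merged in this PR.

```python
import numpy as np


def connected_component_sketch(graph, node_id):
    """Return a boolean mask of the nodes reachable from node_id.

    Hypothetical illustration: `graph` is assumed to be a dense, square
    adjacency matrix in which a non-zero entry means an edge.
    """
    n_node = graph.shape[0]
    connected = np.zeros(n_node, dtype=bool)         # nodes reached so far
    nodes_to_explore = np.zeros(n_node, dtype=bool)  # frontier for this pass
    nodes_to_explore[node_id] = True

    for _ in range(n_node):
        n_before = connected.sum()
        connected |= nodes_to_explore
        if connected.sum() <= n_before:
            break  # no new node was reached, the component is complete
        # Fresh temporary array for the nodes found during this pass.
        nodes_to_add = np.zeros(n_node, dtype=bool)
        for i in np.flatnonzero(nodes_to_explore):
            nodes_to_add |= graph[i] != 0
        nodes_to_explore = nodes_to_add
    return connected


# Chain 0 - 1 - 2 plus an isolated node 3 (symmetric adjacency).
demo = np.zeros((4, 4))
demo[0, 1] = demo[1, 0] = 1.0
demo[1, 2] = demo[2, 1] = 1.0
mask = connected_component_sketch(demo, 0)
print(mask)
```

Only one row of the graph is examined at a time, so the temporaries stay at the size of a single row rather than growing with the number of nodes already reached.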

I can switch to Cython if needed, but the current proposal seems efficient enough. I did not add documentation since the variable names are self-explanatory.

I tested it with a degenerate graph using the following code:

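import numpy as np
# Private helper; this module path is assumed from the 2015 code base.
from sklearn.manifold.spectral_embedding_ import _graph_connected_component

size = 2000
a = np.zeros((size, size), dtype=float)

# Degenerate chain graph: node i is linked only to node i + 1.
for i in range(size - 1):
    a[i, i + 1] = 1.

_graph_connected_component(a, 0)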

Before optimization, memory consumption looks like:
[image: before]
And after:
[image: after]

It is faster and consumes less memory.
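One way such a comparison could be reproduced without a plotting tool is with the standard-library `tracemalloc` module. The sketch below is an assumption-laden illustration: `component_full_slice` is modeled on the `graph[connected_components_matrix]` indexing visible in the diff excerpt, `component_row_by_row` is a hypothetical stand-in for the new approach, and the graph is smaller than the one used above so that it runs quickly.

```python
import tracemalloc

import numpy as np


def component_full_slice(graph, node_id):
    """Older-style approach: index all reached rows at once (large temporary)."""
    connected = np.zeros(graph.shape[0], dtype=bool)
    connected[node_id] = True
    for _ in range(graph.shape[0]):
        n_before = connected.sum()
        _, node_to_add = np.where(graph[connected] != 0)
        connected[node_to_add] = True
        if n_before >= connected.sum():
            break
    return connected


def component_row_by_row(graph, node_id):
    """Frontier-based approach: only one row is examined at a time."""
    n_node = graph.shape[0]
    connected = np.zeros(n_node, dtype=bool)
    nodes_to_explore = np.zeros(n_node, dtype=bool)
    nodes_to_explore[node_id] = True
    for _ in range(n_node):
        n_before = connected.sum()
        connected |= nodes_to_explore
        if connected.sum() <= n_before:
            break
        nodes_to_add = np.zeros(n_node, dtype=bool)
        for i in np.flatnonzero(nodes_to_explore):
            nodes_to_add |= graph[i] != 0
        nodes_to_explore = nodes_to_add
    return connected


def peak_bytes(func, *args):
    """Run func(*args) and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


size = 300  # smaller than the 2000 used above so the sketch runs quickly
chain = np.zeros((size, size), dtype=float)
for i in range(size - 1):
    chain[i, i + 1] = 1.0

mask_old, peak_old = peak_bytes(component_full_slice, chain, 0)
mask_new, peak_new = peak_bytes(component_row_by_row, chain, 0)
print(f"full slice: {peak_old} bytes, row by row: {peak_new} bytes")
```

The graph itself is allocated before tracing starts, so the reported peaks reflect only the temporaries created by each function.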

_, node_to_add = np.where(graph[connected_components_matrix] != 0)
connected_components_matrix[node_to_add] = True
if last_num_component >= connected_components_matrix.sum():
nodes_to_add = np.zeros(shape=(graph.shape[0]), dtype=np.bool)
Member

I am wondering if the following would not be slightly more efficient (it probably depends on the malloc used by numpy):

n_node = graph.shape[0]
nodes_to_add = np.empty(shape=n_node, dtype=np.uint8)
for i in range(n_node):
    nodes_to_add.fill(0)
    ...
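To make the suggestion concrete, here is a sketch of a loop that allocates its scratch array once and resets it in place. The function name `component_reused_buffer` is hypothetical, a dense adjacency matrix is assumed, and `bool` is used instead of `np.uint8` purely so that the array can double as a mask; whether this is measurably faster than calling `np.zeros` on each pass would need to be timed.

```python
import numpy as np


def component_reused_buffer(graph, node_id):
    """Hypothetical variant that allocates its scratch array only once."""
    n_node = graph.shape[0]
    connected = np.zeros(n_node, dtype=bool)
    nodes_to_explore = np.zeros(n_node, dtype=bool)
    nodes_to_explore[node_id] = True
    nodes_to_add = np.empty(n_node, dtype=bool)  # allocated once, reset below
    for _ in range(n_node):
        n_before = connected.sum()
        connected |= nodes_to_explore
        if connected.sum() <= n_before:
            break
        nodes_to_add.fill(False)  # reset in place instead of np.zeros(...)
        for i in np.flatnonzero(nodes_to_explore):
            nodes_to_add |= graph[i] != 0
        # Swap the two buffers so neither needs to be reallocated.
        nodes_to_explore, nodes_to_add = nodes_to_add, nodes_to_explore
    return connected


two_blocks = np.zeros((5, 5))
two_blocks[0, 1] = two_blocks[1, 0] = 1.0  # component {0, 1}
two_blocks[2, 3] = two_blocks[3, 2] = 1.0  # component {2, 3, 4}
two_blocks[3, 4] = two_blocks[4, 3] = 1.0
reached = component_reused_buffer(two_blocks, 2)
print(reached)
```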

Contributor Author

I had exactly the same thought. My gut feeling is that it's better so I'll change it.

@ogrisel
Member

ogrisel commented Oct 19, 2015

Other than that the bench looks convincing. If the tests are green on both travis and appveyor, +1 on my side.

@rudimeier can you please test this and tell us if it solves your original problem on your dataset?

@ogrisel ogrisel changed the title Optimize sklearn.manifold._graph_is_connected [MRG+1] Optimize sklearn.manifold._graph_is_connected Oct 19, 2015
@GaelVaroquaux
Member

Can you add an explicit test for _graph_connected_component in manifold.tests.test_spectral_embedding.py:test_spectral_embedding_two_components?

In this example you know the connected components, so you can test that the function works well.
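A sketch of what such a check could look like, assuming a block-structured affinity matrix whose two components are known in advance. Both `reachable_mask` (a stand-in for the function under test) and `check_two_components` are hypothetical names, and this is not the test that was added to scikit-learn.

```python
import numpy as np


def reachable_mask(graph, node_id):
    """Hypothetical stand-in for the function under test (dense graphs only)."""
    connected = np.zeros(graph.shape[0], dtype=bool)
    frontier = [node_id]
    while frontier:
        node = frontier.pop()
        if connected[node]:
            continue
        connected[node] = True
        frontier.extend(np.flatnonzero(graph[node]).tolist())
    return connected


def check_two_components(component_func, n_sample=20):
    """Build a graph with two known components and check both are recovered."""
    half = n_sample // 2
    affinity = np.zeros((n_sample, n_sample))
    affinity[:half, :half] = 1.0  # first fully connected block
    affinity[half:, half:] = 1.0  # second fully connected block

    expected_first = np.zeros(n_sample, dtype=bool)
    expected_first[:half] = True

    # Starting anywhere in a block should return exactly that block.
    np.testing.assert_array_equal(component_func(affinity, 0), expected_first)
    np.testing.assert_array_equal(component_func(affinity, half - 1), expected_first)
    np.testing.assert_array_equal(component_func(affinity, half), ~expected_first)
    np.testing.assert_array_equal(component_func(affinity, n_sample - 1), ~expected_first)
    return True


ok = check_two_components(reachable_mask)
```

Passing the function under test as an argument keeps the check reusable for any implementation with the same `(graph, node_id)` signature.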

@GaelVaroquaux
Member

@ogrisel gave his +1 and travis is happy. Merging!

GaelVaroquaux added a commit that referenced this pull request Oct 19, 2015
…ected

[MRG+1] Optimize sklearn.manifold._graph_is_connected
@GaelVaroquaux GaelVaroquaux merged commit 3fd38e9 into scikit-learn:master Oct 19, 2015
@amueller
Member

did anyone test the runtime?

@AlexandreAbraham AlexandreAbraham deleted the optimize_graph_is_connected branch October 21, 2015 14:17
Successfully merging this pull request may close these issues.

_graph_is_connected consumes too much memory