[MRG+1] Optimize sklearn.manifold._graph_is_connected #5443

AlexandreAbraham · 2015-10-19T10:01:51Z

This is a naive fix where I use a temporary array to store the nodes to be explored and another one with the nodes to add in the current loop. This can probably be optimized by using a single integer array but the code would be less intuitive and the memory saved would be very small.
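For readers following along, the two-array idea described above can be sketched roughly as follows. This is an illustration for dense NumPy arrays only, not the merged code; the function name `connected_component_sketch` and its variable names are assumptions made for this example.

```python
import numpy as np


def connected_component_sketch(graph, node_id):
    """Boolean mask of the nodes reachable from node_id (dense graphs only)."""
    n_node = graph.shape[0]
    connected = np.zeros(n_node, dtype=bool)   # nodes found so far
    to_explore = np.zeros(n_node, dtype=bool)  # nodes added in the last pass
    to_explore[node_id] = True
    for _ in range(n_node):
        n_before = connected.sum()
        connected |= to_explore
        if connected.sum() <= n_before:
            # Nothing new was reached in the last pass: the search is done.
            break
        frontier = np.flatnonzero(to_explore)
        to_explore.fill(False)
        for i in frontier:
            # Only one row is examined at a time, so no large temporary
            # is created (graph[i] is a view on a dense array).
            np.logical_or(to_explore, graph[i] != 0, out=to_explore)
    return connected
```

Each pass touches one row per frontier node, so the extra memory stays proportional to the number of nodes rather than to the size of the component's sub-matrix.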

I can switch to Cython if needed, but the current proposal seems optimized enough. I did not add documentation since the variable names are self-explanatory.

I tested it with a degenerate graph using the following code:

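import numpy as np
from sklearn.manifold.spectral_embedding_ import _graph_connected_component

size = 2000
a = np.zeros((size, size), dtype=float)

# Chain graph: node i is linked only to node i + 1
for i in range(size - 1):
    a[i, i + 1] = 1.

_graph_connected_component(a, 0)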

Before optimization, memory consumption looks like:

And after:

It is faster and consumes less memory.
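The memory plots are not preserved here. One way to get comparable numbers is sketched below, under some assumptions: `component_mask_indexing` is a reconstruction of the pre-PR loop built around the lines removed in the diff, while `measure` and `chain_graph` are hypothetical helpers written for this example. It relies on `tracemalloc` from the standard library, which is not necessarily the tool used for the original plots, and which slows execution, so the timings are only indicative.

```python
import time
import tracemalloc

import numpy as np


def component_mask_indexing(graph, node_id):
    """Pre-PR style search, reconstructed around the removed diff lines."""
    n_node = graph.shape[0]
    connected = np.zeros(n_node, dtype=bool)
    connected[node_id] = True
    for _ in range(n_node):
        last_num_component = connected.sum()
        # graph[connected] copies every row of the current component,
        # so this temporary grows towards n_node x n_node.
        _, node_to_add = np.where(graph[connected] != 0)
        connected[node_to_add] = True
        if last_num_component >= connected.sum():
            break
    return connected


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def chain_graph(size):
    """Degenerate graph from the description: node i links to node i + 1."""
    a = np.zeros((size, size), dtype=float)
    for i in range(size - 1):
        a[i, i + 1] = 1.
    return a


if __name__ == "__main__":
    for size in (100, 200, 400):
        _, elapsed, peak = measure(component_mask_indexing, chain_graph(size), 0)
        print(size, f"{elapsed:.4f} s", f"{peak / 1e6:.2f} MB")
```

Passing the optimized function to `measure` with the same graphs gives the other side of the comparison.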

ogrisel · 2015-10-19T10:23:42Z

sklearn/manifold/spectral_embedding_.py

-        _, node_to_add = np.where(graph[connected_components_matrix] != 0)
-        connected_components_matrix[node_to_add] = True
-        if last_num_component >= connected_components_matrix.sum():
+        nodes_to_add = np.zeros(shape=(graph.shape[0]), dtype=np.bool)


I am wondering if the following would not be slightly more efficient (it probably depends on the malloc used by numpy):

n_node = graph.shape[0]
nodes_to_add = np.empty(shape=n_node, dtype=np.uint8)
for i in range(n_node):
    nodes_to_add.fill(0)
    ...

AlexandreAbraham · 2015-10-19

I had exactly the same thought. My gut feeling is that it's better so I'll change it.
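The two allocation patterns discussed in this exchange can be compared with a small micro-benchmark such as the one below. It is only a sketch: the function names `fresh_buffer` and `reused_buffer` are made up for the example, the loop body is reduced to the allocation itself, and the outcome will depend on the platform and allocator, as noted above.

```python
import timeit

import numpy as np

n_node = 2000


def fresh_buffer():
    # Allocate and zero a new array on every iteration.
    for _ in range(n_node):
        nodes_to_add = np.zeros(n_node, dtype=np.uint8)
    return nodes_to_add


def reused_buffer():
    # Allocate once (contents undefined), then reset in place each iteration.
    nodes_to_add = np.empty(n_node, dtype=np.uint8)
    for _ in range(n_node):
        nodes_to_add.fill(0)
    return nodes_to_add


if __name__ == "__main__":
    for fn in (fresh_buffer, reused_buffer):
        best = min(timeit.repeat(fn, number=10, repeat=3))
        print(f"{fn.__name__}: {best:.4f} s")
```

Because `np.empty` does not initialize its memory, the `fill(0)` call must come before the buffer is read in each iteration.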

ogrisel · 2015-10-19T10:25:52Z

Other than that the bench looks convincing. If the tests are green on both travis and appveyor, +1 on my side.

@rudimeier can you please test this and tell us if it solves your original problem on your dataset?

GaelVaroquaux · 2015-10-19T13:37:53Z

Can you add an explicit test for _graph_connected_component in manifold.tests.test_spectral_embedding.py:test_spectral_embedding_two_components?

In this example you know the connected components, so you can test that the function works well.
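A test along these lines might look like the sketch below. It is not the test that was eventually committed: `check_two_components` and the stand-in `bfs_component` are hypothetical names used so that the example runs on its own, and in the real test suite the function under test would be _graph_connected_component.

```python
from collections import deque

import numpy as np


def bfs_component(graph, node_id):
    """Hypothetical stand-in for the function under test (plain BFS)."""
    n_node = graph.shape[0]
    seen = np.zeros(n_node, dtype=bool)
    seen[node_id] = True
    queue = deque([node_id])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(graph[i]):
            if not seen[j]:
                seen[j] = True
                queue.append(j)
    return seen


def check_two_components(component_fn, n_sample=10):
    """Two fully connected blocks with no edges between them."""
    affinity = np.zeros((2 * n_sample, 2 * n_sample))
    affinity[:n_sample, :n_sample] = 1.
    affinity[n_sample:, n_sample:] = 1.

    # Starting in the first block must reach exactly the first block.
    component = component_fn(affinity, 0)
    assert component[:n_sample].all()
    assert not component[n_sample:].any()

    # Starting in the second block must reach exactly the second block.
    component = component_fn(affinity, 2 * n_sample - 1)
    assert not component[:n_sample].any()
    assert component[n_sample:].all()


check_two_components(bfs_component)
```

Since the two blocks are known in advance, the expected mask is fully determined, which is what makes this case convenient for an explicit check.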

GaelVaroquaux · 2015-10-19T15:06:20Z

@ogrisel gave his +1 and travis is happy. Merging!


amueller · 2015-10-21T14:15:39Z

did anyone test the runtime?

Naive optimization

afe1f60

ogrisel reviewed Oct 19, 2015

ogrisel changed the title ~~Optimize sklearn.manifold._graph_is_connected~~ [MRG+1] Optimize sklearn.manifold._graph_is_connected Oct 19, 2015

Pre-allocate array in _graph_is_connected

5f8ef64

Add test for _graph_connected_component function

657190d

GaelVaroquaux added a commit that referenced this pull request Oct 19, 2015

Merge pull request #5443 from AlexandreAbraham/optimize_graph_is_conn…

3fd38e9

…ected [MRG+1] Optimize sklearn.manifold._graph_is_connected

GaelVaroquaux merged commit 3fd38e9 into scikit-learn:master Oct 19, 2015

AlexandreAbraham deleted the optimize_graph_is_connected branch October 21, 2015 14:17

giorgiop mentioned this pull request Nov 4, 2015

[MRG+1]: TEST runtime down to 4:30 min on an old laptop #5711

Closed
