
[WIP] Add sparse-rbf kernel option for semi_supervised.LabelSpreading #15922


Open · wants to merge 8 commits into main

Conversation

@nik-sm (Contributor) commented on Dec 18, 2019

Reference Issues/PRs

This pull request adds another named kernel option, sparse-rbf, for semi_supervised.LabelSpreading.

Loosely related to #15868.

Rationale

This kernel provides RBF weights for only the k-nearest-neighbors of each point.

Currently, the only kernel options are knn and rbf. For a large dataset, the dense rbf kernel is not feasible: for N items, we must compute and store a dense NxN matrix. My purpose here is to provide a kernel that can perform better than knn while remaining feasible for large datasets.

The intuition is that a weighted adjacency matrix gives more information about the graph structure of the dataset, so with appropriate parameter tuning such a kernel should outperform a binary adjacency matrix. Furthermore, filling in the RBF weights is cheap once the k-nearest-neighbors have been found, so the additional runtime cost over knn is minimal.
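
For illustration, here is a minimal sketch of the idea (not necessarily the exact code in this PR; the helper name and parameter values are made up): build the k-nearest-neighbors graph in distance mode, then exponentiate only the stored entries.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def sparse_rbf_affinity(X, n_neighbors=7, gamma=20):
    # Sparse matrix whose stored entries are the distances from each point
    # to its k nearest neighbors (including the point itself).
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance",
                         include_self=True)
    # Apply the RBF weighting only to the stored entries, so the cost is
    # O(N * k) rather than the O(N^2) of the dense rbf kernel.
    W.data = np.exp(-gamma * W.data ** 2)
    return W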

In particular, I believe the performance difference will be clearer for datasets with more "difficult" structure; therefore I included CIFAR10 as an option for the example script (the easiest way I know to obtain it is via https://pytorch.org/docs/stable/torchvision/datasets.html#cifar), though I have not yet had time to experiment with a significant fraction of that dataset.

It might also be useful to try this on some toy datasets like https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html, but I have not experimented with this yet.

Notes and Questions

I included tests to show:

  1. this kernel "works" (the learned classifier has high accuracy)
  2. the weights found using sparse-rbf are indeed the same as the top-k entries of the kernel obtained with the dense rbf option (a rough sketch of this check follows the list)
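
For example, the second check can be sketched roughly as follows (illustrative only: X, gamma, n_neighbors, and the sparse matrix W are assumed to come from the test setup, and the check assumes no ties in the distances; the PR's actual test may be written differently):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Each row of the sparse-rbf graph W should store exactly the k largest
# entries of the corresponding row of the dense RBF kernel.
dense = rbf_kernel(X, gamma=gamma)
for i in range(X.shape[0]):
    top_k_dense = np.sort(dense[i])[-n_neighbors:]
    stored = np.sort(W.getrow(i).toarray().ravel())[-n_neighbors:]
    assert np.allclose(top_k_dense, stored)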

Since I am making a claim about performance, I also included an example that runs a hyperparameter grid search for the two available sparse kernels (knn, already provided, and the new sparse-rbf), and then uses the optimal parameters across a range of % supervision to compare the following (a rough sketch of the evaluation loop follows this list):

  1. transductive accuracy (guessing labels for the unlabeled training data)
  2. inductive accuracy (labeling test data)
  3. runtime
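
As a rough sketch of what the evaluation measures (hedged: X_train, y_train, X_test, y_test are assumed to be loaded already, the parameter values are placeholders, and kernel="sparse-rbf" only exists with this PR applied):

import time
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

# Hide the labels of 80% of the training points; -1 marks "unlabeled".
unlabeled = rng.rand(len(y_train)) < 0.8
y_masked = np.copy(y_train)
y_masked[unlabeled] = -1

start = time.time()
model = LabelSpreading(kernel="sparse-rbf", n_neighbors=7, gamma=20)
model.fit(X_train, y_masked)
runtime = time.time() - start

# 1. transductive accuracy: predictions for the hidden training labels
transductive_acc = np.mean(model.transduction_[unlabeled] == y_train[unlabeled])
# 2. inductive accuracy: labeling held-out test data
inductive_acc = model.score(X_test, y_test)
print(transductive_acc, inductive_acc, runtime)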

I'm opening this PR to get some feedback on the following items:

  1. Does this kernel seem helpful, and could I make it more helpful somehow?
  2. Is the example I provided clear enough, and would other examples be useful to show the potential benefit of this kernel?
  3. For sklearn examples in general, I'm not sure (practically speaking) how to get the results to be visible as embedded plots, so is there something I should do to structure my example appropriately?
  4. Are there other experiments/examples that would be useful for evaluating this kernel that folks would like to see?

Caveats about the hyperparameter search

  • The search is done using only the inductive accuracy because of the API used by GridSearchCV. It might be possible (with some headache) to run the grid search using transductive accuracy, but this also might not matter.
  • The search is done at a fixed % supervision, again because of the API for GridSearchCV.
  • Notice that the exact hyperparameters found depend on what dataset and what fraction of that dataset is used for the grid search.

Example results

Here are the results of using 20% of MNIST, with the fixed parameters included at the bottom of the example script:

[Figure: sparse_kernel_comparison]

Please let me know if you need more info.
Thanks!

np.exp(W.data, out=W.data)
# explicitly set diagonal,
# since np.exp(W.data) does not modify zeros on the diagonal
W.setdiag(1)
Review comment from @nik-sm (Contributor, Author) on the lines above:

Note that this line causes a warning about changing the sparsity structure.

I think the two best options here are:

  1. Suppress the warning (if starting with a lil matrix and converting to csc afterwards is actually slower overall).
  2. Start with a lil matrix and convert to csc afterwards.

My guess is that the correct thing to do is to suppress the warning, but I left it as-is for now so others can weigh in (both options are sketched below).
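
For reference, the two options could look roughly like this (illustrative only; W is the sparse kernel matrix from the snippet above):

import warnings
from scipy.sparse import SparseEfficiencyWarning

# Option 1: keep the compressed matrix and silence the sparsity-structure warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    W.setdiag(1)

# Option 2: make the structural change in lil format, then convert to csc.
W = W.tolil()
W.setdiag(1)
W = W.tocsc()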

Base automatically changed from master to main January 22, 2021 10:51