2023 Transfer Learning With Kernel Methods

Uploaded by

wokog93129

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views12 pages

2023 Transfer Learning With Kernel Methods

Uploaded by

wokog93129

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Article https://doi.org/10.

1038/s41467-023-41215-8

Transfer Learning with Kernel Methods

Received: 13 December 2022 Adityanarayanan Radhakrishnan1,2,3, Max Ruiz Luyten1,3, Neha Prasad1 &
Caroline Uhler 1,2
Accepted: 28 August 2023

Transfer learning refers to the process of adapting a model trained on a source

Check for updates task to a target task. While kernel methods are conceptually and computa-
tionally simple models that are competitive on a variety of tasks, it has been
unclear how to develop scalable kernel-based transfer learning methods
across general source and target tasks with possibly differing label dimensions.
In this work, we propose a transfer learning framework for kernel methods by
1234567890():,;
1234567890():,;

projecting and translating the source model to the target task. We demon-
strate the effectiveness of our framework in applications to image classiﬁca-
tion and virtual drug screening. For both applications, we identify simple
scaling laws that characterize the performance of transfer-learned kernels as a
function of the number of target examples. We explain this phenomenon in a
simpliﬁed linear setting, where we are able to derive the exact scaling laws.

Transfer learning refers to the machine learning problem of utilizing works on transfer learning with kernels focus on applications in which
knowledge from a source task to improve performance on a target the source and target tasks have the same label sets15–20. Examples
task. Recent approaches to transfer learning have achieved tre- include predicting stock returns for a given sector based on returns
mendous empirical success in many applications, including in com- available for other sectors16 or predicting electricity consumption for
puter vision1,2, natural language processing3–5, and the biomedical certain zones of the United States based on the consumption in other
field6,7. Since transfer learning approaches generally rely on complex zones17. These methods are not applicable to general source and target
deep neural networks, it can be difficult to characterize when and why tasks with differing label dimensions, including classical transfer
they work8. Kernel methods9 are conceptually and computationally learning applications such as using a model trained to classify between
simple machine learning models that have been found to be compe- thousands of objects to subsequently classify new objects. There are
titive with neural networks on a variety of tasks, including image also various works on using kernels for multi-task learning
classification10–12 and drug screening12. Their simplicity stems from the problems21–23, which, in the context of transfer learning, assume that
fact that training a kernel method involves performing linear regres- source and target data are available at the time of training the source
sion after transforming the data. There has been renewed interest in model. These methods can be computationally expensive since they
kernels due to a recently established equivalence between wide neural involve computing matrix-valued kernels, where the number of rows/
networks and kernel methods13,14, which has led to the development of columns is equal to the number of labels. As a consequence, for a
modern, neural tangent kernels (NTKs) that are competitive with kernel trained on ImageNet3224 with 1000 possible labels, a matrix-
neural networks. Given their simplicity and effectiveness, kernel valued kernel would involve 106 times more compute than a classical
methods could provide a powerful approach for transfer learning and kernel method. Prior works also develop kernel-based methods for
also help characterize when transfer learning between a source and learning a re-weighting or transformation that captures similarities
target task would be beneficial. across source and target data distributions25–27. Such a transformation
Yet, developing scalable algorithms for transfer learning with is typically learned by solving an optimization problem that involves
kernel methods for general source and target tasks with possibly dif- materializing the full training kernel matrix, which can be computa-
fering label dimensions has been an open problem. In particular, while tionally prohibitive (e.g., for a dataset with a million samples, this
there is a standard transfer learning approach for neural networks that would require more than 3.5 terabytes of memory).
involves replacing and re-training the last layer of a pre-trained net- In this work, we present a general, scalable framework for per-
work, there is no known corresponding operation for kernels. Prior forming transfer learning with kernel methods. Unlike prior work, our

1
Massachusetts Institute of Technology, Cambridge, MA, USA. 2Broad Institute of MIT and Harvard, Cambridge, MA, USA. 3These authors contributed equally:
Adityanarayanan Radhakrishnan, Max Ruiz Luyten. e-mail: [email protected]