What is hyperparameter tuning?
Hyperparameters are adjustable parameters that let you control the model training
process. For example, with neural networks, you decide the number of hidden layers
and the number of nodes in each layer. Model performance depends heavily on
hyperparameters.
Hyperparameter tuning, also called hyperparameter optimization, is the process
of finding the configuration of hyperparameters that results in the best performance.
The process is typically computationally expensive and manual.
Azure Machine Learning lets you automate hyperparameter tuning and run
experiments in parallel to efficiently optimize hyperparameters.
Discrete hyperparameters
Discrete hyperparameters are specified as a Choice among discrete values (see the examples after this list). Choice can be:
- one or more comma-separated values
- a range object
- any arbitrary list object
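For example, with the Azure Machine Learning Python SDK v2 (azure-ai-ml), each form might be expressed as follows. This is a minimal sketch; the hyperparameter names are illustrative, not part of any API.

```python
from azure.ai.ml.sweep import Choice

# One or more comma-separated values
batch_size = Choice(values=[16, 32, 64, 128])

# A range object
number_of_hidden_layers = Choice(values=range(1, 5))

# Any arbitrary list object
optimizer = Choice(values=["adam", "sgd", "rmsprop"])
```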
Sampling the hyperparameter space
Specify the parameter sampling method to use over the hyperparameter space. Azure
Machine Learning supports the following methods:
- Random sampling
- Grid sampling
- Bayesian sampling
Random sampling
Random sampling supports discrete and continuous hyperparameters. It supports early
termination of low-performance jobs. Some users do an initial search with random
sampling and then refine the search space to improve results.
In random sampling, hyperparameter values are randomly selected from the defined
search space. After you create your command job, you can use its sweep method to
define the sampling algorithm.
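The following is a minimal sketch with the Python SDK v2. The script name, compute target, curated environment, and primary metric are placeholders; the training script is assumed to log the primary metric under the same name.

```python
from azure.ai.ml import command
from azure.ai.ml.sweep import Choice, Uniform

# Define the command job that runs one training trial
command_job = command(
    code="./src",
    command="python train.py --learning_rate ${{inputs.learning_rate}} --batch_size ${{inputs.batch_size}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    inputs={"learning_rate": 0.01, "batch_size": 32},
    compute="cpu-cluster",
)

# Replace the fixed inputs with search-space distributions
command_job_for_sweep = command_job(
    learning_rate=Uniform(min_value=0.01, max_value=0.9),
    batch_size=Choice(values=[16, 32, 64, 128]),
)

# Apply random sampling over the search space
sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="random",
    primary_metric="accuracy",
    goal="Maximize",
)
```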
Sobol
Sobol is a type of random sampling supported by sweep job types. With Sobol, you can
reproduce results by setting a seed, and you cover the search space distribution more
evenly.
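Continuing the sketch above, the Sobol variant might be requested by passing a RandomParameterSampling object instead of the string "random":

```python
from azure.ai.ml.sweep import RandomParameterSampling

# Sobol rule with a fixed seed for reproducible sampling
sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm=RandomParameterSampling(seed=123, rule="sobol"),
    primary_metric="accuracy",
    goal="Maximize",
)
```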
Grid sampling
Grid sampling supports discrete hyperparameters only. Use grid sampling if your
budget allows an exhaustive search over the search space. It also supports early
termination of low-performance jobs.
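Because grid sampling enumerates every combination, all swept inputs must be discrete. Continuing the sketch above (the values are illustrative):

```python
from azure.ai.ml.sweep import Choice

# Grid sampling requires Choice for every swept input;
# this grid has 3 * 4 = 12 combinations
command_job_for_sweep = command_job(
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
    batch_size=Choice(values=[16, 32, 64, 128]),
)

sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="grid",
    primary_metric="accuracy",
    goal="Maximize",
)
```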
Bayesian sampling
Bayesian sampling is based on the Bayesian optimization algorithm. It picks samples
based on how previous samples performed, so that new samples improve the primary metric.
Bayesian sampling is recommended if you have enough budget to explore the
hyperparameter space. For best results, we recommend a maximum number of jobs
greater than or equal to 20 times the number of hyperparameters being tuned.
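Continuing the sketch above, with two hyperparameters being tuned, the 20x guideline suggests budgeting at least 40 trials:

```python
# Bayesian sampling over the same search space
sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="bayesian",
    primary_metric="accuracy",
    goal="Maximize",
)

# Two hyperparameters are tuned, so budget at least 20 * 2 = 40 trials
sweep_job.set_limits(max_total_trials=40)
```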
The number of concurrent jobs affects the effectiveness of the tuning process. A
smaller number of concurrent jobs can lead to better sampling convergence, because
the lower degree of parallelism increases the number of jobs that benefit from
previously completed jobs.
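As a sketch of that trade-off, the concurrency of the sweep above might be capped through set_limits; the specific numbers are illustrative:

```python
# Cap parallelism so later trials can learn from completed ones
sweep_job.set_limits(
    max_total_trials=40,
    max_concurrent_trials=4,
    timeout=7200,  # overall sweep timeout, in seconds
)
```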