Hyperparameter Tuning

Hyperparameter tuning represents one of the most fundamental concepts in machine learning. Yet it’s arguably also one of the most confusing for those unfamiliar with artificial intelligence. Fortunately, it becomes considerably less bewildering with a bit of explanation.

What is a Hyperparameter?

In machine learning, model parameters are the values a model learns from its training data, such as the weights of a neural network. Hyperparameters, meanwhile, are settings that govern how that learning happens, and so influence the values the parameters end up with. They’re essentially top-level parameters used to control the entire machine learning process.

The term ‘hyperparameter’ is also somewhat of a catch-all for any configuration done before the training process.

Engineers and programmers typically choose their hyperparameters before they even begin training their model. A hyperparameter’s value is set from outside the model, either by the engineer or engineers directing the machine learning process or by a tuning algorithm they configure. The model itself cannot influence its hyperparameters, nor are the hyperparameters part of the final product.

Parameters, meanwhile, are internal to the machine learning model. They’re part of both the algorithm and the final AI system it will ultimately support.
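To make the distinction concrete, here is a minimal sketch in plain Python. The names LEARNING_RATE, EPOCHS, and train are illustrative assumptions rather than part of any library, and the data is a made-up toy set. The learning rate and epoch count are fixed before training starts (hyperparameters), while the weight w is adjusted by the training loop itself (a parameter).

```python
# Hyperparameters: chosen by the engineer before training starts.
LEARNING_RATE = 0.05
EPOCHS = 200


def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: internal to the model, updated during training
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data generated from y = 2x, so the learned weight should approach 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = train(xs, ys, LEARNING_RATE, EPOCHS)
print(f"learned w = {w:.4f}")
```

Nothing inside train ever changes learning_rate or epochs; only a person or an outer tuning loop can do that.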

If it helps, you might think of hyperparameters as the settings on a machine on a factory floor. Although factory workers can do a great deal to tweak and adjust the machine, the items it produces have no influence over how it operates.

Examples of hyperparameters include, but are not limited to:

  • The number of neighbors, k, in the k-Nearest Neighbors (KNN) algorithm, which is used for solving regression and classification problems.
  • The train-test split ratio, which determines how the data set is divided between training and evaluation.
  • The learning rate, which applies to neural networks and other models trained with gradient-based methods. It sets the size of each update step and therefore how quickly the model trains.
  • In support vector machines, the C and σ variables (the regularization strength and the kernel width, respectively).

The set of all hyperparameter value combinations that a tuning procedure can choose from is known as the hyperparameter space.
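One way to picture a hyperparameter space is as every combination of candidate values for each hyperparameter. The following is a small sketch under that assumption; the dictionary keys and candidate values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical search space: each key is a hyperparameter,
# each list holds the candidate values to consider for it.
space = {
    "k": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
    "test_size": [0.2, 0.3],
}


def enumerate_space(space):
    """Yield every combination in the space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


combos = list(enumerate_space(space))
print(len(combos))  # 4 * 2 * 2 = 16 combinations
print(combos[0])
```

The number of combinations is the product of the list lengths, which is why a space grows quickly as more hyperparameters are added.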

What is Hyperparameter Tuning?

Hyperparameter tuning, also known as hyperparameter optimization, involves finding the ideal hyperparameters for a particular machine learning algorithm. A single machine learning model may require different combinations, or tuples, of hyperparameters depending on the data it’s being trained on. Cross-validation and hyperparameter tuning are closely related as well, and the former is frequently used to assess the effectiveness of the latter.
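As a rough illustration of how cross-validation can score a hyperparameter, the sketch below evaluates several values of k for a tiny hand-written nearest-neighbors classifier. Everything here is an assumption made for the example: the function names, the one-dimensional toy data with one deliberately noisy label, and the simple interleaved fold scheme.

```python
def knn_predict(training, query, k):
    """Classify a 1-D query by majority vote among its k nearest points.

    Labels are 0 or 1 and k is odd, so the vote can never tie.
    """
    nearest = sorted(training, key=lambda point: abs(point[0] - query))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if ones * 2 > k else 0


def cross_val_accuracy(data, k, folds=4):
    """Average accuracy of k-NN over `folds` held-out splits."""
    scores = []
    for i in range(folds):
        # Fold i holds out every `folds`-th point, starting at index i.
        held_out = data[i::folds]
        training = [p for j, p in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(training, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / folds


# Toy data: label 0 below x = 6, label 1 from x = 6 upward, one noisy point.
data = [(float(x), 0 if x < 6 else 1) for x in range(12)]
data[4] = (4.0, 1)

# Score each candidate value of the hyperparameter k and keep the best.
scores = {k: cross_val_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Each candidate is judged only on points it was not trained on, which is what makes the comparison between hyperparameter values fair.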

Hyperparameter Tuning Methods

At a high level, you can choose to configure your hyperparameters either manually or automatically.

  • Automated hyperparameter tuning uses an algorithm to determine the ideal hyperparameters for a machine learning model. Generally, this begins with the user defining a range or set of combinations for each hyperparameter and the algorithm running trials to determine which value is the most effective for each. It also requires the user to specify a performance metric by which the tuning algorithm will evaluate each set of hyperparameters.
  • Manual hyperparameter tuning involves manual experimentation and a great deal of trial and error. You test each hyperparameter combination by hand and record the results of each trial until you find a combination that works.

Aside from whether you choose to define your hyperparameters manually or automatically, there are a few different hyperparameter tuning approaches you might leverage:

  • Grid Search: Also known as a parameter sweep, this involves an exhaustive — or brute force — search through a predefined subset of an algorithm’s hyperparameter space.
  • Random Search: A random search samples hyperparameter combinations at random from the hyperparameter space. It’s typically less resource-intensive than performing an exhaustive grid search, and can outperform it when only a small number of hyperparameters strongly affect model performance.
  • Bayesian Optimization: Bayesian optimization builds a probabilistic model of how hyperparameter values relate to the performance metric and uses it to choose which combination to evaluate next. It often finds better results in fewer trials than grid search or random search, as it can estimate how promising each candidate is before running it.
  • Population-Based Training (PBT): This tuning method accounts for both hyperparameter values and neural network weights, the parameters within a neural network used to transform input data. PBT is notable in that it trains multiple models simultaneously and iteratively replaces poorly performing ones with modified copies of better performers, allowing it to tune hyperparameters during training rather than between training sessions.
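The first two approaches are simple enough to sketch in plain Python. The example below is only an illustration: validation_score is a hypothetical stand-in for training a model and measuring its performance, and the hyperparameter names and candidate values are assumptions. Bayesian optimization and population-based training need considerably more machinery and are not shown.

```python
import random
from itertools import product


def validation_score(learning_rate, batch_size):
    """Stand-in for 'train a model and return its validation score'.

    Hypothetical toy objective that peaks at learning_rate=0.1, batch_size=32.
    """
    return -((learning_rate - 0.1) ** 2) - ((batch_size - 32) / 100) ** 2


def grid_search(grid):
    """Exhaustively score every combination in the grid."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    return max(candidates, key=lambda c: validation_score(**c))


def random_search(sample, n_trials, seed=0):
    """Score n_trials randomly drawn combinations and return the best."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n_trials)]
    return max(candidates, key=lambda c: validation_score(**c))


def sample(rng):
    """Draw one random combination from the search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "batch_size": rng.choice([16, 32, 64]),
    }


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "batch_size": [16, 32, 64]}
best_grid = grid_search(grid)
best_random = random_search(sample, n_trials=12)

print("grid search:  ", best_grid)
print("random search:", best_random)
```

Both searches here use 12 trials. Grid search is limited to the values listed in the grid, while random search can land anywhere in the sampled range, which is one reason it can do well when only one or two hyperparameters really matter.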