Building a machine learning algorithm means training it to make accurate predictions from a set of input data. An algorithm “learns” by adjusting trainable parameters to categorize and label certain features. Machine learning relies on adjusting hyper-parameters, which determine how these adjustments occur.
Whether you’re deploying supervised learning or reinforcement learning, setting these hyper-parameters, a process also known as model tuning, is key to ML development. How does machine learning model tuning work, and what can you do to train your models more effectively?
Understanding Hyper-Parameters
Model tuning is the practice of adjusting hyper-parameters to facilitate algorithm “learning,” which is why it’s also called hyper-parameter optimization.
Parameters can be either trainable parameters (the internal values a model learns from the data) or hyper-parameters (external values the ML developer configures). Training involves finding the optimal trainable parameters, while tuning focuses on the hyper-parameters.
The exact hyper-parameters depend on the type of algorithm in use:
- Linear algorithms: the value of the parameter alpha.
- Decision trees: the number of branches, often controlled through the maximum tree depth.
- Neural networks: the number of layers.
- Regularized regression models: the size of the coefficient penalty.
Deep learning applications, for instance, use several hyper-parameters to guide the learning process.
- Learning rate is the size of the step the neural network takes each time it updates its weights, which sets how fast it “learns.” If it’s too high, the model overshoots and fails to converge. If it’s too low, training takes too long.
- Momentum controls how much of the previous weight updates carries over into the current one, which affects how quickly the model converges. A high value speeds up convergence, but the model risks “overshooting” the optimum.
- Dropout rate sets the fraction of neurons randomly switched off during training, a technique that reduces the chance of model overfitting.
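The following is a minimal sketch of classical gradient descent with momentum on a toy one-dimensional loss, f(w) = w², chosen only to make the learning-rate and momentum trade-offs concrete. The function `minimize_quadratic` and the values passed to it are hypothetical. Deep learning frameworks apply variants of this update to every weight in a network.

```python
def minimize_quadratic(learning_rate, momentum, steps, w0=1.0):
    """Minimize the toy loss f(w) = w**2 with gradient descent plus momentum.

    The loss is a made-up stand-in; its gradient is 2 * w and its minimum is w = 0.
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w
        # Momentum keeps a decaying sum of past updates in `velocity`.
        velocity = momentum * velocity - learning_rate * grad
        w += velocity
    return w


# A moderate learning rate with momentum settles very close to the minimum at w = 0.
print(minimize_quadratic(learning_rate=0.1, momentum=0.9, steps=300))
# A learning rate that is too high overshoots further on every step, so w diverges.
print(minimize_quadratic(learning_rate=1.1, momentum=0.0, steps=50))
# A learning rate that is too low barely moves w away from its starting point.
print(minimize_quadratic(learning_rate=0.001, momentum=0.0, steps=50))
```

The three calls show each case in turn: a well-chosen setting converges, a too-high learning rate diverges, and a too-low learning rate makes almost no progress in the same number of steps.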
All these use cases begin with default hyper-parameters as starting points. Data scientists typically build a first model by running the algorithm with these defaults on an initial dataset. Default hyper-parameters often produce reasonably well-performing models, but tuning still provides needed improvements when the model must function in real-world conditions.
What Does Hyper-Parameter Tuning Involve?
To tune a model is to find the optimal hyper-parameters. The procedure is trial-and-error in nature:
- Change some hyper-parameters.
- Run the algorithm on the input data.
- Assess the model performance.
- Repeat this process multiple times until you arrive at the most accurate model.
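The trial-and-error loop above fits in a few lines of Python. This is an illustrative sketch, not a prescribed workflow. The model is a hypothetical one-feature ridge regression (y = w·x with a penalty of strength `alpha`, the kind of alpha tuned for linear algorithms), and the training and validation data are made up.

```python
# Made-up training and validation data that roughly follow y = 2x.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5.0, 6.0], [9.6, 11.5]


def fit_ridge(xs, ys, alpha):
    """Fit y = w * x with an L2 penalty alpha * w**2, using the closed-form solution."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mean_squared_error(w, xs, ys):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


best_alpha, best_error = None, float("inf")
for alpha in [0.0, 0.1, 1.0, 10.0]:              # 1. change the hyper-parameter
    w = fit_ridge(train_x, train_y, alpha)        # 2. run the algorithm on the data
    error = mean_squared_error(w, val_x, val_y)   # 3. assess model performance
    print(f"alpha={alpha}: validation MSE={error:.4f}")
    if error < best_error:                        # 4. keep the best, then repeat
        best_alpha, best_error = alpha, error

print("best alpha:", best_alpha)
```

Performance is measured on held-out validation data rather than on the training data. Otherwise, the loop would simply reward the least-regularized model.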
ML developers must set hyper-parameters themselves, adjusting them between training runs, because the model cannot learn them from the input data. Well-chosen hyper-parameters speed up and improve learning, but only for the specific application of the algorithm they were tuned for.
The general goal of model tuning is to maximize model performance, resulting in more accurate insights and more impactful business decisions. Because machine learning is heavily dependent on hyper-parameters, model tuning is necessary for every model and dataset.
How Does Model Tuning Work in Practice?
During machine learning development, you must decide which hyper-parameters to tune and the range of values for each. Every additional hyper-parameter or value multiplies the number of combinations to experiment with. Running the algorithm once for every combination can be time-consuming, so developers must balance efficiency against comprehensiveness when tuning models.
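The growth is multiplicative. The number of combinations is the product of how many values you try for each hyper-parameter. The hyper-parameter names and values below are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical search space: three hyper-parameters and the values to try for each.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3, 1.0],
    "num_layers": [2, 4, 8, 16],
    "dropout": [0.0, 0.25, 0.5],
}

# Number of combinations = product of the number of values per hyper-parameter.
n_combinations = prod(len(values) for values in search_space.values())
print(n_combinations)  # 60

# Enumerating the combinations explicitly gives the same count.
all_combinations = list(product(*search_space.values()))
print(len(all_combinations))  # 60

# One more hyper-parameter with 4 values multiplies the total by 4.
search_space["batch_size"] = [16, 32, 64, 128]
print(prod(len(values) for values in search_space.values()))  # 240
```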
Developers may tune a model manually, choosing starting values based on intuition before testing, or use automated model tuning tools. Manual tuning gives you more control over the process, but automated tuning is gaining popularity because it is more efficient.
In automated model tuning, an algorithm selects hyper-parameters from a search space and tests each combination out. Some popular algorithms include the following.
Grid Search
Map the values of every hyper-parameter onto a grid and exhaustively test each combination. Also known as a parameter sweep, this approach compares the performance of every combination of hyper-parameters to find the optimal set.
Grid searches are notoriously computationally expensive, since the model must be trained once for every combination of hyper-parameters. However, developers can generalize the results across multiple models.
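A bare-bones grid search is just a loop over the Cartesian product of the candidate values. In this sketch, `validation_error` is a hypothetical, made-up function that stands in for training and scoring a real model. The hyper-parameter names and grid values are also illustrative.

```python
import math
from itertools import product


def validation_error(learning_rate, max_depth):
    """Hypothetical stand-in for "train the model, then measure validation error".

    The made-up error surface is lowest at learning_rate=0.01, max_depth=6.
    """
    return (math.log10(learning_rate) + 2) ** 2 + 0.1 * (max_depth - 6) ** 2


# Each hyper-parameter's candidate values form one axis of the grid.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [2, 4, 6, 8],
}

# Exhaustively evaluate every combination (3 * 4 = 12 runs).
results = [
    ((lr, depth), validation_error(lr, depth))
    for lr, depth in product(grid["learning_rate"], grid["max_depth"])
]

best_params, best_error = min(results, key=lambda item: item[1])
print(len(results), best_params)  # 12 (0.01, 6)
```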
Random Search
This approach is similar to grid search, except that the algorithm tests only a random sample of combinations rather than exhaustively testing every set of hyper-parameters. It then keeps the top-performing combination.
Random search typically takes significantly less time than a grid search. It is especially effective when some hyper-parameters affect the model much more than others. Because it does not reuse the same few values along each axis, it tries more distinct values of the high-priority hyper-parameters for the same number of runs.
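Random search replaces the exhaustive loop with a fixed budget of random draws. The sketch below uses the same kind of hypothetical `validation_error` stand-in. Sampling the learning rate on a log scale is a common practice and an assumption here; the text does not prescribe it.

```python
import math
import random


def validation_error(learning_rate, max_depth):
    # Hypothetical stand-in for training a model and measuring validation error.
    return (math.log10(learning_rate) + 2) ** 2 + 0.1 * (max_depth - 6) ** 2


def random_search(n_trials, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    trials = []
    for _ in range(n_trials):
        # Sample the learning rate on a log scale, since its effect spans orders of magnitude.
        lr = 10 ** rng.uniform(-4, 0)
        depth = rng.randint(2, 12)
        trials.append(((lr, depth), validation_error(lr, depth)))
    # Keep the top-performing combination among the sampled ones.
    return min(trials, key=lambda item: item[1]), trials


(best_params, best_error), trials = random_search(n_trials=20)
print(best_params, best_error)
```

The budget (`n_trials`) is fixed up front and does not grow with the number of hyper-parameters. A grid, by contrast, grows multiplicatively.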
Bayesian Search
In this search, the algorithm uses the results of previous tests to choose which hyper-parameters to try next. Unlike grid and random search, Bayesian optimization does not run each experiment in isolation, so it usually reaches a good configuration in fewer trials. This approach is a form of sequential model-based optimization (SMBO).
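The following is a minimal sketch of Bayesian optimization for a single hyper-parameter, assuming NumPy is available. It uses a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function. This is one common way to implement Bayesian search; other surrogate models are also widely used. The objective `validation_error` is a hypothetical stand-in over log10(learning rate), and the kernel length scale and search range are illustrative assumptions.

```python
import math

import numpy as np


def validation_error(log_lr):
    # Hypothetical stand-in for "train the model, then measure validation error",
    # as a function of log10(learning rate); lowest at log_lr = -2.
    return (log_lr + 2.0) ** 2


def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential covariance between two 1-D arrays of points.
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Gaussian-process surrogate: posterior mean and standard deviation at x_query.
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_query)
    mean = k_s.T @ np.linalg.solve(k, y_obs)
    var = 1.0 - np.sum(k_s * np.linalg.solve(k, k_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    # Expected amount by which each candidate beats the best error so far (minimization).
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayesian_search(n_iter=10):
    candidates = np.linspace(-5.0, 0.0, 201)  # search space: log10(learning rate)
    x_obs = np.array([-5.0, 0.0])             # two initial probes
    y_obs = np.array([validation_error(x) for x in x_obs])
    for _ in range(n_iter):
        # Standardize errors so they match the GP prior (mean 0, variance 1).
        y_std = (y_obs - y_obs.mean()) / y_obs.std()
        mean, std = gp_posterior(x_obs, y_std, candidates)
        # Previous results shape the next choice: pick the highest expected improvement.
        x_next = candidates[np.argmax(expected_improvement(mean, std, y_std.min()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, validation_error(x_next))
    best = int(np.argmin(y_obs))
    return 10 ** x_obs[best], y_obs[best], y_obs


best_lr, best_error, history = bayesian_search()
print(f"best learning rate ~ {best_lr:.4g}, error {best_error:.4g}")
```

Each new probe depends on every earlier result through the surrogate model. This dependence is the difference from grid and random search, where each trial is chosen independently.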
Gradient Descent
For some hyper-parameters, model performance changes smoothly as their values change, forming a recognizable gradient that an optimization algorithm can follow when selecting ideal values.
To illustrate with an analogy, imagine a hiker climbing a mountain. While the hiker doesn’t know the exact route to the summit, he does know that walking uphill will bring him closer to it.
Gradient-based optimization works the same way. It measures how model performance changes as each hyper-parameter changes, then moves the hyper-parameters in the direction that improves performance until it reaches a local minimum of the error (or a local maximum of the score).
However, gradient optimization runs into problems when hyper-parameter values don’t map to a “smooth” surface. Some hyper-parameters, such as the number of layers, take discrete values and have no gradient to follow, so the search cannot converge on them this way.
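As a hedged illustration of following the slope, the sketch below tunes one continuous hyper-parameter, a hypothetical log10 regularization strength, on a made-up smooth validation-error curve. It estimates the gradient with central finite differences. Practical gradient-based methods instead differentiate the validation loss through the training process, which this toy does not attempt. Because the goal here is to minimize error, the sketch walks downhill rather than uphill.

```python
def validation_error(log_alpha):
    # Hypothetical smooth validation-error curve over log10(regularization strength);
    # it is lowest at log_alpha = -2.
    return (log_alpha + 2.0) ** 2 + 1.0


def finite_difference_gradient(f, x, h=1e-4):
    # Central-difference estimate of df/dx, standing in for an analytic hyper-gradient.
    return (f(x + h) - f(x - h)) / (2.0 * h)


def gradient_descent(f, x0, step_size=0.1, steps=100):
    x = x0
    for _ in range(steps):
        # Step against the gradient, i.e. downhill on the validation-error surface.
        x -= step_size * finite_difference_gradient(f, x)
    return x


best_log_alpha = gradient_descent(validation_error, x0=0.0)
print(round(best_log_alpha, 4))  # -2.0
```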
Evolutionary Algorithms
Functioning much like natural selection in biology, evolutionary algorithms build multiple machine learning models with specific hyper-parameters and test each one. The algorithm throws out the worst-performing models and produces “offspring” of the best-performing ones, with hyper-parameters similar to those of their parents. By the end of the process, the best models have survived.
Evolutionary algorithms are resource-intensive, but they are useful in applications where a near-optimal model is sufficient.
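The sketch below shows one simple variant: keep the best half of each generation, then refill the population with mutated copies of the survivors (no crossover). The objective `validation_error`, the mutation scales, and the population size are hypothetical choices for illustration.

```python
import math
import random


def validation_error(learning_rate, max_depth):
    # Hypothetical stand-in for training a model and measuring validation error.
    return (math.log10(learning_rate) + 2) ** 2 + 0.1 * (max_depth - 6) ** 2


def mutate(parent, rng):
    # Offspring inherit the parent's hyper-parameters with small random changes.
    lr, depth = parent
    new_lr = min(1.0, max(1e-4, lr * 10 ** rng.gauss(0, 0.3)))
    new_depth = min(12, max(2, depth + rng.choice([-1, 0, 1])))
    return (new_lr, new_depth)


def evolve(generations=15, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [(10 ** rng.uniform(-4, 0), rng.randint(2, 12)) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank every model by its (hypothetical) validation error.
        population.sort(key=lambda p: validation_error(*p))
        best_per_generation.append(validation_error(*population[0]))
        # Keep the best half; throw out the worst-performing models.
        survivors = population[: population_size // 2]
        # Refill the population with mutated offspring of the survivors.
        offspring = [mutate(rng.choice(survivors), rng) for _ in range(population_size - len(survivors))]
        population = survivors + offspring
    population.sort(key=lambda p: validation_error(*p))
    return population[0], best_per_generation


best_params, history = evolve()
print(best_params, history[-1])
```

Survivors carry over unchanged, so the best error never gets worse from one generation to the next. The cost is that every generation trains a full population of models, which is where the resource intensity comes from.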
What Are Some Model Tuning Techniques in Use Today?
Hyper-parameter optimization is available through many sources, including Python libraries like the ones below:
- Scikit-learn – Scikit-learn is generally an excellent starting point for model tuning, offering the RandomizedSearchCV and GridSearchCV classes for random search and grid search. However, more efficient options are available elsewhere.
- Hyperopt – This Python library uses Bayesian search to perform large-scale model tuning. Users can specify the search space, or range of values for each parameter.
- Optuna – Similar in function to Hyperopt, Optuna is notable for its ease of use. Users can choose the length of the optimization process, for example by setting a number of trials or a time limit.
- Scikit-Optimize – Another open-source Python library that uses Bayesian optimization, Scikit-Optimize is easy to implement and contains a versatile toolkit for model tuning.
- Ray Tune – Ray Tune uses distributed computing to find the optimal hyper-parameters, running trials in parallel so the same tuning code can scale from a single machine to a cluster.
Each of these tools implements a variation of the tuning algorithms described above, making them practical options for machine learning developers.
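As a concrete starting point with scikit-learn, the sketch below runs both a grid search and a random search over the same decision-tree hyper-parameters on the bundled Iris dataset. It uses the library's GridSearchCV and RandomizedSearchCV classes. The dataset, estimator, and candidate values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for each hyper-parameter (illustrative choices).
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_leaf": [1, 2, 5, 10],
}

# Grid search: cross-validates all 5 * 4 = 20 combinations.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# Random search: samples only 8 of those combinations.
random_search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=8,
    cv=5,
    random_state=0,
)
random_search.fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 3))
print("random:", random_search.best_params_, round(random_search.best_score_, 3))
```

Both searches use the same cross-validation folds, and the random search samples a subset of the grid. Its best score can therefore match but never exceed the grid search's, and it gets there after evaluating 8 candidates instead of 20.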