Tuning Machine Learning Models with Hyperopt

Updated on Aug 4, 2022 13:57 IST

This article looks at tuning the hyperparameters of machine learning models with Hyperopt, a Python library. We will also look at two traditional hyperparameter optimization methods, grid search and random search, and compare them with Hyperopt.

Author: Sayar Banerjee

Bio: Sayar is a data science enthusiast who works as an Analyst at Indihood. His interest areas are Machine Learning, Deep Learning, Blockchain and Cryptocurrencies.

Traditional Parameters and Hyperparameters

Machine Learning algorithms have two types of parameters:

  1. Traditional Parameters
  2. Hyperparameters

Traditional parameters are learned from the data as part of the training process: the algorithm adjusts them while it attempts to minimize the loss function. The weights of a linear regression or a neural network are typical examples.

Hyperparameters, on the other hand, are parameters whose values cannot be estimated from the underlying data during training; the number of trees and the maximum tree depth of a random forest are examples. Hyperparameters are usually set before training begins and are used to direct the training process, so they play a key role in how well a machine learning model learns its traditional parameters.
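To make the distinction concrete, here is a minimal sketch in plain Python (a toy example, not taken from any library): a one-weight model y = w * x trained by gradient descent on made-up data. The weight w is a traditional parameter learned from the data, while learning_rate and epochs are hyperparameters fixed before training starts.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: chosen before training.
    w is a traditional parameter: learned from the data.
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradient of (1/n) * sum((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# toy data generated from y = 3x, so the ideal weight is 3.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w_long = train(xs, ys, learning_rate=0.01, epochs=500)
w_short = train(xs, ys, learning_rate=0.01, epochs=5)
print(round(w_long, 3), round(w_short, 3))
```

Both runs see the same data and use the same algorithm, yet the run with only 5 epochs stops well short of the ideal weight of 3.0. Only the hyperparameters differ, and choosing them well is exactly what hyperparameter tuning is about.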

What is Hyperparameter Tuning?

Hyperparameter tuning, or hyperparameter optimization, is the process of choosing the optimal set of hyperparameters for a machine learning model. Since hyperparameters cannot be learned directly from the training data, we have to use other methods: typically, we train models with different hyperparameter settings and compare how well each one performs on data held out from training.

Traditional Methods of Hyperparameter Tuning

In the analytics industry, two popular methods of hyperparameter tuning are grid search and random search. Let’s understand how each of them searches for the optimal set of hyperparameters.

Grid Search

Grid search, as the name suggests, divides the search space for your hyperparameters into a grid, where each hyperparameter is given a fixed list of candidate values. It then performs an exhaustive search by trying every single combination of these values.

For each combination, we train the model and evaluate it with a performance metric, such as RMSE for regression problems or accuracy for classification problems.

Most grid search implementations perform this evaluation with cross-validation. Finally, the combination of hyperparameters that performs best is chosen as the optimal set.
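The enumeration step of grid search can be sketched with the standard library's itertools.product. The helper grid_combinations below is a hypothetical illustration using the same grid as the example that follows; scikit-learn offers a comparable utility, ParameterGrid, and GridSearchCV performs this step for you.

```python
from itertools import product

# the same grid as in the example below
grid_param = {
    "n_estimators": [100, 200, 300],
    "max_depth": [2, 4, 6],
    "criterion": ["gini", "entropy"],
}


def grid_combinations(grid):
    """Yield every combination of the grid as a dict (hypothetical helper)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combos = list(grid_combinations(grid_param))
print(len(combos))  # 18
print(combos[0])    # {'n_estimators': 100, 'max_depth': 2, 'criterion': 'gini'}
```

Because the number of combinations is the product of the list lengths, every extra hyperparameter or candidate value multiplies the cost of the search.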

Example:

To illustrate how grid search works, let us try to build and optimize a machine learning model on a real-world dataset.

I will use the popular scikit-learn library to build our models for this demo. I have chosen a dataset from Kaggle that provides information about the objects in space nearest to Earth. We will build a model that predicts whether a given object is hazardous to Earth or not.

Here is the entire code:

 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import pandas as pd

# load the dataset and separate the target column
df = pd.read_csv("./neo.csv")
y = df['hazardous'].values
# drop the target plus the columns we do not use as features
df.drop(['id', 'name', 'orbiting_body', 'sentry_object', 'hazardous'], axis=1, inplace=True)
X = df.values

rf_classifier = RandomForestClassifier()

# every listed value is combined with every other listed value
grid_param = {
    "n_estimators": [100, 200, 300],
    "max_depth": [2, 4, 6],
    "criterion": ["gini", "entropy"],
}

model = GridSearchCV(
    estimator=rf_classifier,
    param_grid=grid_param,
    scoring="accuracy",
    verbose=5,
    cv=5,  # 5-fold cross-validation for each combination
)
model.fit(X, y)
print(f"Parameters: {model.best_estimator_.get_params()}")
print(f"Best Score: {model.best_score_}")

We have chosen scikit-learn's random forest classifier (RandomForestClassifier) for this demo.

The grid_param dictionary is where we define our “grid.” The random forest classifier in sklearn has several hyperparameters; for the sake of simplicity, we chose just three of them (n_estimators, max_depth and criterion) to build our grid.

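We then pass this grid to the GridSearchCV class, which trains and evaluates a model for every combination in the grid. Note that we set cv to 5, which means that every combination is evaluated with 5-fold cross-validation: 18 combinations × 5 folds = 90 model fits, after which the best combination is refitted on the full dataset.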
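Each combination is scored by k-fold cross-validation: the data is split into five folds, each fold serves once as the validation set while the model trains on the other four, and the five scores are averaged. The hypothetical helper below sketches only the splitting step in plain Python. It is simplified: for classifiers, scikit-learn actually uses stratified folds that keep the class proportions of each fold similar.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; yield (train, test) index lists.

    Simplified sketch: no shuffling and no stratification.
    """
    # spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(10, 5):
    print("validate on", test, "train on", len(train), "samples")
```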

Finally, we train our model and fetch the optimal hyperparameters, which are as follows:

Parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': 6, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 300, 'n_jobs': None, 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}

Best Score: 0.9125897269418566

Random Search

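Unlike grid search, random search does not perform an exhaustive search over a predefined, discrete set of values. Instead, it evaluates a fixed number of randomly sampled combinations of hyperparameters and keeps the best one. In practice, random search often finds hyperparameters that are as good as or better than those found by grid search for the same computational budget, especially when only a few of the hyperparameters have a strong effect on performance.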

However, because of its random nature, there is a significant element of chance in whether random search finds the optimal hyperparameters, and its results can vary more from run to run than those of grid search.
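A commonly cited reason random search does well is coverage. The sketch below (plain Python, with two made-up hyperparameters scaled to [0, 1)) compares a 3 × 3 grid with nine random points. With the same budget of nine trials, the grid only ever tries three distinct values of each hyperparameter, while random search tries nine, so if only one hyperparameter really matters, random search explores it more finely.

```python
import random
from itertools import product

budget = 9  # number of models we can afford to train

# grid search: 3 x 3 grid over two hyperparameters scaled to [0, 1)
grid_points = list(product([0.0, 1 / 3, 2 / 3], repeat=2))

# random search: same budget, each coordinate drawn uniformly at random
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(budget)]

# distinct values tried for the first hyperparameter
print(len({p[0] for p in grid_points}))    # 3
print(len({p[0] for p in random_points}))  # 9
```

Changing the seed changes which points are drawn, which is where the run-to-run variability of random search comes from.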

Example:

Let’s take the same dataset to illustrate the random search method. Here is the entire code:

 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
import pandas as pd

# load the dataset and separate the target column
df = pd.read_csv("./neo.csv")
y = df['hazardous'].values
# drop the target plus the columns we do not use as features
df.drop(['id', 'name', 'orbiting_body', 'sentry_object', 'hazardous'], axis=1, inplace=True)
X = df.values

rf_classifier = RandomForestClassifier()

# ranges of candidate values to sample from, instead of a small fixed grid
dist_param = {
    "n_estimators": range(100, 501, 50),
    "max_depth": range(2, 11, 1),
    "criterion": ["gini", "entropy"],
}

model = RandomizedSearchCV(
    estimator=rf_classifier,
    param_distributions=dist_param,
    scoring="accuracy",
    verbose=5,
    cv=5,
    n_iter=20,  # number of random combinations to evaluate
)
model.fit(X, y)
print(f"Parameters: {model.best_estimator_.get_params()}")
print(f"Best Score: {model.best_score_}")

You will notice that a lot of the code is very similar to the grid search. However, there are some key differences. 

Firstly, instead of a small, fixed list of values, we give the random search algorithm a distribution to sample from. This is why, in the dist_param dictionary, we define a range for each numerical hyperparameter (for example, max_depth from 2 to 10) rather than a handful of values.

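Secondly, we have to specify the number of iterations (n_iter), which is the number of randomly sampled combinations to evaluate. Here we use 20, so with 5-fold cross-validation the search performs 20 × 5 = 100 model fits.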
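The sampling step can be sketched as follows, using the same dist_param search space. The helper sample_combinations is hypothetical: it draws each value uniformly from its list or range, roughly what RandomizedSearchCV does with list-like entries. One difference is that scikit-learn avoids drawing the same combination twice when all entries are lists, whereas this sketch may repeat a combination.

```python
import random

# the same search space as in the example above
dist_param = {
    "n_estimators": range(100, 501, 50),
    "max_depth": range(2, 11, 1),
    "criterion": ["gini", "entropy"],
}


def sample_combinations(space, n_iter, seed=None):
    """Draw n_iter random combinations, picking each value uniformly (hypothetical helper)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]


samples = sample_combinations(dist_param, n_iter=20, seed=42)
print(len(samples))  # 20
print(samples[0])
```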

The rest of the steps are very similar. We use a cv of 5 again and fit our model. Finally, we print out the best combination of hyperparameters.

Parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'entropy', 'max_depth': 10, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 450, 'n_jobs': None, 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}

Best Score: 0.9130521049908868

In this run, the randomized search scored marginally better than the grid search (0.9131 vs 0.9126). The difference is very small and could easily reverse on another run, because neither search fixes a random seed.

The two techniques above are fairly shallow types of optimization: they do not use any “intelligent” strategy for searching the domain space. In exchange, they are easy to implement and parallelize, and they are often good enough to find a near-optimal set of hyperparameters.

However, sometimes more sophisticated techniques are required to arrive at a sufficiently good set of hyperparameters, for example when each training run is expensive or the search space is large. This is why many practitioners in data-intensive fields such as analytics, deep learning, and VR are moving towards more advanced optimization methods.

One such library that facilitates this in Python is hyperopt.

Hyperopt

Hyperopt is an open-source, off-the-shelf Python library that searches hyperparameter spaces intelligently. In addition to plain random search, it implements the Tree-structured Parzen Estimator (TPE) algorithm, a form of Bayesian optimization that uses past results to decide where to search next.

Bayesian Optimization

Our two shallow approaches, grid search and randomized search, do not take past results into account during the search. Bayesian optimization methods, by contrast, use the results of earlier evaluations to decide which hyperparameters to try next, and so search the domain space more intelligently. For this reason, these methods can also be thought of as sequential.

One popular approach, sequential model-based optimization (SMBO), builds a cheaper approximation of the objective function, called the surrogate function, and uses it to decide which hyperparameters look promising. Evaluating the surrogate is far cheaper than training and validating a real model, so this approach saves computation by reserving the expensive evaluations for promising candidates.

The hyperparameters that look best according to the surrogate are then evaluated on the true objective function. The observed result is stored, the surrogate is updated with it, and the updated surrogate is used to suggest the next set of hyperparameters.

This process repeats until the algorithm is terminated, typically after a fixed number of evaluations.

SMBO forms the crux for the underlying algorithms that are implemented in hyperopt.
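The SMBO loop can be sketched in plain Python. This is a toy illustration under simplifying assumptions, not hyperopt's algorithm: the "expensive" objective is a one-dimensional stand-in, the surrogate is an inverse-distance-weighted average of past losses, and the next point is chosen by an acquisition rule that trades off a low predicted loss against distance from points already tried. All names here are hypothetical.

```python
import random


def expensive_objective(x):
    # hypothetical stand-in for "train a model with hyperparameter x, return its loss"
    return (x - 2.0) ** 2


def surrogate(x, history):
    """Cheap approximation: inverse-distance-weighted average of observed losses."""
    num = den = 0.0
    for xi, yi in history:
        d = abs(x - xi)
        if d < 1e-12:
            return yi  # exactly at an observed point
        w = 1.0 / d ** 2
        num += w * yi
        den += w
    return num / den


def acquisition(x, history, explore=1.0):
    # lower is better: predicted loss minus a bonus for distance from tried points
    nearest = min(abs(x - xi) for xi, _ in history)
    return surrogate(x, history) - explore * nearest


def smbo(objective, low, high, n_init=3, n_iter=15, n_candidates=200, seed=0):
    rng = random.Random(seed)
    # 1) start with a few random evaluations of the true objective
    history = [(x, objective(x)) for x in (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        # 2) score many candidates on the cheap surrogate-based acquisition only
        candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
        x_next = min(candidates, key=lambda c: acquisition(c, history))
        # 3) evaluate the most promising candidate on the true objective
        # 4) store the result; the surrogate is rebuilt from this history next round
        history.append((x_next, objective(x_next)))
    best = min(history, key=lambda h: h[1])
    return best, history


best, history = smbo(expensive_objective, 0.0, 5.0)
print(f"best x = {best[0]:.3f}, loss = {best[1]:.4f}")
```

Only the true objective counts as expensive here: the 200 candidates per round are scored on the surrogate alone, and only one of them is passed to expensive_objective.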

Example:

We will now see an example of using hyperopt on a scikit-learn model to find the optimal hyperparameters. Again, we will stick to using the NASA objects dataset. The code is as follows:

 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from hyperopt import hp, fmin, tpe, space_eval, Trials, STATUS_OK
import pandas as pd

df = pd.read_csv("./neo.csv")
y = df['hazardous'].values
df.drop(['id', 'name', 'orbiting_body', 'sentry_object', 'hazardous'], axis=1, inplace=True)
X = df.values

def objective(params):
    model = RandomForestClassifier(
        n_estimators=params['n_estimators'],
        criterion=params['criterion'],
        # hp.quniform returns a float (e.g. 4.0); scikit-learn expects an int
        max_depth=int(params['max_depth']),
    )
    # mean accuracy over 5 cross-validation folds
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    # hyperopt minimizes the loss, so return the negative accuracy
    return {'loss': -accuracy, 'status': STATUS_OK}

params = {
    'n_estimators': hp.choice('n_estimators', [100, 200, 300, 400, 500]),
    'criterion': hp.choice('criterion', ['gini', 'entropy']),
    'max_depth': hp.quniform('max_depth', 2, 10, 1),
}

trials = Trials()
opt = fmin(
    fn=objective,
    space=params,
    algo=tpe.suggest,
    max_evals=20,
    trials=trials,
)
print(opt)                      # hp.choice entries are reported as indices
print(space_eval(params, opt))  # map the result back to actual values

We first define an objective function, objective, that hyperopt will attempt to minimize. It builds a random forest from the suggested hyperparameters and computes the mean accuracy over the 5 cross-validation folds. We then return the negative of this accuracy as the loss. Because hyperopt always minimizes the objective function, minimizing the negative accuracy is the same as maximizing the accuracy of our model.
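The sign flip is easy to check with a few made-up numbers: picking the setting with the lowest negative accuracy selects the same setting as picking the highest accuracy. (Using 1 - accuracy as the loss would work equally well.)

```python
# made-up cross-validated accuracies for three candidate settings
cv_accuracy = {"A": 0.905, "B": 0.913, "C": 0.910}

# what hyperopt receives: a loss to be minimized
loss = {name: -acc for name, acc in cv_accuracy.items()}

best_by_min_loss = min(loss, key=loss.get)
best_by_max_acc = max(cv_accuracy, key=cv_accuracy.get)
print(best_by_min_loss, best_by_max_acc)  # B B
```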

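After defining our objective function, we define our search space in a similar manner to the previous examples of shallow optimization methods. Hyperopt provides the hp module, which helps us define a search distribution for each hyperparameter: hp.choice picks one option from a list, while hp.quniform draws values between a lower and an upper bound, rounded to a multiple of q (here, whole numbers from 2 to 10).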
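Hyperopt's documentation describes hp.quniform(label, low, high, q) as returning a value like round(uniform(low, high) / q) * q. The hypothetical helper below mimics that formula in plain Python. It shows that the result is a whole number stored as a float, which is why a value such as max_depth may need converting with int() before it is passed to scikit-learn.

```python
import random


def quniform_sample(rng, low, high, q):
    # hypothetical helper mimicking hyperopt's documented formula for hp.quniform:
    # round(uniform(low, high) / q) * q, returned as a float
    return float(round(rng.uniform(low, high) / q) * q)


rng = random.Random(1)
draws = [quniform_sample(rng, 2, 10, 1) for _ in range(5)]
print(draws)  # whole-numbered floats between 2.0 and 10.0
```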

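Finally, we pass our objective function and search space to the fmin function. Notice that we set tpe.suggest as the algorithm for minimizing our objective function. TPE, or the Tree-structured Parzen Estimator, is one of the SMBO algorithms discussed earlier. Rather than modeling the loss directly, it splits past trials into a “good” group and a “bad” group and favors hyperparameters that are more likely under the good group than under the bad one.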

We also set the number of evaluations (max_evals) to 20, after which the algorithm terminates and returns the best set of hyperparameters it has found.
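The good/bad split at the heart of TPE can be sketched for a single continuous hyperparameter. This is a heavily simplified toy, not hyperopt's implementation (which also handles discrete and conditional search spaces): past trials are sorted by loss, the best quarter forms the "good" group, a Gaussian Parzen (kernel density) estimator is fitted to each group, and among candidates drawn near good points, the one with the largest ratio l(x) / g(x) is evaluated next. The constants are arbitrary illustrative choices.

```python
import math
import random


def parzen_density(points, bandwidth):
    """Gaussian kernel density estimate (Parzen window) built from `points`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def toy_tpe(objective, low, high, n_init=5, n_iter=20, gamma=0.25,
            n_candidates=24, bandwidth=0.5, seed=0):
    rng = random.Random(seed)
    # random start-up trials
    history = [(x, objective(x)) for x in (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        # split past trials into the best `gamma` fraction ("good") and the rest ("bad")
        ranked = sorted(history, key=lambda h: h[1])
        n_good = max(1, math.ceil(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]
        l_density = parzen_density(good, bandwidth)  # l(x): density of good trials
        g_density = parzen_density(bad, bandwidth)   # g(x): density of bad trials
        # draw candidates near good points, clipped to the search bounds
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]
        # evaluate the candidate that is most likely under l relative to g
        x_next = max(candidates, key=lambda c: l_density(c) / (g_density(c) + 1e-12))
        history.append((x_next, objective(x_next)))
    best = min(history, key=lambda h: h[1])
    return best, history


best, history = toy_tpe(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
print(f"best x = {best[0]:.3f}, loss = {best[1]:.4f}")
```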

Here are the results:

100%|██████████| 20/20 [39:11<00:00, 117.59s/it, best loss: -0.912931008113312]

{'criterion': 1, 'max_depth': 10.0, 'n_estimators': 3}

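Here, a best loss of -0.9129 corresponds to a cross-validated accuracy of about 0.913. Note that for hyperparameters defined with hp.choice, fmin returns the index of the selected option rather than the option itself: 'criterion': 1 means 'entropy' and 'n_estimators': 3 means 400. The max_depth value is a float because it comes from hp.quniform. Hyperopt's space_eval function converts such a result back into the actual hyperparameter values.

Thus, we have seen how to use hyperopt to intelligently optimize our hyperparameters. Note that although hyperopt tends to be more efficient for complex problems, for many simpler cases it may be wiser to stick to traditional methods of hyperparameter optimization.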

Hyperopt-sklearn: A simpler alternative to hyperopt for machine learning

Hyperopt-sklearn is a library from the hyperopt project that makes everything hyperopt offers available at a higher level of abstraction.

Hyperopt-sklearn acts as a wrapper around scikit-learn, giving data scientists a much simpler interface for using hyperopt.

Let’s revisit the previous example. But this time, we shall use hyperopt-sklearn instead of hyperopt. Here is the code:

 
from hpsklearn import HyperoptEstimator, random_forest_classifier
import pandas as pd

df = pd.read_csv("./neo.csv")
y = df['hazardous'].values
df.drop(['id', 'name', 'orbiting_body', 'sentry_object', 'hazardous'], axis=1, inplace=True)
X = df.values

# hyperopt-sklearn searches the random forest's hyperparameters (and preprocessing) for us
classifier = HyperoptEstimator(classifier=random_forest_classifier("rf_classifier"))
classifier.fit(X, y)
print(classifier.best_model())
# note: this scores the model on the same data it was fitted on
print(classifier.score(X, y))

Here are the results:


{'learner': RandomForestClassifier(criterion='entropy', max_features=0.2934870643494858,
                       min_samples_leaf=2, n_estimators=95, n_jobs=1,
                       random_state=4, verbose=False), 'preprocs': (StandardScaler(),), 'ex_preprocs': ()}

0.9803271830551764

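The reported accuracy is much higher than those of the previous methods, but the comparison is not like for like: classifier.score(X, y) evaluates the model on the same data it was fitted on, whereas the earlier scores were cross-validated. A fair comparison would score the model on a held-out test set, for example one created with scikit-learn's train_test_split.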
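Why a training-set score can be misleading is easy to demonstrate with a toy model. The sketch below uses made-up, deliberately noisy data and a 1-nearest-neighbour classifier as a stand-in for any flexible model: because every training point is its own nearest neighbour, the model scores perfectly on the data it was fitted on, but it does worse on held-out points. The helper names are hypothetical.

```python
import random


def one_nn_predict(train, x):
    # 1-nearest-neighbour: copy the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, data):
    return sum(one_nn_predict(train, x) == y for x, y in data) / len(data)


rng = random.Random(0)
# noisy toy labels: the true rule is x > 0.5, but about 20% of labels are flipped
data = []
for _ in range(200):
    x = rng.random()
    y = x > 0.5
    if rng.random() < 0.2:
        y = not y
    data.append((x, y))

train, test = data[:150], data[150:]
print(accuracy(train, train))  # 1.0: every training point is its own nearest neighbour
print(accuracy(train, test))   # lower on unseen data
```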

Furthermore, the amount of code required to train an optimized model is minimal. Libraries like this are becoming increasingly popular in the field of automated machine learning, or AutoML.

As the name suggests, AutoML automates entire processes in the machine learning cycle, including hyperparameter optimization.

As a result, many tasks previously considered time-consuming by data scientists are becoming trivial, thereby facilitating higher productivity and efficiency.

Other advantages of using hyperopt:

Hyperopt can also run tuning trials in parallel. Its SparkTrials class distributes the evaluations of the objective function across the worker nodes of an Apache Spark cluster, so that several hyperparameter settings are tested at the same time.

This helps hyperparameter optimization scale when each trial is expensive and many trials are needed.
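With hyperopt itself, this amounts to passing a SparkTrials object, for example SparkTrials(parallelism=4), as the trials argument of fmin in place of Trials(), which requires a Spark installation. To illustrate only the underlying idea, here is a minimal standard-library sketch that evaluates a batch of independent random-search trials concurrently with concurrent.futures. It is an analogy, not how SparkTrials works internally, and the objective is a hypothetical stand-in for training a model. A thread pool keeps the sketch simple; CPU-heavy work would typically use processes or a cluster instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(params):
    # hypothetical stand-in for "train and cross-validate a model, return its loss"
    return (params["x"] - 2.0) ** 2


def parallel_random_search(objective, n_trials=16, parallelism=4, seed=0):
    rng = random.Random(seed)
    # random search needs no feedback, so all configurations can be drawn up front
    configs = [{"x": rng.uniform(0.0, 5.0)} for _ in range(n_trials)]
    # evaluate up to `parallelism` trials at the same time; map keeps input order
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        losses = list(pool.map(objective, configs))
    best_loss, best_config = min(zip(losses, configs), key=lambda t: t[0])
    return best_loss, best_config, losses


best_loss, best_config, losses = parallel_random_search(objective)
print(best_loss, best_config)
```

An adaptive method such as TPE is sequential by nature, so running many trials at once means later suggestions cannot learn from results that are still being computed; higher parallelism trades some of that adaptivity for speed.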

In conclusion, I hope you enjoyed reading this article about hyperparameter tuning, specifically about how hyperopt has helped data science practitioners with their hyperparameter tuning tasks. 

Until next time!
