5 - Tuning Hyperparameters

Analytical Paleobiology Workshop 2022

Hyperparameters

Some model or preprocessing parameters cannot be estimated directly from your data; they have to be set before training and chosen by comparing candidate values

Choose the best parameter

mod_glm3 <- h2o.glm(x = predictors, y = target,
                    training_frame = ring_train,
                    family = "gaussian", lambda = 0,
                    compute_p_values = TRUE,
                    nfolds = 10, keep_cross_validation_predictions = TRUE,
                    seed = 1234)

How do we know that 0️⃣ is a good value for lambda?

Choose the best parameter

The two main strategies for optimization are:

  • Grid search 💠 which tests a pre-defined set of candidate values

  • Random search 🌀 which tests randomly sampled combinations of candidate values

Choose the best parameter: Grid search
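h2o.grid() searches over a named list of candidate values. The glm_params1 object passed below is not defined on this slide; a cartesian grid over alpha (elastic-net mixing) and lambda (regularization strength) might look like this, with the specific values being illustrative assumptions:

# Illustrative candidate values for the cartesian grid
# (assumed here, not the workshop's exact list)
glm_params1 <- list(alpha  = c(0, 0.5, 1),
                    lambda = c(0, 0.1, 0.5, 1))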

# Train and validate a cartesian grid of GLMs
glm_grid1 <- h2o.grid("glm", x = predictors, y = target,
                      grid_id = "glm_grid1",
                      training_frame = ring_train,
                      seed = 1,
                      hyper_params = glm_params1)

# Get the grid results, sorted by validation RMSE
# (note: RMSE is an error metric, so decreasing = TRUE puts the worst
#  models first; use decreasing = FALSE to list the lowest RMSE at the top)
glm_gridperf1 <- h2o.getGrid(grid_id = "glm_grid1",
                             sort_by = "rmse",
                             decreasing = TRUE)

print(glm_gridperf1)
#> H2O Grid Details
#> ================
#> 
#> Grid ID: glm_grid1 
#> Used hyper parameters: 
#>   -  alpha 
#>   -  lambda 
#> Number of models: 36 
#> Number of failed models: 0 
#> 
#> Hyper-Parameter Search Summary: ordered by decreasing rmse
#>   alpha lambda          model_ids    rmse
#> 1   1.0    1.0 glm_grid1_model_18 2.69280
#> 2   1.0    1.0 glm_grid1_model_27 2.69280
#> 3   1.0    1.0 glm_grid1_model_36 2.69280
#> 4   1.0    1.0  glm_grid1_model_9 2.69280
#> 5   0.5    1.0 glm_grid1_model_17 2.62457
#> 
#> ---
#>    alpha lambda          model_ids    rmse
#> 31   0.5    0.0 glm_grid1_model_20 2.18858
#> 32   1.0    0.0 glm_grid1_model_21 2.18858
#> 33   0.0    0.0 glm_grid1_model_28 2.18858
#> 34   0.5    0.0 glm_grid1_model_29 2.18858
#> 35   1.0    0.0  glm_grid1_model_3 2.18858
#> 36   1.0    0.0 glm_grid1_model_30 2.18858

Choose the best parameter: Grid search

# Grab the first model in the sorted grid -- because the grid was sorted
# with decreasing = TRUE, model_ids[[1]] is the highest-RMSE model here;
# sort with decreasing = FALSE if you want the lowest-RMSE model first
best_glm1 <- h2o.getModel(glm_gridperf1@model_ids[[1]])

# Now let's evaluate this model on a test set
# so we get an honest estimate of its out-of-sample performance
best_glm_perf1 <- h2o.performance(model = best_glm1,
                                  newdata = ring_test)
h2o.rmse(best_glm_perf1)
#> [1] 2.701137

# Look at the hyperparameters for the best model
print(best_glm1@model[["model_summary"]])
#> GLM Model: summary
#>     family     link        regularization number_of_predictors_total
#> 1 gaussian identity Lasso (lambda = 1.0 )                         10
#>   number_of_active_predictors number_of_iterations  training_frame
#> 1                           1                    1 RTMP_sid_9593_3

Choose the best parameter: Random search
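This second search passes a search_criteria list, which switches h2o.grid() from an exhaustive cartesian search to a random one. Neither glm_params2 nor search_criteria is defined on this slide; they might look something like this (the values are illustrative assumptions):

# Finer candidate grids, to be sampled at random rather than exhaustively
glm_params2 <- list(alpha  = seq(0, 1, by = 0.05),
                    lambda = seq(0, 1, by = 0.1))

# Random search: stop after a fixed number of randomly sampled models
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 40, seed = 1)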

# Train and validate a random grid of GLMs
glm_grid2 <- h2o.grid("glm", x = predictors, y = target,
                      grid_id = "glm_grid2",
                      training_frame = ring_train,
                      seed = 1,
                      hyper_params = glm_params2,
                      search_criteria = search_criteria)

# Get the grid results, sorted by validation RMSE
# (same caveat as above: decreasing = TRUE puts the worst models first)
glm_gridperf2 <- h2o.getGrid(grid_id = "glm_grid2",
                             sort_by = "rmse",
                             decreasing = TRUE)

print(glm_gridperf2)
#> H2O Grid Details
#> ================
#> 
#> Grid ID: glm_grid2 
#> Used hyper parameters: 
#>   -  alpha 
#>   -  lambda 
#> Number of models: 40 
#> Number of failed models: 0 
#> 
#> Hyper-Parameter Search Summary: ordered by decreasing rmse
#>   alpha lambda          model_ids    rmse
#> 1   0.7    0.9 glm_grid2_model_19 2.62786
#> 2   0.7    0.9 glm_grid2_model_29 2.62786
#> 3   0.7    0.9 glm_grid2_model_39 2.62786
#> 4   0.7    0.9  glm_grid2_model_9 2.62786
#> 5   0.9    0.6  glm_grid2_model_1 2.56506
#> 
#> ---
#>    alpha lambda          model_ids    rmse
#> 35  0.45    0.1 glm_grid2_model_38 2.30685
#> 36  0.45    0.1  glm_grid2_model_8 2.30685
#> 37  0.95    0.0 glm_grid2_model_16 2.18858
#> 38  0.95    0.0 glm_grid2_model_26 2.18858
#> 39  0.95    0.0 glm_grid2_model_36 2.18858
#> 40  0.95    0.0  glm_grid2_model_6 2.18858

Choose the best parameter: Random search

# Grab the first model in the sorted grid (again the highest-RMSE model,
# given the decreasing = TRUE sort)
best_glm2 <- h2o.getModel(glm_gridperf2@model_ids[[1]])

# Now let's evaluate this model on a test set
# so we get an honest estimate of its out-of-sample performance
best_glm_perf2 <- h2o.performance(model = best_glm2,
                                  newdata = ring_test)
h2o.rmse(best_glm_perf2)
#> [1] 2.645909

# Look at the hyperparameters for the best model
print(best_glm2@model[["model_summary"]])
#> GLM Model: summary
#>     family     link                           regularization
#> 1 gaussian identity Elastic Net (alpha = 0.7, lambda = 0.9 )
#>   number_of_predictors_total number_of_active_predictors number_of_iterations
#> 1                         10                           4                    1
#>    training_frame
#> 1 RTMP_sid_9593_3

Your turn

Use either a grid search or a random search to tune your model

How do the results vary with the values you initially chose?

15:00

Optimize tuning parameters

  • Try different values and measure their performance

  • Find good values for these parameters

  • Finalize the model by refitting it with these values on the entire training set (see the sketch below)
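A minimal sketch of that last step, assuming the search favoured alpha = 0.7 and lambda = 0.9 (substitute whatever values your own search selects):

# Refit a single GLM on the full training data using the tuned values
final_glm <- h2o.glm(x = predictors, y = target,
                     training_frame = ring_train,
                     family = "gaussian",
                     alpha = 0.7, lambda = 0.9,  # values chosen by the search
                     seed = 1234)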

Number of trees in a Random Forest?

Yes ✅ (see the sketch at the end of this section)

Number of PCA components to retain?

Yes ✅

Bayesian priors for model parameters?

Hmmmm, probably not ❌

Is the random seed a tuning parameter?

Nope ❌
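Taking the first quiz item above as an example, the number of trees in a random forest enters an h2o.grid() call just like alpha and lambda did earlier; this is a minimal sketch, and the candidate ntrees values are arbitrary:

# Tune the number of trees in a distributed random forest ("drf")
rf_params <- list(ntrees = c(50, 100, 200, 500))
rf_grid <- h2o.grid("drf", x = predictors, y = target,
                    grid_id = "rf_grid",
                    training_frame = ring_train,
                    seed = 1,
                    hyper_params = rf_params)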