Overview

The mlr3mbo package provides model-based optimization for mlr3.

Custom objective function

(Placeholder)

Using mlr3tuning

Tuning hyperparameters with model-based optimization integrates easily into the mlr3tuning framework. If you are not familiar with mlr3tuning, we recommend reading the tuning section of the mlr3book first. In this example, our goal is to optimize the complexity parameter cp of the rpart decision tree learner (classif.rpart) on the Pima Indians Diabetes data set.

library(mlr3)

task = tsk("pima")

learner = lrn("classif.rpart")
learner$param_set
#> <ParamSet>
#>                 id    class lower upper      levels        default value
#>  1:       minsplit ParamInt     1   Inf                         20      
#>  2:      minbucket ParamInt     1   Inf             <NoDefault[3]>      
#>  3:             cp ParamDbl     0     1                       0.01      
#>  4:     maxcompete ParamInt     0   Inf                          4      
#>  5:   maxsurrogate ParamInt     0   Inf                          5      
#>  6:       maxdepth ParamInt     1    30                         30      
#>  7:   usesurrogate ParamInt     0     2                          2      
#>  8: surrogatestyle ParamInt     0     1                          0      
#>  9:           xval ParamInt     0   Inf                         10     0
#> 10:     keep_model ParamLgl    NA    NA  TRUE,FALSE          FALSE

Next, we specify the search space, i.e., the feasible region of cp values. We restrict cp to the interval [0.0001, 0.5].

library(paradox)

search_space = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.0001, upper = 0.5)
))

We then define how performance is evaluated. For this, we choose a resampling strategy (5-fold cross-validation) and a performance measure (the classification error).

resampling = rsmp("cv", folds = 5)
measure = msr("classif.ce")
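To make clear what this combination computes, here is a minimal base-R sketch of 5-fold cross-validation scored by the classification error. It is not mlr3's implementation: the random fold assignment and the hypothetical majority_class stand-in learner (which ignores the features and always predicts the most frequent training label) are assumptions for illustration only.

```r
# Minimal sketch (base R only) of k-fold cross-validation scored by the
# classification error. Not mlr3's implementation.
cv_classif_ce = function(y, folds = 5, fit_predict) {
  n = length(y)
  # randomly assign each observation to one of `folds` roughly equal folds
  fold_id = sample(rep_len(seq_len(folds), n))
  ce = numeric(folds)
  for (k in seq_len(folds)) {
    test = which(fold_id == k)
    train = setdiff(seq_len(n), test)
    # fit on the training part, predict the held-out part
    pred = fit_predict(y[train], length(test))
    # classification error: share of misclassified test observations
    ce[k] = mean(pred != as.character(y[test]))
  }
  # aggregate by averaging the per-fold errors
  mean(ce)
}

# Hypothetical stand-in learner: always predicts the majority class of the
# training labels (a real learner such as rpart would also use the features).
majority_class = function(y_train, n_test) {
  counts = table(y_train)
  rep(names(counts)[which.max(counts)], n_test)
}

set.seed(1)
y = factor(sample(c("neg", "pos"), 100, replace = TRUE, prob = c(0.65, 0.35)))
cv_classif_ce(y, folds = 5, fit_predict = majority_class)
```

In the tuning example, rsmp("cv", folds = 5) and msr("classif.ce") take over these two roles, with rpart fitted on the features instead of the label-only stand-in.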

We also need to set the available budget. In this example, we allow 20 evaluations, matching n_evals = 20 in the terminator below. Finally, we put everything together into a TuningInstanceSingleCrit.

library(mlr3tuning)

terminator = trm("evals", n_evals = 20)

instance = TuningInstanceSingleCrit$new(
  task = task, 
  learner = learner, 
  resampling = resampling, 
  measure = measure, 
  terminator = terminator, 
  search_space = search_space)

To start the tuning, we still need to define the optimization algorithm. Model-based optimization requires three components. First, a surrogate model: we use the kriging regression learner regr.km from mlr3learners, which wraps the DiceKriging package. Second, an acquisition function: in this example, we use expected improvement. Third, an optimizer for the acquisition function: we use a simple random search.

library(mlr3mbo)
library(mlr3learners)

set.seed(7823)

surrogate = SurrogateSingleCritLearner$new(learner = lrn("regr.km"))
acq_function = AcqFunctionEI$new(surrogate = surrogate)
acq_optimizer = AcqOptimizerRandomSearch$new()
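For intuition, the following base-R sketch shows the textbook closed form of expected improvement for a minimization problem (the classification error is minimized) together with a random search that maximizes it. Given the surrogate's predicted mean mu and standard deviation se at a candidate point and the best observed value y_min, EI = (y_min - mu) * Phi(z) + se * phi(z) with z = (y_min - mu) / se, where Phi and phi are the standard normal distribution function and density. This is a sketch of the formula, not the code behind AcqFunctionEI or AcqOptimizerRandomSearch; the function names, the number of candidates and the made-up surrogate in the usage lines are assumptions.

```r
# Expected improvement for minimization (textbook closed form).
# mu, se: surrogate mean and standard deviation at the candidate points
# y_min:  best objective value observed so far
expected_improvement = function(mu, se, y_min) {
  d = y_min - mu
  z = d / se
  ei = d * pnorm(z) + se * dnorm(z)
  # without predictive uncertainty, EI reduces to the plain improvement
  certain = se == 0
  ei[certain] = pmax(d[certain], 0)
  ei
}

# Random search over the acquisition function on [lower, upper]:
# draw candidates uniformly and return the one with the largest value.
random_search_acq = function(acq, lower, upper, n_candidates = 1000) {
  candidates = runif(n_candidates, lower, upper)
  candidates[which.max(acq(candidates))]
}

# Usage with a made-up surrogate on the cp interval from above
# (hypothetical mean and standard deviation functions).
mu_fun = function(x) 0.25 + (x - 0.05)^2
se_fun = function(x) 0.01 + 0.02 * x
set.seed(1)
random_search_acq(function(x) expected_improvement(mu_fun(x), se_fun(x), y_min = 0.245),
                  lower = 0.0001, upper = 0.5)
```

The first term rewards candidates whose predicted mean beats y_min; the second rewards uncertainty, which is what drives exploration.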

mlr3mbo offers different loop functions, which combine the surrogate model, the acquisition function and the acquisition function optimizer. We choose the simplest one, bayesop_soo. Finally, we construct the tuner with TunerMbo, the tuner class for model-based optimization.

tuner = TunerMbo$new(
  loop_function = bayesop_soo, 
  acq_function = acq_function, 
  acq_optimizer = acq_optimizer)
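Conceptually, such a loop evaluates an initial design, then repeatedly fits the surrogate to all evaluations so far, proposes the point that maximizes the acquisition function, evaluates it, and stops once the terminator's budget is spent. The base-R sketch below illustrates this control flow on a hypothetical one-dimensional toy objective. It is not mlr3mbo's bayesop_soo: as simplifying assumptions, it uses a Gaussian process with a fixed squared-exponential kernel instead of the Matern 5/2 kriging model with maximum-likelihood hyperparameters that DiceKriging fits by default, a random initial design of 4 points, and random search over expected improvement.

```r
# Minimal sketch of a sequential single-objective Bayesian optimization loop.
# Simplifying assumptions (not mlr3mbo's implementation): fixed-lengthscale
# squared-exponential GP, random initial design, random search over EI.

# Hypothetical toy objective on [0, 1]; in the tuning example this role is
# played by the cross-validated classification error for a given cp.
objective = function(x) (x - 0.3)^2 + 0.1 * sin(15 * x)

# GP posterior mean and standard deviation with a constant trend.
gp_predict = function(x_train, y_train, x_new, lengthscale = 0.1, jitter = 1e-6) {
  kern = function(a, b) exp(-outer(a, b, "-")^2 / (2 * lengthscale^2))
  m = mean(y_train)                 # constant trend
  s2 = var(y_train)                 # crude plug-in for the process variance
  K = s2 * kern(x_train, x_train) + diag(s2 * jitter, length(x_train))
  K_star = s2 * kern(x_new, x_train)
  mu = m + drop(K_star %*% solve(K, y_train - m))
  v = solve(K, t(K_star))
  se = sqrt(pmax(s2 - rowSums(K_star * t(v)), 0))
  list(mean = mu, se = se)
}

# Expected improvement for minimization.
ei = function(mu, se, y_min) {
  d = y_min - mu
  z = d / se
  out = d * pnorm(z) + se * dnorm(z)
  out[se == 0] = pmax(d[se == 0], 0)
  out
}

bo_loop = function(objective, lower, upper, n_init = 4, n_evals = 20,
                   n_candidates = 1000) {
  # initial design: uniform random points in the search space
  x = runif(n_init, lower, upper)
  y = vapply(x, objective, numeric(1))
  # terminator: stop once n_evals evaluations have been spent
  while (length(y) < n_evals) {
    # 1. fit the surrogate to all evaluations and predict at random candidates
    candidates = runif(n_candidates, lower, upper)
    pred = gp_predict(x, y, candidates)
    # 2. optimize the acquisition function by random search
    x_new = candidates[which.max(ei(pred$mean, pred$se, min(y)))]
    # 3. evaluate the proposal and add it to the archive
    x = c(x, x_new)
    y = c(y, objective(x_new))
  }
  data.frame(eval = seq_along(y), x = x, y = y)
}

set.seed(7823)
archive = bo_loop(objective, lower = 0, upper = 1)
archive[which.min(archive$y), ]
```

The surrogate here is deliberately crude; the point is the loop structure, which mirrors the roles of surrogate, acq_function, acq_optimizer and terminator in the mlr3mbo setup above.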

To trigger the tuning, we pass the tuning instance to the tuner.

tuner$optimize(instance)
#>            cp learner_param_vals  x_domain classif.ce
#> 1: 0.02046392          <list[2]> <list[1]>  0.2421696

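The result is printed once the optimization finishes and can also be retrieved with instance$result. Below, we plot the classification error of every evaluated cp value, coloured by the batch in which it was evaluated.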

library(ggplot2)

ggplot(instance$archive$data(), aes(cp, classif.ce)) +
  geom_point(aes(colour = batch_nr))

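We can also plot the classification error against the batch number to see how the performance evolves over the course of the optimization.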

library(ggplot2)

ggplot(instance$archive$data(), aes(batch_nr, classif.ce)) +
  geom_line()
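The line plot shows the raw error of each batch, which need not decrease monotonically because the optimizer also explores. A complementary view is the best value found so far. The following base-R sketch computes it from the batch_nr and classif.ce columns of the archive; the helper name best_so_far and the example values are hypothetical, and the commented plotting lines assume the instance and ggplot2 from above.

```r
# Best classification error found up to and including each batch.
# batch_nr, ce: archive columns (one row per evaluated configuration)
best_so_far = function(batch_nr, ce) {
  per_batch = tapply(ce, batch_nr, min)       # best value within each batch
  data.frame(
    batch_nr = as.integer(names(per_batch)),
    best_ce = cummin(as.vector(per_batch))   # running minimum over batches
  )
}

# Made-up example values, for illustration only
best_so_far(batch_nr = c(1, 1, 2, 3, 4), ce = c(0.30, 0.28, 0.29, 0.25, 0.26))

# With the tuning instance from above (assumes ggplot2 is loaded):
# archive = instance$archive$data()
# ggplot(best_so_far(archive$batch_nr, archive$classif.ce), aes(batch_nr, best_ce)) +
#   geom_step()
```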