Developer Information for Optimization Methods in Tensor Toolbox

This page contains information for developers who want to use the optimization method interfaces contained in Tensor Toolbox. These interfaces standardize calls across packages that otherwise differ in how their functions are invoked, what their parameters are named, and so on. For optimization algorithm installation instructions and parameter options, see Documentation for Tensor Toolbox Optimization Methods.

The above-referenced document is for general users. The information below is for developers of Tensor Toolbox methods who want to use these standardized optimization algorithms.
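
All of the solver wrappers demonstrated below share a common calling pattern: an initial guess, one or more function/gradient handles, and then name/value options with standardized names such as 'gtol', 'ftol', and 'printitn'. A developer can therefore switch solvers by changing little more than the function name. As a summary, the calls demonstrated later on this page are:

% (Illustrative summary; xinit and the handles fgh, fh, gh are defined below.)
% [x,f,info] = tt_opt_lbfgs(xinit, fgh, 'printitn', 5, 'gtol', 1e-20, 'ftol', 1e-25);
% [x,f,info] = tt_opt_lbfgsb(xinit, fgh, 'printitn', 5, 'gtol', 1e-12, 'ftol', 1e-24);
% [x,f,info] = tt_opt_fminunc(xinit, fgh, 'gtol', 1e-12, 'StepTolerance', 1e-14);
% [x,f,info] = tt_opt_adam(xinit, fh, gh, 'subiters', 500, 'maxfails', 2, 'maxiters', 20000);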

Contents

Test optimization problem
Set up initial guess and optimum
Sandia Poblano Toolbox L-BFGS
Stephen Becker L-BFGS-B
MATLAB Optimization Toolbox fminunc (Quasi-Newton Method)
Adam (internal to Tensor Toolbox)

Test optimization problem

We demonstrate the optimization methods on a simple but notoriously difficult test function, a variation on the Rosenbrock function:

$$f(x) = (x_1 - 1)^2 + 4 \sum_{i=2}^n (x_i - x_{i-1}^2)^2$$

This function is implemented in optfunc, which returns only the gradient when true is passed as the second argument. The number of parameters n is determined automatically from the length of the input vector. The unique minimizer is the all-ones vector, where f = 0. The code for the function is as follows:

function [f,g] = optfunc(x,gradonly)
%OPTFUNC Example function for testing optimization software.

n = length(x);
g = zeros(n,1);

% Function value: f(x) = (x(1)-1)^2 + 4*sum_{i=2:n} (x(i)-x(i-1)^2)^2,
% accumulated as a scaled sum and multiplied by 4 at the end.
f   = .25*(x(1)-1.0)^2;
for i = 2:n
    f = f + ( x(i) - (x(i-1))^2 )^2;
end
f = 4*f;

% Gradient, using the running term t1 = x(i+1) - x(i)^2.
t1      = x(2) - (x(1)^2);
g(1)    = 2.0*( x(1) - 1.0 ) - 16*x(1)*t1;
for i = 2:(n-1)
    t2 = t1;
    t1 = x(i+1) - (x(i))^2;
    g(i) = 8*t2 - 16*x(i)*t1;
end
g(n)  = 8*t1;

% When gradonly is passed as true, return the gradient as the first output.
if exist('gradonly','var')
    if gradonly == true
        f = g;
    end
end

We create function handles for use with the optimization solvers.

% Create a function handle to evaluate both f and g
fgh = @optfunc;
% Create a function handle to evaluate just f (with a single output
% requested, MATLAB returns only the function value)
fh = @optfunc;
% Create a function handle to evaluate just g
gh = @(x) optfunc(x,true);
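
Before handing these handles to a solver, the analytic gradient can be sanity-checked against a central finite-difference estimate. This is a minimal check built only from the handles above, not part of the solver interface:

% Compare gh against a central finite-difference gradient estimate.
x0  = randn(10,1);
g   = gh(x0);
gfd = zeros(10,1);
h   = 1e-6;
for i = 1:10
    e = zeros(10,1);
    e(i) = h;
    gfd(i) = (fh(x0+e) - fh(x0-e)) / (2*h);
end
fprintf('Max. gradient discrepancy = %g\n', max(abs(g - gfd)));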

Set up initial guess and optimum

% Initial guess perturbs the first entry of the known optimum
n = 10;
xinit = [2; ones(n-1,1)];
xopt = ones(n,1);

% Initial function value
finit = fh(xinit);
fprintf('Function value at xinit = %g\n', finit);

% Optimal function value
fopt = fh(xopt);
fprintf('Function value at xopt = %g\n', fopt);
Function value at xinit = 37
Function value at xopt = 0

Sandia Poblano Toolbox L-BFGS

POBLANO Version 1.1 by Evrim Acar, Daniel Dunlavy, and Tamara Kolda

[x,f,info] = tt_opt_lbfgs(xinit, fgh);
Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
     0         1      37.00000000      10.08959860
     1         8       0.91828104       0.92158834
     2        10       0.16611190       0.32713416
     3        12       0.03552274       0.15994623
     4        14       0.00563181       0.05761539
     5        16       0.00135701       0.02431580
     6        18       0.00039065       0.01439439
     7        20       0.00007552       0.00602386
     8        22       0.00002250       0.00320035
     9        24       0.00000020       0.00041099
    10        26       0.00000004       0.00015232
    11        28       0.00000001       0.00004888
    12        30       0.00000000       0.00001906
    13        32       0.00000000       0.00000505
End Main Loop

Final f: 4.2823e-11
Setup time: 0.0037 seconds
Optimization time: 0.034 seconds
Iterations: 13
Total iterations: 32
Exit condition: Successful termination based on StopTol

Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.255973e-04

Tighten it up a bit by changing the tolerances. Also note that we changed the printing frequency to once every 5 iterations rather than every iteration as above.

[x,f,info] = tt_opt_lbfgs(xinit, fgh, 'printitn', 5, 'gtol', 1e-20, 'ftol', 1e-25);
Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-25, gtol=1e-20, maxiters = 1000, maxsubiters=10000

Begin Main Loop
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
     0         1      37.00000000      10.08959860
     5        16       0.00135701       0.02431580
    10        26       0.00000004       0.00015232
    15        36       0.00000000       0.00000057
    20        46       0.00000000       0.00000000
    25        57       0.00000000       0.00000000
    30        72       0.00000000       0.00000036
    35        82       0.00000000       0.00000001
    40        92       0.00000000       0.00000000
    45       102       0.00000000       0.00000000
    50       112       0.00000000       0.00000000
    55       123       0.00000000       0.00000000
    59       131       0.00000000       0.00000000
End Main Loop

Final f: 3.6076e-27
Setup time: 0.0099 seconds
Optimization time: 0.067 seconds
Iterations: 59
Total iterations: 131
Exit condition: Relative change in F < RelFuncTol

With tighter tolerances, we get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 6.559331e-12

Stephen Becker L-BFGS-B

L-BFGS-B by Stephen Becker.

[x,f,info] = tt_opt_lbfgsb(xinit, fgh);
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     1, f(x) = 1.263359e+00, ||grad||_infty = 1.20e+01, 1.15e-03
Iter     2, f(x) = 2.814196e-01, ||grad||_infty = 4.85e+00, 1.27e-03
Iter     3, f(x) = 6.099619e-02, ||grad||_infty = 1.11e+00, 1.59e-03
Iter     4, f(x) = 2.574744e-02, ||grad||_infty = 7.85e-01, 1.83e-03
Iter     5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01, 2.04e-03
Iter     6, f(x) = 1.585758e-03, ||grad||_infty = 2.84e-01, 2.31e-03
Iter     7, f(x) = 1.701942e-04, ||grad||_infty = 6.15e-02, 2.54e-03
Iter     8, f(x) = 5.353674e-05, ||grad||_infty = 1.69e-02, 2.76e-03
Iter     9, f(x) = 2.318400e-05, ||grad||_infty = 1.39e-02, 2.97e-03
Iter    10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03, 3.18e-03
Iter    11, f(x) = 3.586768e-07, ||grad||_infty = 3.46e-03, 3.71e-03
Iter    12, f(x) = 2.598817e-08, ||grad||_infty = 7.97e-04, 3.84e-03
Iter    13, f(x) = 2.466644e-09, ||grad||_infty = 2.50e-04, 4.04e-03
Iter    14, f(x) = 5.126655e-10, ||grad||_infty = 7.74e-05, 4.23e-03
Iter    15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05, 4.41e-03
Iter    16, f(x) = 9.642045e-12, ||grad||_infty = 1.23e-05, 4.61e-03
End Main Loop

Final f: 9.6420e-12
Setup time: 0.0014 seconds
Optimization time: 0.005 seconds
Iterations: 16
Total iterations: 35
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.

Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.224764e-04

Tighten it up a bit as before

[x,f,info] = tt_opt_lbfgsb(xinit, fgh, 'printitn', 5, 'gtol', 1e-12, 'ftol', 1e-24);
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-24, gtol=1e-12, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01, 1.41e-03
Iter    10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03, 1.98e-03
Iter    15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05, 2.75e-03
Iter    20, f(x) = 1.343287e-12, ||grad||_infty = 9.30e-07, 3.35e-03
Iter    25, f(x) = 1.306474e-12, ||grad||_infty = 3.38e-09, 4.05e-03
Iter    30, f(x) = 1.306458e-12, ||grad||_infty = 1.11e-08, 4.65e-03
Iter    35, f(x) = 1.304805e-12, ||grad||_infty = 1.56e-07, 5.31e-03
Iter    40, f(x) = 1.037402e-12, ||grad||_infty = 1.33e-06, 5.93e-03
Iter    45, f(x) = 6.552398e-16, ||grad||_infty = 1.06e-07, 6.58e-03
Iter    50, f(x) = 1.720279e-19, ||grad||_infty = 2.17e-09, 7.09e-03
Iter    55, f(x) = 2.298904e-21, ||grad||_infty = 1.61e-11, 7.61e-03
Iter    60, f(x) = 1.714203e-21, ||grad||_infty = 7.02e-11, 8.28e-03
Iter    65, f(x) = 7.255157e-24, ||grad||_infty = 1.64e-11, 9.30e-03
Iter    67, f(x) = 2.032944e-26, ||grad||_infty = 4.58e-13, 9.63e-03
End Main Loop

Final f: 2.0329e-26
Setup time: 0.0013 seconds
Optimization time: 0.01 seconds
Iterations: 67
Total iterations: 140
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

With tighter tolerances, we again get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 3.479939e-14
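
Note that L-BFGS-B is the only method on this page that supports bound constraints, as reflected by the lower and upper bounds in its printed parameters. A hypothetical sketch of a bound-constrained call follows; the option names 'lower' and 'upper' are assumptions, so consult the tt_opt_lbfgsb help text for the actual names.

% Hypothetical: constrain all variables to be nonnegative. The option
% names 'lower'/'upper' are assumed here, not confirmed on this page.
% [x,f,info] = tt_opt_lbfgsb(xinit, fgh, 'lower', 0, 'upper', Inf);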

MATLAB Optimization Toolbox fminunc (Quasi-Newton Method)

Distributed with the MATLAB Optimization Toolbox.

[x,f,info] = tt_opt_fminunc(xinit, fgh);
Quasi-Newton Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
     0           1               37                            98
     1           2          1.44889      0.0102041           12.9  
     2           3         0.288722              1           5.05  
     3           4        0.0604657              1           1.17  
     4           5         0.032238              1          0.793  
     5           6        0.0130798              1          0.564  
     6           7       0.00500013              1          0.368  
     7           8       0.00196856              1          0.256  
     8           9      0.000983737              1          0.126  
     9          10      0.000489883              1         0.0917  
    10          11      0.000215431              1         0.0573  
    11          12      8.95808e-05              1         0.0457  
    12          13      4.14183e-05              1         0.0258  
    13          14      2.05777e-05              1         0.0191  
    14          15      9.29022e-06              1         0.0104  
    15          16      3.31583e-06              1        0.00825  
    16          17      9.13586e-07              1         0.0046  
    17          18      2.13163e-07              1        0.00199  
    18          19      4.55807e-08              1        0.00121  
    19          20      8.72743e-09              1       0.000601  

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.



End Main Loop

Final f: 8.7274e-09
Setup time: 0.014 seconds
Optimization time: 0.065 seconds
Iterations: 19
Total iterations: 20
Exit condition: Gradient norm smaller than tolerance

Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.046793e-04

Tighten it up a bit as before. This wrapper does not let us control the print frequency or the relative change in function value tolerance, but we can pass `StepTolerance` through to fminunc.

[x,f,info] = tt_opt_fminunc(xinit, fgh, 'gtol', 1e-12, 'StepTolerance',1e-14);
Quasi-Newton Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-12, maxiters = 1000, maxsubiters=10000

Begin Main Loop
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
     0           1               37                            98
     1           2          1.44889      0.0102041           12.9  
     2           3         0.288722              1           5.05  
     3           4        0.0604657              1           1.17  
     4           5         0.032238              1          0.793  
     5           6        0.0130798              1          0.564  
     6           7       0.00500013              1          0.368  
     7           8       0.00196856              1          0.256  
     8           9      0.000983737              1          0.126  
     9          10      0.000489883              1         0.0917  
    10          11      0.000215431              1         0.0573  
    11          12      8.95808e-05              1         0.0457  
    12          13      4.14183e-05              1         0.0258  
    13          14      2.05777e-05              1         0.0191  
    14          15      9.29022e-06              1         0.0104  
    15          16      3.31583e-06              1        0.00825  
    16          17      9.13586e-07              1         0.0046  
    17          18      2.13163e-07              1        0.00199  
    18          19      4.55807e-08              1        0.00121  
    19          20      8.72743e-09              1       0.000601  
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
    20          21      1.56598e-09              1       0.000224  
    21          22      2.81917e-10              1        9.2e-05  
    22          23      4.32264e-11              1       3.11e-05  
    23          24      5.20858e-12              1       1.08e-05  
    24          25      1.36374e-12              1       2.87e-06  
    25          26      1.11805e-12              1       7.95e-07  
    26          27      1.10532e-12              1        1.7e-07  
    27          28      1.10468e-12              1       3.72e-08  
    28          29      1.10465e-12              1       5.02e-09  
    29          30      1.10465e-12              1       2.99e-09  
    30          31      1.10465e-12              1        2.9e-09  
    31          32      1.10465e-12              1        2.7e-09  
    32          33      1.10465e-12              1       4.78e-09  
    33          34      1.10464e-12              1       9.07e-09  
    34          35      1.10464e-12              1       1.58e-08  
    35          36      1.10462e-12              1        2.7e-08  
    36          37      1.10457e-12              1       4.52e-08  
    37          38      1.10443e-12              1       7.47e-08  
    38          39      1.10408e-12              1       1.22e-07  
    39          40      1.10315e-12              1          2e-07  
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
    40          41      1.10074e-12              1       3.24e-07  
    41          42      1.09444e-12              1       5.24e-07  
    42          43      1.07818e-12              1        8.4e-07  
    43          44      1.03705e-12              1       1.32e-06  
    44          45      9.38556e-13              1       1.99e-06  
    45          46      7.32415e-13              1       2.68e-06  
    46          47      4.12062e-13              1        2.8e-06  
    47          48      1.21521e-13              1        1.8e-06  
    48          49      1.11669e-14              1        4.9e-07  
    49          50      5.69645e-16              1       6.41e-08  
    50          51      1.47514e-17              1       1.64e-08  
    51          52      7.76625e-20              1       1.58e-09  
    52          53      1.61414e-21              1       2.67e-10  
    53          54      8.82525e-23              1       5.86e-11  

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.



End Main Loop

Final f: 8.8253e-23
Setup time: 0.041 seconds
Optimization time: 0.074 seconds
Iterations: 53
Total iterations: 54
Exit condition: Gradient norm smaller than tolerance

With tighter tolerances, we again get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.531186e-10
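
Since `StepTolerance` is forwarded to fminunc above, it appears that other optimoptions settings can be passed through the same way. A hedged sketch, under the assumption that the pass-through works for other options too:

% Assumption: other fminunc options are forwarded like 'StepTolerance'.
% [x,f,info] = tt_opt_fminunc(xinit, fgh, 'MaxFunctionEvaluations', 2000);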

Adam (internal to Tensor Toolbox)

Adam is a stochastic optimization method, so it is not especially fair to compare it with the deterministic methods above. But we do it anyway.

rng('default')
[x,f,info] = tt_opt_adam(xinit, fh, gh,'subiters',500,'maxfails',2,'maxiters',20000,'printitn',500,'fdesc','Function: exact','gdesc','Gradient: exact');
Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Gradient: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 2
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 3.700000e+01
Epoch  3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch 32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
End Main Loop

Final f: 1.7803e-06
Setup time: 0.01 seconds
Optimization time: 0.29 seconds
Number of epochs: 32
Total iterations: 16000
Exit condition: Number of failed epochs > 2

Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 3.608177e-01

Let's tighten the convergence criterion by allowing more failed epochs before termination (maxfails increases from 2 to 6).

rng(1)
[x,f,info] = tt_opt_adam(xinit, fh, gh,'subiters',500,'maxfails',6,'maxiters',20000,'printitn',500,'fdesc','Function: exact','gdesc','Gradient: exact');
Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Gradient: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 6
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 3.700000e+01
Epoch  3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch 32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
Epoch 500: f = 8.408196e-07, step = 1e-05
Epoch 1000: f = 2.292266e-07, step = 1e-05
Epoch 1011: f = 2.220675e-07, step = 1e-05, nfails = 4 (resetting to solution from last epoch)
Epoch 1500: f = 4.355436e-08, step = 1e-06
Epoch 2000: f = 4.495148e-09, step = 1e-06
Epoch 2234: f = 1.418108e-09, step = 1e-06, nfails = 5 (resetting to solution from last epoch)
Epoch 2500: f = 4.538938e-10, step = 1e-07
Epoch 3000: f = 3.828861e-11, step = 1e-07
Epoch 3188: f = 1.485887e-11, step = 1e-07, nfails = 6 (resetting to solution from last epoch)
Epoch 3500: f = 6.286379e-12, step = 1e-08
Epoch 4000: f = 1.176231e-12, step = 1e-08
Epoch 4500: f = 1.811833e-13, step = 1e-08
Epoch 4838: f = 4.966636e-14, step = 1e-08, nfails = 7 (resetting to solution from last epoch)
End Main Loop

Final f: 4.9634e-14
Setup time: 0.00 seconds
Optimization time: 38.99 seconds
Number of epochs: 4838
Total iterations: 2419000
Exit condition: Number of failed epochs > 6

Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 4.335430e-05
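
Adam is really intended for noisy gradients. A minimal sketch of a more representative test, assuming we simply perturb the exact gradient with Gaussian noise (our own illustration, not from the toolbox):

% Mimic a stochastic setting by adding noise to the exact gradient.
ghnoisy = @(x) optfunc(x,true) + 0.01*randn(size(x));
% Then call Adam with the noisy gradient handle, e.g.:
% [x,f,info] = tt_opt_adam(xinit, fh, ghnoisy, 'subiters', 500, ...
%     'maxfails', 2, 'maxiters', 20000);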