# Developer Information for Optimization Methods in Tensor Toolbox

This page contains information for developers who want to use the optimization method interfaces contained in Tensor Toolbox. These interfaces standardize access to packages that otherwise have very different calling conventions, parameter names, and so on. For optimization algorithm installation instructions and parameter options, see Documentation for Tensor Toolbox Optimization Methods.

The above-referenced document is for general users. The information below is for developers of Tensor Toolbox methods who want to use these standardized optimization algorithms.

## Test optimization problem

We use a simple but notoriously difficult test function to demonstrate the optimization methods: a variation on Rosenbrock's function. It is implemented in optfunc, which takes an optional second argument, gradonly; when this is true, only the gradient is returned (in the first output). The number of parameters n is determined by the length of the input vector. The optimum is the all-ones vector, where the function value is zero. The code for the function is as follows:

function [f,g] = optfunc(x, gradonly)
%OPTFUNC Example function for testing optimization software.

n = length(x);
g = zeros(n,1);

% Function value
f = 0.25*(x(1)-1.0)^2;
for i = 2:n
    f = f + ( x(i) - x(i-1)^2 )^2;
end
f = 4*f;

% Gradient
t1   = x(2) - x(1)^2;
g(1) = 2.0*( x(1) - 1.0 ) - 16*x(1)*t1;
for i = 2:(n-1)
    t2   = t1;
    t1   = x(i+1) - x(i)^2;
    g(i) = 8*t2 - 16*x(i)*t1;
end
g(n) = 8*t1;

% If gradonly is true, return the gradient as the first output
if exist('gradonly','var') && gradonly
    f = g;
end
end
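
As a cross-check on the listing above, here is a pure-Python transcription of the same function (for illustration only; it is not part of Tensor Toolbox), reproducing the function values used later on this page and running a central-difference test of the analytic gradient:

```python
def optfunc(x):
    """Rosenbrock variant: returns (f, g). Minimum f = 0 at x = all ones."""
    n = len(x)
    f = 0.25 * (x[0] - 1.0) ** 2
    for i in range(1, n):
        f += (x[i] - x[i - 1] ** 2) ** 2
    f *= 4.0

    g = [0.0] * n
    t1 = x[1] - x[0] ** 2
    g[0] = 2.0 * (x[0] - 1.0) - 16.0 * x[0] * t1
    for i in range(1, n - 1):
        t2 = t1
        t1 = x[i + 1] - x[i] ** 2
        g[i] = 8.0 * t2 - 16.0 * x[i] * t1
    g[n - 1] = 8.0 * t1
    return f, g

# The values used later on this page.
n = 10
xinit = [2.0] + [1.0] * (n - 1)
xopt = [1.0] * n
assert optfunc(xinit)[0] == 37.0
assert optfunc(xopt)[0] == 0.0

# Central-difference check that the analytic gradient matches f.
h = 1e-6
f0, g0 = optfunc(xinit)
for i in range(n):
    xp = list(xinit); xp[i] += h
    xm = list(xinit); xm[i] -= h
    fd = (optfunc(xp)[0] - optfunc(xm)[0]) / (2 * h)
    assert abs(fd - g0[i]) < 1e-3
```

This sort of finite-difference check is a cheap sanity test worth running whenever a new objective is wired into these optimization interfaces.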



We create function handles to use with the optimization solvers. Note that fh and fgh point to the same function: MATLAB computes only the first output (f) when a single output is requested.

% Create a function handle to evaluate both f and g
fgh = @optfunc;
% Create a function handle to evaluate just f
fh = @optfunc;
% Create a function handle to evaluate just g (pass gradonly = true)
gh = @(x) optfunc(x,true);


Set up the initial guess and the known optimum

n = 10;
xinit = [2; ones(n-1,1)];
xopt = ones(n,1);

% Initial function value
finit = fh(xinit);
fprintf('Function value at xinit = %g\n', finit);

% Optimal function value
fopt = fh(xopt);
fprintf('Function value at xopt = %g\n', fopt);

Function value at xinit = 37
Function value at xopt = 0


## Sandia Poblano Toolbox L-BFGS

POBLANO Version 1.1 by Evrim Acar, Daniel Dunlavy, and Tamara Kolda

[x,f,info] = tt_opt_lbfgs(xinit, fgh);

Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
Iter  FuncEvals       F(X)          ||G(X)||/N
------ --------- ---------------- ----------------
0         1      37.00000000      10.08959860
1         8       0.91828104       0.92158834
2        10       0.16611190       0.32713416
3        12       0.03552274       0.15994623
4        14       0.00563181       0.05761539
5        16       0.00135701       0.02431580
6        18       0.00039065       0.01439439
7        20       0.00007552       0.00602386
8        22       0.00002250       0.00320035
9        24       0.00000020       0.00041099
10        26       0.00000004       0.00015232
11        28       0.00000001       0.00004888
12        30       0.00000000       0.00001906
13        32       0.00000000       0.00000505
End Main Loop

Final f: 4.2823e-11
Setup time: 0.0037 seconds
Optimization time: 0.034 seconds
Iterations: 13
Total iterations: 32
Exit condition: Successful termination based on StopTol


Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 2.255973e-04


Tighten it up a bit by changing the tolerances. Also note that we changed the printing frequency to once every 5 iterations rather than every iteration as above.

[x,f,info] = tt_opt_lbfgs(xinit, fgh, 'printitn', 5, 'gtol', 1e-20, 'ftol', 1e-25);

Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-25, gtol=1e-20, maxiters = 1000, maxsubiters=10000

Begin Main Loop
Iter  FuncEvals       F(X)          ||G(X)||/N
------ --------- ---------------- ----------------
0         1      37.00000000      10.08959860
5        16       0.00135701       0.02431580
10        26       0.00000004       0.00015232
15        36       0.00000000       0.00000057
20        46       0.00000000       0.00000000
25        57       0.00000000       0.00000000
30        72       0.00000000       0.00000036
35        82       0.00000000       0.00000001
40        92       0.00000000       0.00000000
45       102       0.00000000       0.00000000
50       112       0.00000000       0.00000000
55       123       0.00000000       0.00000000
59       131       0.00000000       0.00000000
End Main Loop

Final f: 3.6076e-27
Setup time: 0.0099 seconds
Optimization time: 0.067 seconds
Iterations: 59
Total iterations: 131
Exit condition: Relative change in F < RelFuncTol


With tighter tolerances, we get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 6.559331e-12
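
The two exit conditions seen in these runs, "Successful termination based on StopTol" and "Relative change in F < RelFuncTol", correspond to a scaled gradient-norm test and a relative function-change test. A rough sketch of those tests in Python (my reading of the printed parameters and the ||G(X)||/N column, not Poblano's actual source):

```python
def stop_reason(f_prev, f_curr, gnorm, n, gtol=1e-5, ftol=1e-10):
    """Sketch of two common L-BFGS stopping tests.

    gnorm is ||g(x)|| and n is the number of variables; the printed
    column ||G(X)||/N suggests the gradient test is scaled by n
    (an assumption). The relative-change guard max(1, |f_prev|) is
    also an assumption to avoid division by zero.
    """
    if gnorm / n < gtol:
        return "gradient norm below gtol"
    if abs(f_curr - f_prev) / max(1.0, abs(f_prev)) < ftol:
        return "relative change in f below ftol"
    return None

# The final Poblano iterate above has ||G||/N = 5.05e-06 < gtol = 1e-05,
# so the gradient test fires first.
assert stop_reason(4.3e-11, 4.28e-11, 5.05e-5, 10) == "gradient norm below gtol"
```

This is why loosening gtol and ftol, as in the second run, lets the solver keep iterating toward machine-precision accuracy.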


## Stephen Becker L-BFGS-B

[x,f,info] = tt_opt_lbfgsb(xinit, fgh);

L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     1, f(x) = 1.263359e+00, ||grad||_infty = 1.20e+01, 1.15e-03
Iter     2, f(x) = 2.814196e-01, ||grad||_infty = 4.85e+00, 1.27e-03
Iter     3, f(x) = 6.099619e-02, ||grad||_infty = 1.11e+00, 1.59e-03
Iter     4, f(x) = 2.574744e-02, ||grad||_infty = 7.85e-01, 1.83e-03
Iter     5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01, 2.04e-03
Iter     6, f(x) = 1.585758e-03, ||grad||_infty = 2.84e-01, 2.31e-03
Iter     7, f(x) = 1.701942e-04, ||grad||_infty = 6.15e-02, 2.54e-03
Iter     8, f(x) = 5.353674e-05, ||grad||_infty = 1.69e-02, 2.76e-03
Iter     9, f(x) = 2.318400e-05, ||grad||_infty = 1.39e-02, 2.97e-03
Iter    10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03, 3.18e-03
Iter    11, f(x) = 3.586768e-07, ||grad||_infty = 3.46e-03, 3.71e-03
Iter    12, f(x) = 2.598817e-08, ||grad||_infty = 7.97e-04, 3.84e-03
Iter    13, f(x) = 2.466644e-09, ||grad||_infty = 2.50e-04, 4.04e-03
Iter    14, f(x) = 5.126655e-10, ||grad||_infty = 7.74e-05, 4.23e-03
Iter    15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05, 4.41e-03
Iter    16, f(x) = 9.642045e-12, ||grad||_infty = 1.23e-05, 4.61e-03
End Main Loop

Final f: 9.6420e-12
Setup time: 0.0014 seconds
Optimization time: 0.005 seconds
Iterations: 16
Total iterations: 35
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.


Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 2.224764e-04


Tighten it up a bit as before

[x,f,info] = tt_opt_lbfgsb(xinit, fgh, 'printitn', 5, 'gtol', 1e-12, 'ftol', 1e-24);

L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-24, gtol=1e-12, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01, 1.41e-03
Iter    10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03, 1.98e-03
Iter    15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05, 2.75e-03
Iter    20, f(x) = 1.343287e-12, ||grad||_infty = 9.30e-07, 3.35e-03
Iter    25, f(x) = 1.306474e-12, ||grad||_infty = 3.38e-09, 4.05e-03
Iter    30, f(x) = 1.306458e-12, ||grad||_infty = 1.11e-08, 4.65e-03
Iter    35, f(x) = 1.304805e-12, ||grad||_infty = 1.56e-07, 5.31e-03
Iter    40, f(x) = 1.037402e-12, ||grad||_infty = 1.33e-06, 5.93e-03
Iter    45, f(x) = 6.552398e-16, ||grad||_infty = 1.06e-07, 6.58e-03
Iter    50, f(x) = 1.720279e-19, ||grad||_infty = 2.17e-09, 7.09e-03
Iter    55, f(x) = 2.298904e-21, ||grad||_infty = 1.61e-11, 7.61e-03
Iter    60, f(x) = 1.714203e-21, ||grad||_infty = 7.02e-11, 8.28e-03
Iter    65, f(x) = 7.255157e-24, ||grad||_infty = 1.64e-11, 9.30e-03
Iter    67, f(x) = 2.032944e-26, ||grad||_infty = 4.58e-13, 9.63e-03
End Main Loop

Final f: 2.0329e-26
Setup time: 0.0013 seconds
Optimization time: 0.01 seconds
Iterations: 67
Total iterations: 140


With tighter tolerances, we again get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 3.479939e-14
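
Unlike plain L-BFGS, L-BFGS-B handles box constraints; these are the "Lower bound" and "Upper bound" parameters printed above, infinite in these unconstrained runs. The core device in bound-constrained methods of this kind is componentwise projection onto the box, sketched below (a generic illustration, not Becker's implementation):

```python
def project_box(x, lower, upper):
    """Project x onto the box [lower, upper] componentwise (clipping)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

inf = float("inf")
x = [2.0, -3.0, 0.5]

# With infinite bounds, as in the runs above, projection is a no-op.
assert project_box(x, [-inf] * 3, [inf] * 3) == x

# With finite bounds, out-of-range components are clipped to the box.
assert project_box(x, [0.0] * 3, [1.0] * 3) == [1.0, 0.0, 0.5]
```

This is why tt_opt_lbfgsb is the natural choice among these interfaces when a method needs nonnegativity or other bound constraints on its parameters.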


## MATLAB Optimization Toolbox fminunc (Quasi-Newton Method)

Distributed with the MATLAB Optimization Toolbox.
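
The quasi-Newton algorithm in fminunc maintains a BFGS approximation of the inverse Hessian. The textbook inverse-BFGS update, which enforces the secant condition H'y = s, can be sketched as follows (a generic dense illustration, not MathWorks code):

```python
def bfgs_inv_update(H, s, y):
    """Inverse-Hessian BFGS update: H' = (I - r s y^T) H (I - r y s^T) + r s s^T,
    with r = 1 / (y^T s), s the step taken, and y the gradient change."""
    n = len(s)
    r = 1.0 / sum(y[i] * s[i] for i in range(n))
    # A = I - r * s y^T
    A = [[(1.0 if i == j else 0.0) - r * s[i] * y[j] for j in range(n)]
         for i in range(n)]
    # H' = A H A^T + r s s^T
    AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(AH[i][k] * A[j][k] for k in range(n)) + r * s[i] * s[j]
             for j in range(n)] for i in range(n)]

# The update enforces the secant condition H' y = s exactly.
H = [[1.0, 0.0], [0.0, 1.0]]
s = [0.5, -0.25]
y = [2.0, 1.0]
Hn = bfgs_inv_update(H, s, y)
for i in range(2):
    assert abs(sum(Hn[i][j] * y[j] for j in range(2)) - s[i]) < 1e-12
```

Storing the full dense H is what distinguishes BFGS from the limited-memory variants above, which keep only the last m pairs (s, y); that is the m=5 parameter printed by the L-BFGS solvers.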

[x,f,info] = tt_opt_fminunc(xinit, fgh);

Quasi-Newton Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
First-order
Iteration  Func-count       f(x)        Step-size       optimality
0           1               37                            98
1           2          1.44889      0.0102041           12.9
2           3         0.288722              1           5.05
3           4        0.0604657              1           1.17
4           5         0.032238              1          0.793
5           6        0.0130798              1          0.564
6           7       0.00500013              1          0.368
7           8       0.00196856              1          0.256
8           9      0.000983737              1          0.126
9          10      0.000489883              1         0.0917
10          11      0.000215431              1         0.0573
11          12      8.95808e-05              1         0.0457
12          13      4.14183e-05              1         0.0258
13          14      2.05777e-05              1         0.0191
14          15      9.29022e-06              1         0.0104
15          16      3.31583e-06              1        0.00825
16          17      9.13586e-07              1         0.0046
17          18      2.13163e-07              1        0.00199
18          19      4.55807e-08              1        0.00121
19          20      8.72743e-09              1       0.000601

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.

End Main Loop

Final f: 8.7274e-09
Setup time: 0.014 seconds
Optimization time: 0.065 seconds
Iterations: 19
Total iterations: 20
Exit condition: Gradient norm smaller than tolerance


Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 2.046793e-04


Tighten it up a bit as before. We don't have control over the print frequency with this method, nor over the function-change tolerance, but we can pass through the StepTolerance option.

[x,f,info] = tt_opt_fminunc(xinit, fgh, 'gtol', 1e-12, 'StepTolerance',1e-14);

Quasi-Newton Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-12, maxiters = 1000, maxsubiters=10000

Begin Main Loop
First-order
Iteration  Func-count       f(x)        Step-size       optimality
0           1               37                            98
1           2          1.44889      0.0102041           12.9
2           3         0.288722              1           5.05
3           4        0.0604657              1           1.17
4           5         0.032238              1          0.793
5           6        0.0130798              1          0.564
6           7       0.00500013              1          0.368
7           8       0.00196856              1          0.256
8           9      0.000983737              1          0.126
9          10      0.000489883              1         0.0917
10          11      0.000215431              1         0.0573
11          12      8.95808e-05              1         0.0457
12          13      4.14183e-05              1         0.0258
13          14      2.05777e-05              1         0.0191
14          15      9.29022e-06              1         0.0104
15          16      3.31583e-06              1        0.00825
16          17      9.13586e-07              1         0.0046
17          18      2.13163e-07              1        0.00199
18          19      4.55807e-08              1        0.00121
19          20      8.72743e-09              1       0.000601
First-order
Iteration  Func-count       f(x)        Step-size       optimality
20          21      1.56598e-09              1       0.000224
21          22      2.81917e-10              1        9.2e-05
22          23      4.32264e-11              1       3.11e-05
23          24      5.20858e-12              1       1.08e-05
24          25      1.36374e-12              1       2.87e-06
25          26      1.11805e-12              1       7.95e-07
26          27      1.10532e-12              1        1.7e-07
27          28      1.10468e-12              1       3.72e-08
28          29      1.10465e-12              1       5.02e-09
29          30      1.10465e-12              1       2.99e-09
30          31      1.10465e-12              1        2.9e-09
31          32      1.10465e-12              1        2.7e-09
32          33      1.10465e-12              1       4.78e-09
33          34      1.10464e-12              1       9.07e-09
34          35      1.10464e-12              1       1.58e-08
35          36      1.10462e-12              1        2.7e-08
36          37      1.10457e-12              1       4.52e-08
37          38      1.10443e-12              1       7.47e-08
38          39      1.10408e-12              1       1.22e-07
39          40      1.10315e-12              1          2e-07
First-order
Iteration  Func-count       f(x)        Step-size       optimality
40          41      1.10074e-12              1       3.24e-07
41          42      1.09444e-12              1       5.24e-07
42          43      1.07818e-12              1        8.4e-07
43          44      1.03705e-12              1       1.32e-06
44          45      9.38556e-13              1       1.99e-06
45          46      7.32415e-13              1       2.68e-06
46          47      4.12062e-13              1        2.8e-06
47          48      1.21521e-13              1        1.8e-06
48          49      1.11669e-14              1        4.9e-07
49          50      5.69645e-16              1       6.41e-08
50          51      1.47514e-17              1       1.64e-08
51          52      7.76625e-20              1       1.58e-09
52          53      1.61414e-21              1       2.67e-10
53          54      8.82525e-23              1       5.86e-11

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.

End Main Loop

Final f: 8.8253e-23
Setup time: 0.041 seconds
Optimization time: 0.074 seconds
Iterations: 53
Total iterations: 54
Exit condition: Gradient norm smaller than tolerance


With tighter tolerances, we again get closer to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 2.531186e-10


## Adam (internal to Tensor Toolbox)

Adam is a stochastic optimization method, so it's not especially fair to compare it to the other (deterministic) methods, but we do it anyway. Here the function and gradient are evaluated exactly.
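
The Adam update itself is the standard one of Kingma and Ba. A pure-Python sketch of a textbook Adam loop on this page's test function (an illustration only, not Tensor Toolbox's internal code):

```python
def func(x):
    """The Rosenbrock variant used throughout this page."""
    n = len(x)
    f = 0.25 * (x[0] - 1.0) ** 2
    f += sum((x[i] - x[i - 1] ** 2) ** 2 for i in range(1, n))
    return 4.0 * f

def grad(x):
    """Analytic gradient of func."""
    n = len(x)
    g = [0.0] * n
    t1 = x[1] - x[0] ** 2
    g[0] = 2.0 * (x[0] - 1.0) - 16.0 * x[0] * t1
    for i in range(1, n - 1):
        t2, t1 = t1, x[i + 1] - x[i] ** 2
        g[i] = 8.0 * t2 - 16.0 * x[i] * t1
    g[n - 1] = 8.0 * t1
    return g

def adam(x, iters=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam: exponential moment estimates with bias correction."""
    x = list(x)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    for t in range(1, iters + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            mhat = m[i] / (1 - beta1 ** t)  # bias-corrected first moment
            vhat = v[i] / (1 - beta2 ** t)  # bias-corrected second moment
            x[i] -= lr * mhat / (vhat ** 0.5 + eps)
    return x

x = adam([2.0] + [1.0] * 9)  # same starting point as xinit above
```

With a fixed learning rate the iterates tend to hover near the optimum rather than converging tightly, which is consistent with the decaying step sizes (0.01, 0.001, ...) visible in the epoch log below.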

rng('default')

[x, f, info] = tt_opt_adam(xinit, fh, gh);

Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 2
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 3.700000e+01
Epoch  3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch 32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
End Main Loop

Final f: 1.7803e-06
Setup time: 0.01 seconds
Optimization time: 0.29 seconds
Number of epochs: 32
Total iterations: 16000
Exit condition: Number of failed epochs > 2


Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 3.608177e-01


Let's tighten the convergence by allowing more failed epochs (maxfails) before termination.

rng(1)

[x, f, info] = tt_opt_adam(xinit, fh, gh, 'maxfails', 6);

Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 6
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 3.700000e+01
Epoch  3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch 32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
Epoch 500: f = 8.408196e-07, step = 1e-05
Epoch 1000: f = 2.292266e-07, step = 1e-05
Epoch 1011: f = 2.220675e-07, step = 1e-05, nfails = 4 (resetting to solution from last epoch)
Epoch 1500: f = 4.355436e-08, step = 1e-06
Epoch 2000: f = 4.495148e-09, step = 1e-06
Epoch 2234: f = 1.418108e-09, step = 1e-06, nfails = 5 (resetting to solution from last epoch)
Epoch 2500: f = 4.538938e-10, step = 1e-07
Epoch 3000: f = 3.828861e-11, step = 1e-07
Epoch 3188: f = 1.485887e-11, step = 1e-07, nfails = 6 (resetting to solution from last epoch)
Epoch 3500: f = 6.286379e-12, step = 1e-08
Epoch 4000: f = 1.176231e-12, step = 1e-08
Epoch 4500: f = 1.811833e-13, step = 1e-08
Epoch 4838: f = 4.966636e-14, step = 1e-08, nfails = 7 (resetting to solution from last epoch)
End Main Loop

Final f: 4.9634e-14
Setup time: 0.00 seconds
Optimization time: 38.99 seconds
Number of epochs: 4838
Total iterations: 2419000
Exit condition: Number of failed epochs > 6


Check how close to optimal

fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));

||x-xopt||/||xopt|| = 4.335430e-05