Implicit Symmetric CP Decomposition for Symmetric K-Tensors
The function cp_isym computes the symmetric CP decomposition of a symmetric tensor that is in symmetric k-tensor format, meaning that it is represented as a sum of symmetric outer products and stored as a symktensor object. The decomposition is described in the following reference:
- S. Sherman, T. G. Kolda, Estimating Higher-Order Moments Using Symmetric Tensor Decomposition, SIAM J. Matrix Analysis and Applications, 41:1369-1387, 2020, https://doi.org/10.1137/19m1299633
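To make the storage format concrete, here is a minimal sketch (the sizes are illustrative only; the constructor call matches the usage later in this document):

% A symktensor stores only the weights and generating vectors, never the
% full n^d symmetric tensor. S represents sum_j lambda(j) * (a_j o a_j o a_j).
n = 4; p = 3; d = 3;            % dimension, number of terms, order
A = randn(n,p);                 % column a_j generates the j-th symmetric term
lambda = ones(p,1)/p;           % weights on the terms
S = symktensor(lambda,A,d);     % implicit symmetric k-tensor representation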
Contents
- Requirements
- Create a sample problem based on a Gaussian mixture model.
- Recommend multiple runs of CP-ISYM
- Options for CP-ISYM with L-BFGS-B (default) optimization solver
- Options for CP-ISYM with L-BFGS optimization solver from Poblano
- Options for CP-ISYM with Quasi-Newton optimization solver from Optimization Toolbox
- Options for CP-ISYM with Adam stochastic optimization solver
Requirements
This code requires an optimization solver. Our examples use the L-BFGS-B optimization method; see Optimization Options for details of installation and usage as well as other options for optimization solvers.
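For example, after compiling Stephen Becker's L-BFGS-B wrapper (the solver linked in the output below), you would add its MATLAB directory to the path; the install location shown here is purely hypothetical:

% Hypothetical install location -- adjust to wherever L-BFGS-B-C was compiled.
addpath('/path/to/L-BFGS-B-C/Matlab');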
Create a sample problem based on a Gaussian mixture model.
Here we use the third-order moment (d=3) for a problem with random variables of dimension n=100. We take a mixture of r=5 Gaussians and collect p=1250 observations.
d = 3; n = 100; r = 5; p = 250*r;
We construct the means so that they are collinear, i.e., the cosine of the angle between any two vectors is 0.5. This is a more difficult problem than one with purely random vectors, which are nearly orthogonal.
rng('default');
collinearity = 0.5;
Atrue = matrandcong(n,r,collinearity);
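As a quick check, the Gram matrix of the means should have ones on the diagonal (matrandcong produces unit-norm columns) and the requested cosine of 0.5 everywhere else; a small sketch:

% Verify the congruence (pairwise cosines) of the columns of Atrue.
G = Atrue' * Atrue;          % r-by-r matrix of pairwise cosines
offdiag = G(~eye(r));        % off-diagonal entries
fprintf('cosines: min = %.2f, max = %.2f\n', min(offdiag), max(offdiag));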
Each Gaussian gets equal weight in the mixture.
wghts = 1/r * ones(r,1);
The true solution is based on the means and their prevalence in the mixture.
Mtrue = symktensor(wghts,Atrue,d); Mtrue = normalize(Mtrue);
The data tensor is based on noisy observations from the mixture model. The vector idx determines which Gaussian each observation is sampled from, and the samples are perturbed by Gaussian noise with standard deviation stddev. The result is assembled into an observation tensor that is in symktensor format.
idx = randsample(r,p,true,wghts);
stddev = 1e-3;
noise = stddev * randn(n,p);
V = Atrue(:,idx) + noise;
X = symktensor(1/p*ones(p,1),V,d);
X = normalize(X,0);
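Since the noise is small, the true solution should already fit the observation tensor well. As a sketch, we can evaluate the exact objective at Mtrue using f_implicit, the same helper called in the stochastic examples below:

% Exact implicit objective ||X - Mtrue||^2 at the true solution; with
% stddev = 1e-3 this should be near (but not exactly) zero.
fprintf('F at Mtrue: %.2e\n', f_implicit(Mtrue,X,norm(X)^2));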
Recommend multiple runs of CP-ISYM
Since this is a non-convex optimization problem, it is critically important to run the optimization multiple times. In the code below, we run it 10 times. Not every run will converge to the global minimum. We take the solution with the lowest function value, which should ideally yield the best match to the true solution. For our artificial problem, we can verify how well the computed solution matches the ideal solution via its score.
rng(7)
nruns = 10;
M = cell(nruns,1);
info = cell(nruns,1);
for i = 1:nruns
    fprintf('------------- Run %d of %d --------------\n', i, nruns);
    [M{i},info{i}] = cp_isym(X,r,'Xnormsqr','exact','printitn',0);
    fprintf('Final F: %.2e, Score: %.2f\n', info{i}.f, score(M{i},Mtrue));
    if i == 1 || info{i}.f < fbest
        ifbest = i;
        fbest = info{i}.f;
        sfbest = score(M{i},Mtrue);
    end
end
------------- Run 1 of 10 --------------
Final F: 1.94e-02, Score: 0.38
------------- Run 2 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 3 of 10 --------------
Final F: 1.20e-09, Score: 0.95
------------- Run 4 of 10 --------------
Final F: 7.76e-09, Score: 0.95
------------- Run 5 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 6 of 10 --------------
Final F: 5.01e-09, Score: 0.95
------------- Run 7 of 10 --------------
Final F: 3.65e-09, Score: 0.95
------------- Run 8 of 10 --------------
Final F: 7.29e-09, Score: 0.95
------------- Run 9 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 10 of 10 --------------
Final F: 6.14e-02, Score: 0.00
fprintf('------------- Best of %d runs --------------\n', nruns);
fprintf('Run %d -> Final F: %.2e, Score: %.2f\n', ifbest, fbest, sfbest);
------------- Best of 10 runs --------------
Run 3 -> Final F: 1.20e-09, Score: 0.95
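Equivalently, the best-run bookkeeping inside the loop can be done after the fact from the collected info structs; a minimal sketch:

% Select the run with the smallest final function value.
fvals = cellfun(@(s) s.f, info);     % final objective of each run
[fbest,ifbest] = min(fvals);
sfbest = score(M{ifbest},Mtrue);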
Options for CP-ISYM with L-BFGS-B (default) optimization solver
The CP solution should produce something close to Mtrue, but it will not be exact because of the randomness in building the observation tensor. Here we show what happens when the true solution is provided as an initial guess. The function should converge to zero, and the similarity score should be close to one. The optimization method defaults to 'lbfgsb'.
[M,info] = cp_isym(X,r,'Xnormsqr','exact','init',Mtrue);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter  1, f(x) = 1.798804e-04, ||grad||_infty = 1.54e-02, 4.50e-03
Iter  2, f(x) = 2.009885e-06, ||grad||_infty = 1.15e-03, 5.64e-03
Iter  3, f(x) = 1.320009e-06, ||grad||_infty = 7.90e-04, 6.92e-03
Iter  4, f(x) = 9.160013e-07, ||grad||_infty = 2.02e-04, 8.24e-03
Iter  5, f(x) = 6.929470e-07, ||grad||_infty = 1.16e-04, 9.59e-03
Iter  6, f(x) = 2.975243e-07, ||grad||_infty = 2.70e-04, 1.09e-02
Iter  7, f(x) = 1.647469e-07, ||grad||_infty = 1.26e-04, 1.23e-02
Iter  8, f(x) = 1.401774e-07, ||grad||_infty = 6.75e-05, 1.37e-02
Iter  9, f(x) = 1.275847e-07, ||grad||_infty = 7.95e-05, 1.50e-02
Iter 10, f(x) = 7.883570e-08, ||grad||_infty = 1.39e-04, 1.70e-02
Iter 11, f(x) = 5.170653e-08, ||grad||_infty = 3.19e-05, 1.83e-02
Iter 12, f(x) = 3.582501e-08, ||grad||_infty = 1.63e-05, 1.95e-02
Iter 13, f(x) = 1.921946e-08, ||grad||_infty = 9.09e-05, 2.10e-02
Iter 14, f(x) = 6.336888e-09, ||grad||_infty = 4.28e-05, 2.22e-02
Iter 15, f(x) = 3.691645e-09, ||grad||_infty = 3.04e-05, 2.37e-02
Iter 16, f(x) = 2.252858e-09, ||grad||_infty = 9.17e-06, 2.51e-02
End Main Loop

Final f: 2.2529e-09
Setup time: 0.0024 seconds
Optimization time: 0.026 seconds
Iterations: 16
Total iterations: 35
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

		*** Final F: 2.25e-09, Score: 0.95 ***
It is also possible to specify the initial guess as a factor matrix.
[M,info] = cp_isym(X,r,'Xnormsqr','exact','init',Atrue);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter  1, f(x) = 1.798804e-04, ||grad||_infty = 1.54e-02, 4.42e-03
Iter  2, f(x) = 2.009885e-06, ||grad||_infty = 1.15e-03, 5.64e-03
Iter  3, f(x) = 1.320009e-06, ||grad||_infty = 7.90e-04, 7.02e-03
Iter  4, f(x) = 9.160013e-07, ||grad||_infty = 2.02e-04, 8.23e-03
Iter  5, f(x) = 6.929470e-07, ||grad||_infty = 1.16e-04, 1.00e-02
Iter  6, f(x) = 2.975243e-07, ||grad||_infty = 2.70e-04, 1.14e-02
Iter  7, f(x) = 1.647469e-07, ||grad||_infty = 1.26e-04, 1.26e-02
Iter  8, f(x) = 1.401774e-07, ||grad||_infty = 6.75e-05, 1.39e-02
Iter  9, f(x) = 1.275847e-07, ||grad||_infty = 7.95e-05, 1.53e-02
Iter 10, f(x) = 7.883570e-08, ||grad||_infty = 1.39e-04, 1.67e-02
Iter 11, f(x) = 5.170653e-08, ||grad||_infty = 3.19e-05, 1.80e-02
Iter 12, f(x) = 3.582501e-08, ||grad||_infty = 1.63e-05, 1.93e-02
Iter 13, f(x) = 1.921946e-08, ||grad||_infty = 9.09e-05, 2.06e-02
Iter 14, f(x) = 6.336888e-09, ||grad||_infty = 4.28e-05, 2.19e-02
Iter 15, f(x) = 3.691645e-09, ||grad||_infty = 3.04e-05, 2.31e-02
Iter 16, f(x) = 2.252858e-09, ||grad||_infty = 9.17e-06, 2.46e-02
End Main Loop

Final f: 2.2529e-09
Setup time: 0.0016 seconds
Optimization time: 0.025 seconds
Iterations: 16
Total iterations: 35
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

		*** Final F: 2.25e-09, Score: 0.95 ***
So far we have been using the exact objective function, which is close to zero at the optimal solution. We can, however, ignore the constant ||X||^2 term in the objective function, which saves some preprocessing cost. The main disadvantage of ignoring the constant term is that the final function value will no longer be near zero.
[M,info] = cp_isym(X,r,'Xnormsqr',0,'init',Atrue,'printitn',1);
fprintf('\n\t\t*** Final F (adjusted): %.2e, Score: %.2f ***\n\n', ...
    info.f + norm(X)^2, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter  1, f(x) = -2.108869e-01, ||grad||_infty = 5.67e-02, 3.00e-03
Iter  2, f(x) = -2.551309e-01, ||grad||_infty = 4.68e-02, 5.40e-03
Iter  3, f(x) = -2.954389e-01, ||grad||_infty = 1.44e-02, 7.86e-03
Iter  4, f(x) = -2.983326e-01, ||grad||_infty = 1.31e-02, 8.97e-03
Iter  5, f(x) = -2.998836e-01, ||grad||_infty = 5.63e-03, 1.03e-02
Iter  6, f(x) = -3.003519e-01, ||grad||_infty = 5.88e-04, 1.20e-02
Iter  7, f(x) = -3.003573e-01, ||grad||_infty = 7.43e-05, 1.36e-02
Iter  8, f(x) = -3.003574e-01, ||grad||_infty = 5.83e-05, 1.49e-02
Iter  9, f(x) = -3.003576e-01, ||grad||_infty = 6.35e-05, 1.62e-02
Iter 10, f(x) = -3.003578e-01, ||grad||_infty = 7.69e-05, 1.76e-02
Iter 11, f(x) = -3.003579e-01, ||grad||_infty = 6.13e-05, 2.02e-02
Iter 12, f(x) = -3.003580e-01, ||grad||_infty = 1.96e-05, 2.14e-02
Iter 13, f(x) = -3.003580e-01, ||grad||_infty = 2.51e-05, 2.26e-02
Iter 14, f(x) = -3.003580e-01, ||grad||_infty = 3.02e-06, 2.38e-02
End Main Loop

Final f: -3.0036e-01
Setup time: 0.0013 seconds
Optimization time: 0.024 seconds
Iterations: 14
Total iterations: 33
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

		*** Final F (adjusted): 4.60e-10, Score: 0.95 ***
By default, the method uses the randomized range finder ('rrf') to generate the initial guess, which generally works well.
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init','rrf');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter  5, f(x) = 4.265680e-02, ||grad||_infty = 1.81e-02, 6.42e-03
Iter 10, f(x) = 2.260594e-02, ||grad||_infty = 1.28e-02, 1.04e-02
Iter 15, f(x) = 1.363733e-02, ||grad||_infty = 1.88e-02, 1.48e-02
Iter 20, f(x) = 1.381487e-03, ||grad||_infty = 1.22e-02, 1.95e-02
Iter 25, f(x) = 5.283198e-05, ||grad||_infty = 1.45e-03, 2.32e-02
Iter 30, f(x) = 6.142064e-07, ||grad||_infty = 1.11e-04, 2.67e-02
Iter 35, f(x) = 1.823079e-08, ||grad||_infty = 1.78e-05, 3.09e-02
Iter 39, f(x) = 8.114700e-10, ||grad||_infty = 3.67e-06, 3.46e-02
End Main Loop

Final f: 8.1147e-10
Setup time: 0.00097 seconds
Optimization time: 0.035 seconds
Iterations: 39
Total iterations: 86
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

		*** Final F: 8.11e-10, Score: 0.95 ***
If we instead run with a random Gaussian initial guess, we are unlikely to converge.
rng(77)
[M,info,M0] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init',@randn);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter  5, f(x) = 1.568859e+00, ||grad||_infty = 2.43e+03, 4.85e-03
Iter 10, f(x) = 3.002914e-01, ||grad||_infty = 1.21e-02, 9.12e-03
Iter 11, f(x) = 3.002914e-01, ||grad||_infty = 1.68e-03, 9.79e-03
End Main Loop

Final f: 3.0029e-01
Setup time: 0.0032 seconds
Optimization time: 0.01 seconds
Iterations: 11
Total iterations: 24
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.

		*** Final F: 3.00e-01, Score: 0.00 ***
Sometimes a random initialization does a little better if the convergence tolerances are tightened and the maximum number of iterations is increased. We also increase the memory of the limited-memory BFGS approximation ('m') to see if that helps. The result improves a little, but it is still nowhere near the solution obtained with the randomized range finder.
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init',M0,...
    'ftol',1e-14,'gtol',1e-8,'maxiters',10000,'m',25);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=25, ftol=1e-14, gtol=1e-08, maxiters = 10000, subiters = 10

Begin Main Loop
Iter  5, f(x) = 1.568859e+00, ||grad||_infty = 2.43e+03, 4.86e-03
Iter 10, f(x) = 3.002914e-01, ||grad||_infty = 8.83e-03, 8.24e-03
Iter 15, f(x) = 3.002914e-01, ||grad||_infty = 9.08e-04, 1.36e-02
Iter 20, f(x) = 3.002914e-01, ||grad||_infty = 1.39e-02, 1.75e-02
Iter 25, f(x) = 3.002914e-01, ||grad||_infty = 1.59e-01, 2.18e-02
Iter 30, f(x) = 3.002899e-01, ||grad||_infty = 1.81e+00, 2.69e-02
Iter 35, f(x) = 2.191333e-01, ||grad||_infty = 3.14e+02, 3.63e-02
Iter 40, f(x) = 1.203681e-01, ||grad||_infty = 4.07e+03, 4.29e-02
Iter 45, f(x) = 1.116589e-01, ||grad||_infty = 7.46e+02, 5.05e-02
Iter 50, f(x) = 1.041222e-01, ||grad||_infty = 4.91e+01, 6.26e-02
Iter 55, f(x) = 1.034506e-01, ||grad||_infty = 1.35e-03, 6.81e-02
Iter 60, f(x) = 1.034506e-01, ||grad||_infty = 4.54e-02, 7.33e-02
Iter 65, f(x) = 1.034504e-01, ||grad||_infty = 5.71e-01, 7.93e-02
Iter 70, f(x) = 1.034220e-01, ||grad||_infty = 6.39e+00, 8.47e-02
Iter 75, f(x) = 1.021516e-01, ||grad||_infty = 1.85e+02, 9.17e-02
Iter 80, f(x) = 8.415741e-02, ||grad||_infty = 5.40e+02, 9.89e-02
Iter 85, f(x) = 8.404990e-02, ||grad||_infty = 6.41e+00, 1.05e-01
Iter 90, f(x) = 8.404952e-02, ||grad||_infty = 1.19e-01, 1.12e-01
Iter 95, f(x) = 8.404951e-02, ||grad||_infty = 9.01e-03, 1.18e-01
Iter 97, f(x) = 8.404951e-02, ||grad||_infty = 5.18e-05, 1.21e-01
End Main Loop

Final f: 8.4050e-02
Setup time: 0.00064 seconds
Optimization time: 0.12 seconds
Iterations: 97
Total iterations: 214
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.

		*** Final F: 8.40e-02, Score: 0.07 ***
Options for CP-ISYM with L-BFGS optimization solver from Poblano
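Here we solve the same problem with the limited-memory BFGS implementation from the Poblano Toolbox, selected by passing 'method','lbfgs'. Poblano must be installed separately; see Optimization Options for details.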
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'method','lbfgs');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Poblano L-BFGS Optimization
Order, Size: 3, 100
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
 Iter  FuncEvals       F(X)          ||G(X)||/N
------ --------- ---------------- ----------------
     0         1       3.92027347       0.02425307
     5        18       0.04889771       0.00026180
    10        30       0.02227461       0.00010159
    15        45       0.01590084       0.00016813
    20        60       0.00156181       0.00008811
    25        70       0.00005713       0.00001458
    26        73       0.00001608       0.00000982
End Main Loop

Final f: 1.6075e-05
Setup time: 0.0018 seconds
Optimization time: 0.087 seconds
Iterations: 26
Total iterations: 73
Exit condition: Successful termination based on StopTol

		*** Final F: 1.61e-05, Score: 0.95 ***
Options for CP-ISYM with Quasi-Newton optimization solver from Optimization Toolbox
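Here we use the quasi-Newton solver fminunc from the MATLAB Optimization Toolbox, selected by passing 'method','fminunc'.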
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'method','fminunc');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Quasi-Newton Optimization (via Optimization Toolbox)
Order, Size: 3, 100
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
     0           1           3.92027                          1.95
     1           3          0.275394       0.164109         0.0628
     2           7           0.14028        12.0063           0.11
     3           8          0.128657              1         0.0684
     4          10          0.085529            0.5         0.0396
     5          11         0.0815813              1         0.0462
     6          12         0.0798902              1        0.00739
     7          14         0.0776973             10         0.0267
     8          15         0.0750789              1         0.0245
     9          17         0.0704473       0.480552          0.042
    10          18         0.0670865              1         0.0368
    11          19         0.0616269              1         0.0325
    12          20          0.055824              1         0.0328
    13          21         0.0544128              1         0.0339
    14          22         0.0505937              1         0.0154
    15          23         0.0481738              1         0.0168
    16          24         0.0416669              1          0.054
    17          25         0.0381286              1         0.0336
    18          26         0.0335275              1         0.0159
    19          27         0.0317987              1         0.0146
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
    20          28         0.0301083              1         0.0111
    21          29         0.0293649              1         0.0161
    22          30          0.027827              1        0.00893
    23          31         0.0265521              1        0.00975
    24          32         0.0261249              1        0.00918
    25          33         0.0251787              1        0.00625
    26          34           0.02455              1        0.00806
    27          35          0.023779              1        0.00911
    28          36         0.0230934              1        0.00845
    29          37         0.0224019              1        0.00844
    30          38         0.0220964              1        0.00687
    31          39         0.0218681              1         0.0049
    32          40          0.021706              1        0.00375
    33          41         0.0215136              1         0.0033
    34          42         0.0213676              1        0.00304
    35          43         0.0212558              1        0.00323
    36          44         0.0211222              1        0.00409
    37          45         0.0208738              1        0.00475
    38          46         0.0205572              1        0.00656
    39          47         0.0201352              1         0.0065
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
    40          48         0.0196422              1        0.00387
    41          49         0.0193932              1        0.00333
    42          50         0.0191319              1        0.00872
    43          51         0.0186628              1        0.00878
    44          52          0.017685              1        0.00907
    45          53         0.0169716              1        0.00861
    46          54           0.01577              1        0.00994
    47          56         0.0131656            0.5         0.0106
    48          57         0.0117538              1         0.0153
    49          58        0.00879083              1         0.0134
    50          59        0.00839261              1          0.012
    51          60        0.00394004              1        0.00878
    52          61        0.00304266              1          0.013
    53          62        0.00244168              1         0.0069
    54          63        0.00225538              1        0.00627
    55          64         0.0014968              1        0.00464
    56          65        0.00105029              1        0.00366
    57          66       0.000674101              1        0.00456
    58          67       0.000456871              1         0.0042
    59          68       0.000245825              1        0.00299
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
    60          69       9.62095e-05              1         0.0024
    61          70       4.02262e-05              1        0.00114
    62          71       2.70099e-05              1       0.000664
    63          72       2.21994e-05              1       0.000556
    64          73       1.73964e-05              1       0.000481
    65          74       1.28057e-05              1       0.000433
    66          75       1.05719e-05              1       0.000388
    67          76        9.4027e-06              1        0.00034
    68          77        7.7716e-06              1       0.000308
    69          78       5.32426e-06              1       0.000286
    70          79       2.70776e-06              1       0.000259
    71          80       1.17346e-06              1       0.000219
    72          81       6.75895e-07              1       0.000149
    73          82       5.27851e-07              1       0.000111
    74          83        4.0086e-07              1       9.15e-05
    75          84       2.47388e-07              1        6.9e-05
    76          85       1.30409e-07              1       5.92e-05
    77          86       7.85393e-08              1       4.16e-05
    78          87       6.17683e-08              1       3.58e-05
    79          88       5.12695e-08              1       2.56e-05

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.

End Main Loop

Final f: 5.1269e-08
Setup time: 1.1 seconds
Optimization time: 0.63 seconds
Iterations: 79
Total iterations: 88
Exit condition: Gradient norm smaller than tolerance

		*** Final F: 5.13e-08, Score: 0.95 ***
Options for CP-ISYM with Adam stochastic optimization solver
When the number of observations is very large, a stochastic method may be faster. Our example is very small (p=1250), but we use it nonetheless to demonstrate the capabilities of the stochastic method. The stochastic solver is selected by setting 'method' to 'adam'.
rng(5)
[M,info] = cp_isym(X,r,'method','adam');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Adam Stochastic Optimization
Order, Size: 3, 100
Function: 1000 samples out of 1250 observations
Gradient: using 100 samples out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f~ = 4.828625e+00
Epoch  1: f~ = -2.657838e-01, step = 0.01
Epoch  2: f~ = -2.832830e-01, step = 0.01
Epoch  3: f~ = -2.982487e-01, step = 0.01
Epoch  4: f~ = -2.985366e-01, step = 0.01
Epoch  5: f~ = -2.984525e-01, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  6: f~ = -3.001280e-01, step = 0.001
Epoch  7: f~ = -2.996104e-01, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f~: -3.0013e-01
Setup time: 0.01 seconds
Optimization time: 0.40 seconds
Number of epochs: 7
Total iterations: 700
Exit condition: Number of failed epochs > 1

		*** Final F: 1.51e-04, Score: 0.95 ***
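Note that the printed f~ values are sampled estimates of the objective without the constant ||X||^2 term, which is why they are negative; the final value reported above adds norm(X)^2 back via f_implicit.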
Since this problem is small, we can compute the function value exactly.
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact','Xnormsqr','exact');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Adam Stochastic Optimization
Order, Size: 3, 100
Function: exact
Gradient: using 100 samples out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 3.595388e-02, step = 0.01
Epoch  2: f = 1.618031e-02, step = 0.01
Epoch  3: f = 1.507292e-03, step = 0.01
Epoch  4: f = 2.553210e-03, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  5: f = 1.624327e-04, step = 0.001
Epoch  6: f = 1.601540e-04, step = 0.001
Epoch  7: f = 3.981757e-04, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 1.6015e-04
Setup time: 0.01 seconds
Optimization time: 0.41 seconds
Number of epochs: 7
Total iterations: 700
Exit condition: Number of failed epochs > 1

		*** Final F: 1.60e-04, Score: 0.96 ***
We can also change the number of samples in each stochastic gradient ('gsamp') and correspondingly increase the number of iterations per epoch ('subiters').
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact','Xnormsqr','exact',...
    'gsamp',10,'subiters',250);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Adam Stochastic Optimization
Order, Size: 3, 100
Function: exact
Gradient: using 10 samples out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 250
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 2.239609e-02, step = 0.01
Epoch  2: f = 6.190717e-03, step = 0.01
Epoch  3: f = 1.955220e-02, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  4: f = 6.693318e-03, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 6.1907e-03
Setup time: 0.00 seconds
Optimization time: 0.26 seconds
Number of epochs: 4
Total iterations: 1000
Exit condition: Number of failed epochs > 1

		*** Final F: 6.19e-03, Score: 0.89 ***
We can further improve the solution by increasing the number of allowed step-length decreases ('maxfails').
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact','Xnormsqr','exact',...
    'gsamp',10,'subiters',250,'maxfails',4);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Adam Stochastic Optimization
Order, Size: 3, 100
Function: exact
Gradient: using 10 samples out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 250
Learning rate / decay / maxfails: 0.01 0.1 4
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 2.239609e-02, step = 0.01
Epoch  2: f = 6.190717e-03, step = 0.01
Epoch  3: f = 1.955220e-02, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  4: f = 6.693318e-03, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch  5: f = 1.111507e-03, step = 0.0001
Epoch  6: f = 3.658467e-04, step = 0.0001
Epoch  7: f = 1.958052e-04, step = 0.0001
Epoch  8: f = 2.450883e-04, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
Epoch  9: f = 1.802417e-04, step = 1e-05
Epoch 10: f = 1.692018e-04, step = 1e-05
Epoch 11: f = 1.604512e-04, step = 1e-05
Epoch 12: f = 1.384002e-04, step = 1e-05
Epoch 13: f = 1.418440e-04, step = 1e-05, nfails = 4 (resetting to solution from last epoch)
Epoch 14: f = 1.356883e-04, step = 1e-06
Epoch 15: f = 1.352304e-04, step = 1e-06
Epoch 16: f = 1.333213e-04, step = 1e-06
Epoch 17: f = 1.320493e-04, step = 1e-06
Epoch 18: f = 1.304712e-04, step = 1e-06
Epoch 19: f = 1.304975e-04, step = 1e-06, nfails = 5 (resetting to solution from last epoch)
End Main Loop

Final f: 1.3047e-04
Setup time: 0.01 seconds
Optimization time: 1.41 seconds
Number of epochs: 19
Total iterations: 4750
Exit condition: Number of failed epochs > 4

		*** Final F: 1.30e-04, Score: 0.97 ***
We can also change the initial learning rate via 'rate'.
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact','Xnormsqr','exact','rate',1e-3);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
CP-ISYM Implict Symmetric CP Optimzation
Adam Stochastic Optimization
Order, Size: 3, 100
Function: exact
Gradient: using 100 samples out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.001 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 1.636029e-01, step = 0.001
Epoch  2: f = 6.960158e-02, step = 0.001
Epoch  3: f = 4.769470e-02, step = 0.001
Epoch  4: f = 3.871023e-02, step = 0.001
Epoch  5: f = 3.369702e-02, step = 0.001
Epoch  6: f = 3.012290e-02, step = 0.001
Epoch  7: f = 2.722983e-02, step = 0.001
Epoch  8: f = 2.466563e-02, step = 0.001
Epoch  9: f = 2.200866e-02, step = 0.001
Epoch 10: f = 1.999008e-02, step = 0.001
Epoch 11: f = 1.800170e-02, step = 0.001
Epoch 12: f = 1.660821e-02, step = 0.001
Epoch 13: f = 1.525271e-02, step = 0.001
Epoch 14: f = 1.399967e-02, step = 0.001
Epoch 15: f = 1.185572e-02, step = 0.001
Epoch 16: f = 9.517354e-03, step = 0.001
Epoch 17: f = 6.242648e-03, step = 0.001
Epoch 18: f = 3.505938e-03, step = 0.001
Epoch 19: f = 1.700016e-03, step = 0.001
Epoch 20: f = 7.406634e-04, step = 0.001
Epoch 21: f = 4.796099e-04, step = 0.001
Epoch 22: f = 4.453608e-04, step = 0.001
Epoch 23: f = 1.749952e-04, step = 0.001
Epoch 24: f = 1.949576e-04, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 25: f = 6.149062e-05, step = 0.0001
Epoch 26: f = 7.074476e-05, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 6.1491e-05
Setup time: 0.01 seconds
Optimization time: 1.13 seconds
Number of epochs: 26
Total iterations: 2600
Exit condition: Number of failed epochs > 1

		*** Final F: 6.15e-05, Score: 0.96 ***