Creating Test Problems and Initial Guesses

We demonstrate how to use the Tensor Toolbox create_problem and create_guess functions to create test problems and initial guesses for fitting algorithms.

Additionally, we describe create_problem_binary, which generates binary-valued tensor data specifically for CP-type problems.

Creating a CP test problem
Creating a Tucker test problem
Recreating the same test problem
Checking default parameters and recreating the same test problem
Options for creating factor matrices, core tensors, and lambdas
Generating data from an existing solution
Creating dense missing data problems
Creating sparse missing data problems.
Create missing data problems with a pre-specified pattern
Creating sparse problems (CP only)
Generating an initial guess
Creating binary CP test problems with create_problem_binary

rng('default'); %<- Setting random seed for reproducibility of this script

Creating a CP test problem

The create_problem function generates a test problem with a known solution. It generates both the solution (as a ktensor for CP) and the test data (as a tensor). We later show that a pre-specified solution can be used as well.

% Create a problem
info = create_problem('Size', [5 4 3], 'Num_Factors', 3, 'Noise', 0.10);

% Display the solution created by create_problem
soln = info.Soln

soln is a ktensor of size 5 x 4 x 3
	soln.lambda = 
		    0.6948    0.3171    0.9502
	soln.U{1} = 
		    0.5377   -1.3077   -1.3499
		    1.8339   -0.4336    3.0349
		   -2.2588    0.3426    0.7254
		    0.8622    3.5784   -0.0631
		    0.3188    2.7694    0.7147
	soln.U{2} = 
		   -0.2050    1.4172    1.6302
		   -0.1241    0.6715    0.4889
		    1.4897   -1.2075    1.0347
		    1.4090    0.7172    0.7269
	soln.U{3} = 
		   -0.3034    0.8884   -0.8095
		    0.2939   -1.1471   -2.9443
		   -0.7873   -1.0689    1.4384

% Display the data created by create_problem
data = info.Data

data is a tensor of size 5 x 4 x 3
	data(:,:,1) = 
	    0.6406   -0.0053    1.7089    0.3286
	   -3.9326   -1.1850   -3.1232   -1.8339
	   -0.9485   -0.3204    0.0406    0.0859
	    1.6481    0.9261   -1.8303    0.6222
	    0.3243    0.6169   -1.9710   -0.0077
	data(:,:,2) = 
	    7.1696    2.6513    4.2567    2.9938
	  -14.0474   -4.0639   -8.6171   -5.9845
	   -3.3801   -1.5008   -2.3947   -2.6743
	   -1.4145   -1.0496    1.9542   -0.3994
	   -4.3450   -2.0053   -0.4684   -2.1417
	data(:,:,3) = 
	   -2.3827   -0.8279   -3.2592   -1.6532
	    7.6351    2.4764    2.2490    1.7676
	    1.2927    0.5233    3.0407    2.3518
	   -1.6987   -0.8768    0.9062   -2.2220
	    0.8113   -0.0613    2.7203   -0.3508

% The difference between true solution and measured data should match the
% specified 10% noise.
diff = norm(full(info.Soln) - info.Data)/norm(full(info.Soln))

diff =

    0.1000
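
The reported value is exactly 0.1000 rather than roughly 10%, which suggests that the noise is rescaled relative to the norm of the noise-free data. The plain-array sketch below (no Tensor Toolbox required) illustrates that kind of scaling. The formula is an assumption about how the noise might be applied, not a confirmed description of create_problem, and the names Z, N, D, and eta are illustrative only.

```matlab
% Sketch of relative noise scaling using plain arrays.
% Assumption: noise is Gaussian and rescaled to eta times the data norm.
rng(0);
eta = 0.10;                  % target relative noise level
Z = randn(5, 4, 3);          % stand-in for the noise-free data, full(info.Soln)
N = randn(size(Z));          % Gaussian noise with arbitrary scale

% Rescale the noise so that norm(D - Z) equals eta * norm(Z)
D = Z + eta * norm(Z(:)) * N / norm(N(:));

% Relative difference between noisy and noise-free data
rel_err = norm(D(:) - Z(:)) / norm(Z(:));
```

With this scaling the relative difference equals eta up to floating-point rounding, regardless of the random draw.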

Creating a Tucker test problem

The create_problem function can also be used to create Tucker problems by specifying the 'Type' as 'Tucker'. In this case, the create_problem function generates both the solution (as a ttensor for Tucker) and the test data (as a tensor).

% Create a problem
info = create_problem('Type', 'Tucker', 'Size', [5 4 3], 'Num_Factors', [3 3 2]);

% Display the Tucker-type solution created by create_problem
soln = info.Soln

soln is a ttensor of size 5 x 4 x 3
	soln.core is a tensor of size 3 x 3 x 2
		soln.core(:,:,1) = 
	   -1.5771    0.0335    0.3502
	    0.5080   -1.3337   -0.2991
	    0.2820    1.1275    0.0229
		soln.core(:,:,2) = 
	   -0.2620   -0.8314   -0.5336
	   -1.7502   -0.9792   -2.0026
	   -0.2857   -1.1564    0.9642
	soln.U{1} = 
		   -1.7947    0.3035   -0.1941
		    0.8404   -0.6003   -2.1384
		   -0.8880    0.4900   -0.8396
		    0.1001    0.7394    1.3546
		   -0.5445    1.7119   -1.0722
	soln.U{2} = 
		    0.9610   -0.1977    1.3790
		    0.1240   -1.2078   -1.0582
		    1.4367    2.9080   -0.4686
		   -1.9609    0.8252   -0.2725
	soln.U{3} = 
		    1.0984   -2.0518
		   -0.2779   -0.3538
		    0.7015   -0.8236

% Difference between true solution and measured data (default noise is 10%)
diff = norm(full(info.Soln) - info.Data)/norm(full(info.Soln))

diff =

    0.1000
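
As a rough illustration of how such a problem might be used, the sketch below fits a Tucker model of the same rank with tucker_als and measures how well the fitted model reproduces the noisy data. The 'printitn' setting and the variable names R, T, and rel_err are choices made for this example, and the error obtained depends on the random seed.

```matlab
% Sketch: fit a Tucker model to a generated test problem.
rng('default');
R = [3 3 2];   % same multilinear rank as the generated solution
info = create_problem('Type', 'Tucker', 'Size', [5 4 3], 'Num_Factors', R);

% Fit with alternating least squares; 'printitn', 0 suppresses iteration output
T = tucker_als(info.Data, R, 'printitn', 0);

% Relative error of the fitted model with respect to the noisy data
rel_err = norm(full(T) - info.Data) / norm(info.Data);
```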

Recreating the same test problem

We can recreate exactly the same test problem when we use the same random seed and other parameters.

% Set-up, including specifying random state
sz = [5 4 3]; %<- Size
nf = 2; %<- Number of components
state = RandStream.getGlobalStream.State; %<- Random state

% Generate first test problem
info1 = create_problem('Size', sz, 'Num_Factors', nf, 'State', state);

% Generate second identical test problem
info2 = create_problem('Size', sz, 'Num_Factors', nf, 'State', state);

% Check that the solutions are identical
tf = isequal(info1.Soln, info2.Soln)

tf =

  logical

   1

% Check that the data are identical
diff = norm(info1.Data - info2.Data)

diff =

     0
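
Reproducibility of this kind rests on the fact that a saved generator state can be restored before random numbers are drawn. The sketch below shows the mechanism in plain MATLAB. That create_problem restores the supplied 'State' in this way is an assumption made for illustration.

```matlab
% Sketch: saving and restoring the global random stream state.
stream = RandStream.getGlobalStream;
state = stream.State;     % save the current generator state

a = randn(3, 2);          % first draw advances the state
stream.State = state;     % rewind to the saved state
b = randn(3, 2);          % second draw repeats the first

tf = isequal(a, b);
```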

Checking default parameters and recreating the same test problem

The create_problem function returns, as its second output, the parameters that were used to generate the problem. These can be inspected to see the defaults. Additionally, if they are saved, they can be used to recreate the same test problem in future experiments.

% Generate test problem and use second output argument for parameters.
[info1,params] = create_problem('Size', [5 4 3], 'Num_Factors', 2);

% Here are the parameters
params

params = 

  struct with fields:

       Core_Generator: 'randn'
     Factor_Generator: 'randn'
     Lambda_Generator: 'rand'
                    M: 0
                Noise: 0.1000
          Num_Factors: 2
                 Size: [5 4 3]
                 Soln: []
    Sparse_Generation: 0
             Sparse_M: 0
                State: [625×1 uint32]
            Symmetric: []
                 Type: 'CP'

% Recreate an identical test problem
info2 = create_problem(params);

% Check that the solutions are identical
tf = isequal(info1.Soln, info2.Soln)

tf =

  logical

   1

% Check that the data are identical
diff = norm(info1.Data - info2.Data)

diff =

     0
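
One way to keep the parameters for later experiments is to write the struct to a MAT-file and pass the reloaded struct back to create_problem. The sketch below uses a temporary file; the file name and the variable names param_file and loaded are hypothetical.

```matlab
% Sketch: persist the parameter struct and recreate the problem from disk.
[info1, params] = create_problem('Size', [5 4 3], 'Num_Factors', 2);

param_file = [tempname '.mat'];    % hypothetical location for the saved parameters
save(param_file, 'params');

loaded = load(param_file);         % returns a struct with field 'params'
info2 = create_problem(loaded.params);

same_soln = isequal(info1.Soln, info2.Soln);
data_diff = norm(info1.Data - info2.Data);
delete(param_file);                % clean up the temporary file
```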

Options for creating factor matrices, core tensors, and lambdas

Any function that takes two arguments specifying the size (number of rows and number of columns) and returns a matrix of that size can be used to generate the factor matrices. This is specified by the 'Factor_Generator' option to create_problem.

Pre-defined options for 'Factor_Generator' for creating factor matrices (for CP or Tucker) include:

'rand' - Uniform on [0,1]
'randn' - Gaussian with mean 0 and std 1
'orthogonal' - Generates a random orthogonal matrix. This option only works when the number of factors is less than or equal to the smallest dimension.
'stochastic' - Generates nonnegative factor matrices so that each column sums to one.

Pre-defined options for 'Lambda_Generator' for creating the lambda vector (for CP) include:

'rand' - Uniform on [0,1]
'randn' - Gaussian with mean 0 and std 1
'orthogonal' - Creates a random vector with norm one.
'stochastic' - Creates a random nonnegative vector whose entries sum to one.

Pre-defined options for 'Core_Generator' for creating core tensors (for Tucker) include:

'rand' - Uniform on [0,1]
'randn' - Gaussian with mean 0 and std 1

% Here is an example of a custom factor generator
factor_generator = @(m,n) 100*rand(m,n);
info = create_problem('Size', [5 4 3], 'Num_Factors', 2, ...
    'Factor_Generator', factor_generator, 'Lambda_Generator', @ones);
first_factor_matrix = info.Soln.U{1}

first_factor_matrix =

   34.3877   81.7761
   58.4069   26.0728
   10.7769   59.4356
   90.6308    2.2513
   87.9654   42.5259

% Here is an example of a custom core generator for Tucker:
info = create_problem('Type', 'Tucker', 'Size', [5 4 3], ...
    'Num_Factors', [2 2 2], 'Core_Generator', @tenones);
core = info.Soln.core

core is a tensor of size 2 x 2 x 2
	core(:,:,1) = 
	     1     1
	     1     1
	core(:,:,2) = 
	     1     1
	     1     1

% Here's another example for CP, this time using a function to create
% factor matrices such that the inner products of the columns are
% prespecified.
info = create_problem('Size', [5 4 3], 'Num_Factors', 3, ...
    'Factor_Generator', @(m,n) matrandcong(m,n,.9));
U = info.Soln.U{1};
congruences = U'*U

congruences =

    1.0000    0.9000    0.9000
    0.9000    1.0000    0.9000
    0.9000    0.9000    1.0000
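
Because a factor generator is just a function of the row and column counts, similar behaviors can be written by hand. As an illustration, the sketch below builds a generator whose columns are nonnegative and sum to one, in the spirit of the 'stochastic' option. It is not the toolbox's own implementation; colnorm and stochastic_gen are names invented here, and the sketch assumes that create_problem stores the generated factor matrices without rescaling them.

```matlab
% Sketch: a hand-written generator with nonnegative columns that sum to one.
colnorm = @(A) A ./ sum(A, 1);                  % uses implicit expansion (R2016b or later)
stochastic_gen = @(m, n) colnorm(rand(m, n));   % hypothetical custom generator

info = create_problem('Size', [5 4 3], 'Num_Factors', 2, ...
    'Factor_Generator', stochastic_gen);

% Column sums of the first factor matrix
U1 = info.Soln.U{1};
col_sums = sum(U1, 1);
```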

Generating data from an existing solution

It's possible to skip the solution generation altogether and instead just generate appropriate test data from a solution supplied via the 'Soln' option.

% Manually generate a solution (or use one from some
% previous call to |create_problem|).
soln = ktensor({rand(50,3), rand(40,3), rand(30,3)});

% Use that soln to create new test problem.
info = create_problem('Soln', soln);

% Check whether the returned solution is identical to the input
iseq = isequal(soln,info.Soln)

iseq =

  logical

   1
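
Noise is still added when the solution is supplied, so the data differ from the model by the default noise level. The sketch below checks this, assuming the default 'Noise' of 0.10 shown in the parameter listing earlier; rel_err is a name chosen for this example.

```matlab
% Sketch: data generated from a user-supplied solution still include noise.
rng('default');
soln = ktensor({rand(50,3), rand(40,3), rand(30,3)});
info = create_problem('Soln', soln);

% Relative difference between the supplied model and the generated data
rel_err = norm(full(soln) - info.Data) / norm(full(soln));
```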

Creating dense missing data problems

It's possible to create problems that have a percentage of missing data. The problem generator randomly creates the pattern of missing data.

% Specify 25% missing data as follows:
[info,params] = create_problem('Size', [5 4 3], 'M', 0.25);

% Here is the pattern of known data (1 = known, 0 = unknown)
info.Pattern

ans is a tensor of size 5 x 4 x 3
	ans(:,:,1) = 
	     1     1     1     0
	     1     1     1     1
	     1     1     1     1
	     1     1     1     0
	     1     1     0     0
	ans(:,:,2) = 
	     1     1     1     0
	     1     1     0     0
	     1     1     0     1
	     1     1     0     0
	     1     1     0     1
	ans(:,:,3) = 
	     1     1     1     0
	     1     1     1     1
	     1     0     1     1
	     1     1     0     1
	     1     0     1     1

% Here is the data (incl. noise) with missing entries zeroed out
info.Data

ans is a tensor of size 5 x 4 x 3
	ans(:,:,1) = 
	    0.0701   -0.0140   -0.0197         0
	   -0.0250    0.0174    0.0090    0.0055
	   -0.0100   -0.0118    0.0130    0.0182
	   -0.0267   -0.0170    0.0305         0
	    0.0729   -0.0061         0         0
	ans(:,:,2) = 
	    0.4143    0.0336   -0.4037         0
	   -0.0852   -0.0181         0         0
	   -0.2301    0.0085         0   -0.0270
	   -0.4196    0.0960         0         0
	    0.5840   -0.0401         0    0.0903
	ans(:,:,3) = 
	   -0.1069    0.0860   -0.0161         0
	    0.0445   -0.0541    0.0299   -0.0440
	    0.0237         0    0.0079   -0.0278
	   -0.1259    0.1210         0    0.0169
	   -0.0465         0    0.0309   -0.0240
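
The pattern can be used to confirm the missing fraction and to restrict error measurements to the known entries. The sketch below converts the tensors to plain arrays for this purpose; the variable names are illustrative, and the error on the known entries should only be expected to lie near the noise level rather than match it exactly.

```matlab
% Sketch: inspect the missing-data pattern and measure error on known entries.
rng('default');
info = create_problem('Size', [5 4 3], 'M', 0.25);

P = double(info.Pattern);        % 1 = known, 0 = missing
D = double(info.Data);           % noisy data, missing entries zeroed
Z = double(full(info.Soln));     % noise-free model values

frac_missing = 1 - nnz(P) / numel(P);   % fraction of missing entries
zeros_ok = all(D(P == 0) == 0);         % missing entries are stored as zero

% Relative error computed over the known entries only
rel_err_known = norm(P(:) .* (D(:) - Z(:))) / norm(P(:) .* Z(:));
```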

Creating sparse missing data problems.

If Sparse_M is set to true, then the data returned is sparse. Moreover, the dense versions are never explicitly created. This option only works when M >= 0.8.

% Specify 80% missing data and sparse
info = create_problem('Size', [5 4 3], 'M', 0.80, 'Sparse_M', true);

% Here is the pattern of known data
info.Pattern

ans is a sparse tensor of size 5 x 4 x 3 with 12 values
	(1,4,2)     1
	(2,1,2)     1
	(2,2,3)     1
	(2,4,3)     1
	(3,1,1)     1
	(3,3,3)     1
	(3,4,2)     1
	(4,1,1)     1
	(4,2,1)     1
	(4,2,3)     1
	(5,1,2)     1
	(5,4,2)     1

% Here is the data (incl. noise) with missing entries zeroed out
info.Data

ans is a sparse tensor of size 5 x 4 x 3 with 12 values
	(1,4,2)   -0.0137
	(2,1,2)   -0.6286
	(2,2,3)   -0.2961
	(2,4,3)    0.1887
	(3,1,1)   -0.2856
	(3,3,3)   -1.3309
	(3,4,2)   -0.1728
	(4,1,1)   -0.0357
	(4,2,1)   -0.0268
	(4,2,3)   -0.3739
	(5,1,2)   -0.3906
	(5,4,2)    0.0938

Create missing data problems with a pre-specified pattern

It's also possible to provide a specific pattern (dense or sparse) to be used to specify where data should be missing.

% Create pattern
P = tenrand([5 4 3]) > 0.5;
% Create test problem with that pattern
info = create_problem('Size', size(P), 'M', P);
% Show the data
info.Data

ans is a tensor of size 5 x 4 x 3
	ans(:,:,1) = 
	         0   -0.6323         0         0
	    0.1566         0   -0.4187         0
	    0.0044         0         0         0
	    0.0508   -0.7211    0.1713         0
	         0         0         0         0
	ans(:,:,2) = 
	         0         0         0   -0.0151
	   -0.0909         0    0.0607         0
	    0.0084         0         0         0
	         0   -0.7582         0         0
	   -0.0734    0.1987         0         0
	ans(:,:,3) = 
	   -0.1618   -0.3415    0.5567    0.4957
	    0.1608         0   -0.5744   -0.4850
	   -0.0797         0         0    0.1821
	         0         0   -0.1827         0
	         0         0         0         0
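
A pre-specified pattern is useful when the missing entries have structure, for example when a whole fiber is unobserved. The sketch below builds such a pattern from a plain array; it assumes that a tensor of zeros and ones is accepted for 'M' in the same way as the thresholded pattern above.

```matlab
% Sketch: a structured pattern in which one mode-1 fiber is missing.
P = ones(5, 4, 3);
P(:, 2, 1) = 0;                  % mark the fiber (:,2,1) as unknown

info = create_problem('Size', size(P), 'M', tensor(P));

% The corresponding data entries should all be zeroed out
D = double(info.Data);
fiber_is_zero = all(D(:, 2, 1) == 0);
```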

Creating sparse problems (CP only)

If we assume each model parameter is the input to a Poisson process, then we can generate sparse test problems. This requires that all the factor matrices and lambda be nonnegative. The default factor generator ('randn') won't work since it produces both positive and negative values.

% Generate factor matrices with a few large entries in each column; this
% will be the basis of our soln.
sz = [20 15 10];
nf = 4;
A = cell(3,1);
for n = 1:length(sz)
    A{n} = rand(sz(n), nf);
    for r = 1:nf
        p = randperm(sz(n));
        idx = p(1:round(.2*sz(n)));
        A{n}(idx,r) = 10 * A{n}(idx,r);
    end
end
S = ktensor(A);
S = normalize(S,'sort',1);

% Create sparse test problem based on provided solution. The
% 'Sparse_Generation' says how many insertions to make based on the
% provided solution S. The lambda vector of the solution is automatically
% rescaled to match the number of insertions.
info = create_problem('Soln', S, 'Sparse_Generation', 500);
num_nonzeros = nnz(info.Data)
total_insertions = sum(info.Data.vals)
orig_lambda_vs_rescaled = S.lambda ./ info.Soln.lambda

num_nonzeros =

   326


total_insertions =

   500


orig_lambda_vs_rescaled =

   84.4101
   84.4101
   84.4101
   84.4101
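
The constant ratio above indicates that lambda is rescaled by a single factor. If the rescaling works as the code comment describes, the rescaled lambda should sum to the number of insertions. The sketch below checks this on a smaller, self-contained example; the interpretation is an inference from the output above rather than a documented guarantee, and the variable names are illustrative.

```matlab
% Sketch: check that counts and rescaled weights match the insertion total.
rng('default');
S = ktensor({rand(20,4), rand(15,4), rand(10,4)});   % nonnegative factors
num_insertions = 500;

info = create_problem('Soln', S, 'Sparse_Generation', num_insertions);

total_count = sum(info.Data.vals);        % total of the generated counts
lambda_total = sum(info.Soln.lambda);     % assumed to equal num_insertions
```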

Generating an initial guess

The create_guess function creates a random initial guess as a cell array of matrices. Its behavior is very similar to create_problem. A useful option is to generate an initial guess that is a perturbation of the solution. Note that the option names are spelled 'pertubation' and 'Pertubation' in the call below.

info = create_problem;

% Create an initial guess to go with the problem that is just a 5%
% perturbation of the correct solution.
U = create_guess('Soln', info.Soln, 'Factor_Generator', 'pertubation', ...
    'Pertubation', 0.05);

Creating binary CP test problems with create_problem_binary

The create_problem_binary function generates a sparse binary tensor X of a specified size, along with an underlying low-rank CP model Mtrue (a ktensor) that represents the odds of a 1 in each position. This function is specifically designed for creating test problems for CP decomposition of binary data.

Key parameters for create_problem_binary(sz,r,'param',value):

sz: Size of the tensor (e.g., [I J K]).
r: Rank of the underlying CP model Mtrue.
'loprob': Probability of a 'noise' 1 (default: 0.01). This influences the baseline odds of observing a 1.
'hiprob': Probability of a 'structural' 1 (default: 0.90). This influences the odds of observing a 1 for entries corresponding to high-valued elements in the factor matrices.
'density': Density of structural (high-valued) entries in the factor matrices (default: 1/r).
'state': State for the random number generator, for reproducibility.
'spgen': If true, generates the sparse tensor X without explicitly forming the full odds tensor (default: false). This is efficient for very large, sparse problems.

The function returns the generated sparse binary tensor X, the true underlying odds model Mtrue as a ktensor, and an info struct containing the parameters used.

% Here, we generate a 3-way binary tensor of size 5x8x10 with an underlying
% rank-3 CP model.

[X_bin, Mtrue_bin, info_bin] = create_problem_binary([5 8 10], 3);

% Display the generated ktensor representing the odds
Mtrue_bin

% Display the generated sparse binary tensor
X_bin

% Show the parameters used
info_bin.params

% Verify that the data is binary and stored as a sparse tensor
is_data_binary = all(X_bin.vals == 1) && isa(X_bin, 'sptensor')
nnz_X_bin = nnz(X_bin)

Creating random problem instance
Mtrue_bin is a ktensor of size 5 x 8 x 10
	Mtrue_bin.lambda = 
		     1     1     1
	Mtrue_bin.U{1} = 
		         0    2.3523    0.2162
		         0    2.0245    0.2162
		         0         0    0.2162
		    1.0428         0    0.2162
		         0    1.8234    0.2162
	Mtrue_bin.U{2} = 
		         0         0    0.2162
		         0         0    0.2162
		         0         0    0.2162
		         0         0    0.2162
		         0         0    0.2162
		         0         0    0.2162
		    1.9045         0    0.2162
		    2.5549         0    0.2162
	Mtrue_bin.U{3} = 
		    2.2904         0    0.2162
		         0         0    0.2162
		    2.5425         0    0.2162
		         0         0    0.2162
		         0    2.0318    0.2162
		         0         0    0.2162
		         0    1.6400    0.2162
		         0    2.4658    0.2162
		         0    2.4486    0.2162
		         0         0    0.2162
X_bin is a sparse tensor of size 5 x 8 x 10 with 12 values
	(1,6,1)     1
	(4,7,1)     1
	(4,8,1)     1
	(4,2,2)     1
	(4,8,3)     1
	(5,7,4)     1
	(4,3,6)     1
	(1,1,8)     1
	(1,5,8)     1
	(3,7,8)     1
	(5,8,8)     1
	(5,2,9)     1

ans = 

  struct with fields:

      density: []
       hiprob: 0.9000
       loprob: 0.0100
        Mtrue: []
        spgen: 0
        state: [625×1 uint32]
    verbosity: 1


is_data_binary =

  logical

   1


nnz_X_bin =

    12
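
The constant value 0.2162 in the last column of each factor matrix above is consistent with a background component whose three entries multiply to the odds implied by 'loprob'. The arithmetic below checks this reading. It is an interpretation of the displayed output under the usual definition of odds, p/(1-p), not a statement about how create_problem_binary is implemented, and the variable names are illustrative.

```matlab
% Sketch: relate the background probability to the constant factor entries.
loprob = 0.01;                 % default probability of a 'noise' 1
d = 3;                         % number of modes in the example above

noise_odds = loprob / (1 - loprob);   % odds corresponding to loprob
entry = noise_odds^(1/d);             % approximately 0.2162

% Converting the odds back to a probability recovers loprob
prob_back = noise_odds / (1 + noise_odds);
```

Under this reading, an entry of the odds tensor that receives only the background contribution has odds entry^3, which corresponds to a 1% chance of observing a 1.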