. Refer to the paper for details of the algorithm.
The files
Once you have downloaded and unzipped the code, you should have five files:
- ssgpr_ui.m - Main function you should be calling. It handles the initialization of hyperparameters as described in the paper.
- ssgprfixed_ui.m - Same for fixed spectral points.
- ssgpr.m - Code for SSGP regression. It provides training Negative Log Marginal Likelihood (NLML), derivatives with respect to the hyperparameters and predictive means and variances for test data.
- ssgprfixed.m - Same for fixed spectral points.
- minimize.m - Generic multivariate function optimization (written by Carl E. Rasmussen).
Using ssgprfixed_ui.m to perform regression
Syntax
[NMSE, mu, S2, NMLP, loghyper, convergence] = ssgprfixed_ui(x_tr, y_tr, x_tst, y_tst, m, iteropt, loghyper)
- The first five parameters are described in the simple usage tutorial.
iteropt is used to specify the number of optimization iterations.
When positive, it is the maximum number of line searches, when negative,
its absolute value is the maximum number of function evaluations.
See minimize.m help for details.
The default value for iteropt is -1000. This might be unnecessarily high for
most problems, but ensures proper convergence. Set iteropt to zero to avoid optimization and obtain results with the
initial hyperparameters.
The output convergence is a vector providing the values of the objective
function at each step of the optimization. Plotting this output can help you
assess whether convergence is being reached.
-
loghyper is used to specify the initial hyperparameters. It is a column
vector. The hyperparameters needed to completely define the model are the D
lengthscales, the signal power, the noise power and the m spectral points
(being each of them a vector of D real values; one for each scalar frequency
corresponding to each dimension).
For maximum flexibility, two options are considered:
-
You can provide a D+2 long column vector with
- The log of the D lengthscales.
- The log of the square root of the signal power.
- The log of the square root of the noise power.
In this case the spectral points are selected randomly, from the distribution
that corresponds to a squared exponential covariance function (which is also
a squared exponential).
- You can provide a D+2+mxD long column vector, containing all of
the above, and additionally all the spectral points (first the first dimension
of all spectral points, in order, then the second dimension of all spectral
points in the same order, and so on).
Note that the second option allows you to use covariance functions other than
the default squared exponential.
The values of the hyperparameters after model selection are returned by the
function, so you can save the loghyper vector for future use.
Returns:
- NMSE, NMLP: Error measures for test data. (Normalized Mean Square Error and
Negative Mean Log Probability).
- mu, S2: Predictive mean and variance under current hyperparameters at test points.
- loghyper: Hyperparameters selected by optimization.
- convergence: Training NLML values along optimization.
Examples
-
Train a model once, then use it on two different test datasets:
[NMSE, mu_1, S2_1, NMLP, loghyper] = ssgprfixed_ui(X_tr, T_tr, X_tst_1, T_tst_1, 50);
[NMSE, mu_2, S2_2, NMLP] = ssgprfixed_ui(X_tr, T_tr, X_tst_2, T_tst_2, 50, 0, loghyper);
Notice how we have used the hyperparameters found with the first call
for a second dataset without repeating model selection.
-
Use a different covariance function:
Assume you have 50 spectral points drawn from the distribution corresponding to some desired covariance function.
They are stored in w (one per row) and your initial guess for all the lengthscales, signal power and noise power
is 1. Then you can learn the model and make predictions as follows:
loghyper = [zeros(D+2,1); w(:)];
[NMSE, mu] = ssgprfixed_ui(X_tr, T_tr, X_tst, T_tst, 50,-1000,loghyper);
As you see, you can do almost anything you want with the flexibility provided by
this interface.
Using ssgpr_ui.m to perform regression
Syntax
[NMSE, mu, S2, NMLP, loghyper, convergence] = ssgpr_ui(x_tr, y_tr, x_tst, y_tst, m, iteropt, loghyper)
- Behaviour is as described in the previous section, but spectral points are also learned.
One step further: Calling directly the main SSGP code
If you are writing your own specialized code, you may want to directly access the core
code, ssgprfixed (or equivalently, ssgpr). In most cases, though, you will want to avoid this, and use the
convenient interface provided by ssgprfixed_ui.
Syntax (for training)
[NLML, deriv_NLML] = ssgprfixed(hyper, x_tr, y_tr)
- x_tr and y_tr are the training data, one sample per column, input and output
respectively.
- hyper is the set of hyperparameters of the model:
- the log of the D lengthscales,
- the log of the square root of the signal power and
- the log of the square root of the noise power.
- the spectral points in the column format described before.
Returns:
- NLML: Negative Log Marginal Likelihood of the training data under current
hyperparameters.
- deriv_NLML: Vector of the same size as hyper, with the partial derivatives of the NLML wrt each hyperparameter. Values corresponding to spectral points will be zero when using ssgprfixed
.
Syntax (for prediction)
[mu, S2] = ssgprfixed(hyper, x_tr, y_tr, x_tst)
- The first three parameters are as before.
- x_tst is a matrix with the samples for which
prediction is required (one per row).
Returns:
- mu: Predictive mean under current hyperparameters at test points.
- S2: Predictive variance under current hyperparameters at test points.