# templateKernel

Kernel model template

## Syntax

``t = templateKernel()``
``t = templateKernel(Name,Value)``

## Description

`templateKernel` creates a template suitable for fitting a Gaussian kernel classification model for nonlinear classification.

The template specifies the binary learner model, number of dimensions of expanded space, kernel scale, box constraint, and regularization strength, among other parameters. After creating the template, train the model by passing the template and data to `fitcecoc`.

`t = templateKernel()` returns a kernel model template. If you create a default template, then the software uses default values for all input arguments during training.

`t = templateKernel(Name,Value)` returns a template with additional options specified by one or more name-value pair arguments. For example, you can implement logistic regression or specify the number of dimensions of the expanded space.

If you display `t` in the Command Window, then some properties of `t` appear empty (`[]`). During training, the software uses default values for the empty properties.

## Examples


Create a default kernel model template and use it to train an error-correcting output codes (ECOC) multiclass model.

```matlab
load fisheriris
```

Create a default kernel model template.

```matlab
t = templateKernel()
```

```
t = 
Fit template for classification Kernel.

             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: []
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: []
                    Lambda: []
                   Learner: 'svm'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'
```

During training, the software fills in the empty properties with their respective default values.

Specify `t` as a binary learner for an ECOC multiclass model.

```matlab
Mdl = fitcecoc(meas,species,'Learners',t)
```

```
Mdl = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
    BinaryLearners: {3x1 cell}
      CodingMatrix: [3x3 double]

  Properties, Methods
```

`Mdl` is a `CompactClassificationECOC` multiclass classifier.

Create a kernel model template with additional options to implement logistic regression with a kernel scale parameter selected by a heuristic procedure.

```matlab
t = templateKernel('Learner','logistic','KernelScale','auto')
```

```
t = 
Fit template for classification Kernel.

             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: []
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: 'auto'
                    Lambda: []
                   Learner: 'logistic'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'
```

## Input Arguments


### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'Learner','logistic','NumExpansionDimensions',2^15,'KernelScale','auto'` specifies to implement logistic regression after mapping the predictor data to the `2^15`-dimensional space using feature expansion with a kernel scale parameter selected by a heuristic procedure.

#### Kernel Classification Options

Linear classification model type, specified as the comma-separated pair consisting of `'Learner'` and `'svm'` or `'logistic'`.

In the following table, $f(x) = T(x)\beta + b$.

• x is an observation (row vector) from p predictor variables.

• $T(\cdot)$ is a transformation of an observation (row vector) for feature expansion. $T(x)$ maps x in $\mathbb{R}^p$ to a high-dimensional space ($\mathbb{R}^m$).

• β is a vector of coefficients.

• b is the scalar bias.

| Value | Algorithm | Response Range | Loss Function |
| --- | --- | --- | --- |
| `'svm'` | Support vector machine | y ∊ {–1,1}; 1 for the positive class and –1 otherwise | Hinge: $\ell[y,f(x)] = \max[0, 1 - yf(x)]$ |
| `'logistic'` | Logistic regression | Same as `'svm'` | Deviance (logistic): $\ell[y,f(x)] = \log\{1 + \exp[-yf(x)]\}$ |

Example: `'Learner','logistic'`
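For illustration, the two loss functions in the table can be sketched as plain functions. (Python is used here only to spell out the arithmetic; these helpers are hypothetical and not part of the MATLAB product.)

```python
import math

def hinge_loss(y, fx):
    """Hinge loss used by the 'svm' learner: max(0, 1 - y*f(x))."""
    return max(0.0, 1.0 - y * fx)

def deviance_loss(y, fx):
    """Logistic deviance used by the 'logistic' learner: log(1 + exp(-y*f(x)))."""
    return math.log1p(math.exp(-y * fx))

# A point classified well inside the margin incurs zero hinge loss,
# but the deviance loss is always strictly positive.
print(hinge_loss(1, 2.0))      # 0.0
print(deviance_loss(1, 2.0))
```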

Number of dimensions of the expanded space, specified as the comma-separated pair consisting of `'NumExpansionDimensions'` and `'auto'` or a positive integer. For `'auto'`, the `templateKernel` function selects the number of dimensions using `2.^ceil(min(log2(p)+5,15))`, where `p` is the number of predictors.

For details, see Random Feature Expansion.

Example: `'NumExpansionDimensions',2^15`

Data Types: `char` | `string` | `single` | `double`
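The `'auto'` rule above grows with the number of predictors and is capped at 2^15 dimensions. A Python sketch mirroring the arithmetic of the MATLAB expression (for illustration only):

```python
import math

def default_expansion_dims(p):
    """Mirror of 2.^ceil(min(log2(p)+5,15)): the dimension grows with
    the predictor count p and is capped at 2^15 = 32768."""
    return 2 ** math.ceil(min(math.log2(p) + 5, 15))

print(default_expansion_dims(4))     # 128 (e.g., the four fisheriris predictors)
print(default_expansion_dims(2000))  # 32768 (hits the cap)
```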

Kernel scale parameter, specified as the comma-separated pair consisting of `'KernelScale'` and `'auto'` or a positive scalar. The software obtains a random basis for random feature expansion by using the kernel scale parameter. For details, see Random Feature Expansion.

If you specify `'auto'`, then the software selects an appropriate kernel scale parameter using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed by using `rng` before training.

Example: `'KernelScale','auto'`

Data Types: `char` | `string` | `single` | `double`

Box constraint, specified as the comma-separated pair consisting of `'BoxConstraint'` and a positive scalar.

This argument is valid only when `'Learner'` is `'svm'` (default) and you do not specify a value for the regularization term strength `'Lambda'`. You can specify either `'BoxConstraint'` or `'Lambda'` because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.

Example: `'BoxConstraint',100`

Data Types: `single` | `double`

Regularization term strength, specified as the comma-separated pair consisting of `'Lambda'` and `'auto'` or a nonnegative scalar.

For `'auto'`, the value of `Lambda` is 1/n, where n is the number of observations.

When `Learner` is `'svm'`, you can specify either `BoxConstraint` or `Lambda` because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn).

Example: `'Lambda',0.01`

Data Types: `char` | `string` | `single` | `double`

#### Convergence Controls

Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of `'BetaTolerance'` and a nonnegative scalar.

Let $B_t = [\beta_t' \; b_t]$, that is, the vector of the coefficients and the bias term at optimization iteration t. If $\left\|\frac{B_t - B_{t-1}}{B_t}\right\|_2 < \text{BetaTolerance}$, then optimization terminates.

If you also specify `GradientTolerance`, then optimization terminates when the software satisfies either stopping criterion.

Example: `'BetaTolerance',1e-6`

Data Types: `single` | `double`

Absolute gradient tolerance, specified as the comma-separated pair consisting of `'GradientTolerance'` and a nonnegative scalar.

Let $\nabla \mathcal{L}_t$ be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If $\|\nabla \mathcal{L}_t\|_\infty = \max|\nabla \mathcal{L}_t| < \text{GradientTolerance}$, then optimization terminates.

If you also specify `BetaTolerance`, then optimization terminates when the software satisfies either stopping criterion.

Example: `'GradientTolerance',1e-5`

Data Types: `single` | `double`
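Taken together, `BetaTolerance` and `GradientTolerance` form an either/or stopping rule. A minimal NumPy sketch, interpreting the `BetaTolerance` norm as the relative change in the stacked coefficient-and-bias vector (an assumption; the solver's exact implementation may differ):

```python
import numpy as np

def converged(B_t, B_prev, grad_t, beta_tol=1e-4, grad_tol=1e-6):
    """Stop when either the relative change in B = [beta; b] falls
    below beta_tol, or the max absolute gradient entry (infinity
    norm) falls below grad_tol."""
    rel_change = np.linalg.norm(B_t - B_prev) / np.linalg.norm(B_t)
    return rel_change < beta_tol or np.max(np.abs(grad_t)) < grad_tol
```

Specifying both tolerances means optimization terminates as soon as either criterion is satisfied.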

Maximum number of optimization iterations, specified as the comma-separated pair consisting of `'IterationLimit'` and a positive integer.

The default value is 1000 if the transformed data fits in memory, as specified by the `BlockSize` name-value pair argument. Otherwise, the default value is 100.

Example: `'IterationLimit',500`

Data Types: `single` | `double`

#### Other Kernel Classification Options

Maximum amount of allocated memory (in megabytes), specified as the comma-separated pair consisting of `'BlockSize'` and a positive scalar.

If `templateKernel` requires more memory than the value of `'BlockSize'` to hold the transformed predictor data, then the software uses a block-wise strategy. For details about the block-wise strategy, see Algorithms.

Example: `'BlockSize',1e4`

Data Types: `single` | `double`

Random number stream for reproducibility of data transformation, specified as the comma-separated pair consisting of `'RandomStream'` and a random stream object. For details, see Random Feature Expansion.

Use `'RandomStream'` to reproduce the random basis functions that `templateKernel` uses to transform the predictor data to a high-dimensional space. For details, see Managing the Global Stream Using RandStream and Creating and Controlling a Random Number Stream.

Example: `'RandomStream',RandStream('mlfg6331_64')`

Size of the history buffer for Hessian approximation, specified as the comma-separated pair consisting of `'HessianHistorySize'` and a positive integer. At each iteration, `templateKernel` composes the Hessian approximation by using statistics from the latest `HessianHistorySize` iterations.

Example: `'HessianHistorySize',10`

Data Types: `single` | `double`

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and either `0` or `1`. `Verbose` controls the display of diagnostic information at the command line.

| Value | Description |
| --- | --- |
| `0` | `templateKernel` does not display diagnostic information. |
| `1` | `templateKernel` displays the value of the objective function, gradient magnitude, and other diagnostic information. |

Example: `'Verbose',1`

Data Types: `single` | `double`

## Output Arguments


Kernel model template, returned as a template object. To train a kernel classification model for multiclass problems, pass `t` to `fitcecoc`.

If you display `t` in the Command Window, then some properties appear empty (`[]`). The software replaces the empty properties with their corresponding default values during training.

## More About

### Random Feature Expansion

Random feature expansion, such as Random Kitchen Sinks[1] or Fastfood[2], is a scheme to approximate Gaussian kernels of the kernel classification algorithm to use for big data in a computationally efficient way. Random feature expansion is more practical for big data applications that have large training sets, but can also be applied to smaller data sets that fit in memory.

The kernel classification algorithm searches for an optimal hyperplane that separates the data into two classes after mapping features into a high-dimensional space. Nonlinear features that are not linearly separable in a low-dimensional space can be separable in the expanded high-dimensional space. All the calculations for hyperplane classification use only dot products. You can obtain a nonlinear classification model by replacing the dot product $x_1 x_2'$ with the nonlinear kernel function $G(x_1,x_2) = \langle \phi(x_1), \phi(x_2) \rangle$, where $x_i$ is the ith observation (row vector) and $\phi(x_i)$ is a transformation that maps $x_i$ to a high-dimensional space (called the "kernel trick"). However, evaluating $G(x_1,x_2)$ (Gram matrix) for each pair of observations is computationally expensive for a large data set (large n).

The random feature expansion scheme finds a random transformation so that its dot product approximates the Gaussian kernel. That is,

$$G(x_1,x_2) = \langle \phi(x_1), \phi(x_2) \rangle \approx T(x_1)\,T(x_2)',$$

where $T(x)$ maps $x$ in $\mathbb{R}^p$ to a high-dimensional space ($\mathbb{R}^m$). The Random Kitchen Sinks scheme uses the random transformation

$$T(x) = m^{-1/2}\exp(iZx')',$$

where $Z \in \mathbb{R}^{m \times p}$ is a sample drawn from $N(0,\sigma^{-2})$ and $\sigma^2$ is a kernel scale. This scheme requires $O(mp)$ computation and storage. The Fastfood scheme introduces another random basis V instead of Z using Hadamard matrices combined with Gaussian scaling matrices. This random basis reduces the computation cost to $O(m \log p)$ and reduces storage to $O(m)$.

You can specify values for m and σ2 using the `NumExpansionDimensions` and `KernelScale` name-value arguments of `templateKernel`, respectively.

The `templateKernel` function uses the Fastfood scheme for random feature expansion, and uses linear classification to train a Gaussian kernel classification model. Unlike solvers in the `templateSVM` function, which require computation of the n-by-n Gram matrix, the solver in `templateKernel` only needs to form a matrix of size n-by-m, with m typically much less than n for big data.
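The kernel approximation above is easy to check numerically. Here is a NumPy sketch of the Random Kitchen Sinks map (illustrative only; `templateKernel` itself uses the Fastfood scheme, and all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, sigma = 5, 4096, 1.5

# Random basis Z in R^(m x p), entries drawn from N(0, sigma^-2)
Z = rng.normal(scale=1.0 / sigma, size=(m, p))

def T(x):
    """Complex random feature map T(x) = m^(-1/2) * exp(i * Z @ x)."""
    return np.exp(1j * Z @ x) / np.sqrt(m)

x1, x2 = rng.normal(size=p), rng.normal(size=p)

# The feature-map dot product approximates the Gaussian kernel
gauss = np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))
approx = np.vdot(T(x2), T(x1)).real  # <T(x1), conj(T(x2))>
print(gauss, approx)  # close for large m
```

The approximation error shrinks on the order of $1/\sqrt{m}$, which is why the `NumExpansionDimensions` default uses thousands of dimensions.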

### Box Constraint

A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization). Increasing the box constraint can lead to longer training times.

The box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.
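The relation is a simple reciprocal, so either parameter determines the other. A quick sketch with hypothetical numbers (Python for the arithmetic only):

```python
def box_constraint(lam, n):
    """C = 1 / (lambda * n)."""
    return 1.0 / (lam * n)

def reg_strength(C, n):
    """lambda = 1 / (C * n): the inverse of the relation above."""
    return 1.0 / (C * n)

# With n = 150 observations, the default lambda = 1/n corresponds to C = 1,
# and a larger box constraint C means weaker regularization.
print(box_constraint(1 / 150, 150))  # close to 1.0
print(reg_strength(100, 150))        # small lambda for a large C
```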

## Algorithms

`templateKernel` minimizes the regularized objective function using a Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with ridge (L2) regularization. To find the type of LBFGS solver used for training, type `FitInfo.Solver` in the Command Window.

• `'LBFGS-fast'` — LBFGS solver.

• `'LBFGS-blockwise'` — LBFGS solver with a block-wise strategy. If `templateKernel` requires more memory than the value of `BlockSize` to hold the transformed predictor data, then it uses a block-wise strategy.

• `'LBFGS-tall'` — LBFGS solver with a block-wise strategy for tall arrays.

When `templateKernel` uses a block-wise strategy, `templateKernel` implements LBFGS by distributing the calculation of the loss and gradient among different parts of the data at each iteration. Also, `templateKernel` refines the initial estimates of the linear coefficients and the bias term by fitting the model locally to parts of the data and combining the coefficients by averaging. If you specify `'Verbose',1`, then `templateKernel` displays diagnostic information for each data pass and stores the information in the `History` field of `FitInfo`.

When `templateKernel` does not use a block-wise strategy, the initial estimates are zeros. If you specify `'Verbose',1`, then `templateKernel` displays diagnostic information for each iteration and stores the information in the `History` field of `FitInfo`.

## References

[1] Rahimi, A., and B. Recht. “Random Features for Large-Scale Kernel Machines.” Advances in Neural Information Processing Systems. Vol. 20, 2008, pp. 1177–1184.

[2] Le, Q., T. Sarlós, and A. Smola. “Fastfood — Approximating Kernel Expansions in Loglinear Time.” Proceedings of the 30th International Conference on Machine Learning. Vol. 28, No. 3, 2013, pp. 244–252.

[3] Huang, P. S., H. Avron, T. N. Sainath, V. Sindhwani, and B. Ramabhadran. “Kernel methods match Deep Neural Networks on TIMIT.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. 2014, pp. 205–209.

## Version History

Introduced in R2018b