Check first derivative function against finite-difference approximation

Since R2023b

Syntax

``valid = checkGradients(fun,x0)``
``valid = checkGradients(fun,x0,options)``
``valid = checkGradients(___,Name=Value)``
``[valid,err] = checkGradients(___)``

Description

`valid = checkGradients(fun,x0)` compares the value of the supplied first derivative function in `fun` at a point near `x0` against a finite-difference approximation. By default, the comparison assumes that the function is an objective function. To check constraint functions, set the `IsConstraint` name-value argument to `true`.


`valid = checkGradients(fun,x0,options)` modifies the comparison by changing the finite differencing options.


`valid = checkGradients(___,Name=Value)` specifies additional options using one or more name-value arguments, in addition to any of the input argument combinations in the previous syntaxes. For example, you can set the tolerance for the comparison, or specify that the comparison is for nonlinear constraint functions.


`[valid,err] = checkGradients(___)` also returns a structure `err` containing the relative differences between the supplied derivatives and the finite-difference approximations.


Examples


The `rosen` function at the end of this example computes the Rosenbrock objective function and its gradient for a 2-D variable `x`.

Check that the computed gradient in `rosen` matches a finite-difference approximation near the point [2,4].

```matlab
x0 = [2,4];
valid = checkGradients(@rosen,x0)
```
```
valid =

  logical

   1
```
```matlab
function [f,g] = rosen(x)
f = 100*(x(1) - x(2)^2)^2 + (1 - x(2))^2;
if nargout > 1
    g(1) = 200*(x(1) - x(2)^2);
    g(2) = -400*x(2)*(x(1) - x(2)^2) - 2*(1 - x(2));
end
end
```

The `vecrosen` function at the end of this example computes the Rosenbrock objective function in least-squares form and its Jacobian (gradient).

Check that the computed gradient in `vecrosen` matches a finite-difference approximation near the point [2,4].

```matlab
x0 = [2,4];
valid = checkGradients(@vecrosen,x0)
```
```
valid =

  logical

   1
```
```matlab
function [f,g] = vecrosen(x)
f = [10*(x(1) - x(2)^2),1 - x(1)];
if nargout > 1
    g = zeros(2);       % Allocate g
    g(1,1) = 10;        % df(1)/dx(1)
    g(1,2) = -20*x(2);  % df(1)/dx(2)
    g(2,1) = -1;        % df(2)/dx(1)
    g(2,2) = 0;         % df(2)/dx(2)
end
end
```

The `rosen` function at the end of this example computes the Rosenbrock objective function and its gradient for a 2-D variable `x`.

For some initial points, the default forward finite differences cause `checkGradients` to mistakenly indicate that the `rosen` function has incorrect gradients. To see result details, set the `Display` option to `"on"`.

```matlab
x0 = [0,0];
valid = checkGradients(@rosen,x0,Display="on")
```
```
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 1.48826e-06.

Supplied derivative element (1,1): -0.126021
Finite-difference derivative element (1,1): -0.126023

checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________
```
```
valid =

  logical

   0
```

`checkGradients` reports a mismatch: the maximum relative difference, about 1.5e-6, slightly exceeds the default tolerance of `1e-6`. Use central finite differences and check again.

```matlab
opts = optimoptions("fmincon",FiniteDifferenceType="central");
valid = checkGradients(@rosen,x0,opts,Display="on")
```
```
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 1.29339e-11.

checkGradients successfully passed.
____________________________________________________________
```
```
valid =

  logical

   1
```

Central finite differences are generally more accurate. `checkGradients` reports that the gradient and central finite-difference approximation match to about 11 decimal places.

```matlab
function [f,g] = rosen(x)
f = 100*(x(1) - x(2)^2)^2 + (1 - x(2))^2;
if nargout > 1
    g(1) = 200*(x(1) - x(2)^2);
    g(2) = -400*x(2)*(x(1) - x(2)^2) - 2*(1 - x(2));
end
end
```

The `tiltellipse` function at the end of this example imposes the constraint that the 2-D variable `x` is confined to the interior of the tilted ellipse

$\frac{xy}{2}+(x+2)^{2}+\frac{(y-2)^{2}}{2}\le 2$.

Visualize the ellipse.

```matlab
f = @(x,y) x.*y/2 + (x+2).^2 + (y-2).^2/2 - 2;
fcontour(f,LevelList=0)
axis([-6 0 -1 7])
```

Check the gradient of this nonlinear inequality constraint function.

```matlab
x0 = [-2,6];
valid = checkGradients(@tiltellipse,x0,IsConstraint=true)
```
```
valid =

  1x2 logical array

   1   1
```
```matlab
function [c,ceq,gc,gceq] = tiltellipse(x)
c = x(1)*x(2)/2 + (x(1) + 2)^2 + (x(2) - 2)^2/2 - 2;
ceq = [];
if nargout > 2
    gc = [x(2)/2 + 2*(x(1) + 2); ...
          x(1)/2 + x(2) - 2];
    gceq = [];
end
end
```

The `fungrad` function at the end of this example correctly calculates the gradient of some components of the least-squares objective, and incorrectly calculates others.

Examine the second output of `checkGradients` to see which components do not match well at the point [2,4]. To see result details, set the `Display` option to `"on"`.

```matlab
x0 = [2,4];
[valid,err] = checkGradients(@fungrad,x0,Display="on")
```
```
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 0.749797.

Supplied derivative element (3,2): 19.9838
Finite-difference derivative element (3,2): 5

checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________
```
```
valid =

  logical

   0
```
```
err = struct with fields:
    Objective: [3x2 double]
```

The output shows that element [3,2] is incorrect. But is that the only problem? Examine `err.Objective` and look for entries that are far from 0.

```matlab
err.Objective
```
```
ans = 3×2

    0.0000    0.0000
    0.0000         0
    0.5000    0.7498
```

Both the [3,1] and [3,2] elements of the derivative are incorrect. The `fungrad2` function at the end of this example corrects the errors.

```matlab
[valid,err] = checkGradients(@fungrad2,x0,Display="on")
```
```
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied
and finite-difference derivatives = 2.2338e-08.

checkGradients successfully passed.
____________________________________________________________
```
```
valid =

  logical

   1
```
```
err = struct with fields:
    Objective: [3x2 double]
```
```matlab
err.Objective
```
```
ans = 3×2
1.0e-07 *

    0.2234    0.0509
    0.0003         0
    0.0981    0.0042
```

All the differences between the gradient and finite-difference approximations are less than 1e-7 in magnitude.

This code creates the `fungrad` helper function.

```matlab
function [f,g] = fungrad(x)
f = [10*(x(1) - x(2)^2),1 - x(1),5*(x(2) - x(1)^2)];
if nargout > 1
    g = zeros(3,2);
    g(1,1) = 10;
    g(1,2) = -20*x(2);
    g(2,1) = -1;
    g(3,1) = -20*x(1);   % Incorrect; should be -10*x(1)
    g(3,2) = 5*x(2);     % Incorrect; should be 5
end
end
```

This code creates the `fungrad2` helper function.

```matlab
function [f,g] = fungrad2(x)
f = [10*(x(1) - x(2)^2),1 - x(1),5*(x(2) - x(1)^2)];
if nargout > 1
    g = zeros(3,2);
    g(1,1) = 10;
    g(1,2) = -20*x(2);
    g(2,1) = -1;
    g(3,1) = -10*x(1);   % Corrected
    g(3,2) = 5;          % Corrected
end
end
```

Input Arguments


Function to check, specified as a function handle.

• If `fun` represents an objective function, then `fun` must have the following signature.

`[fval,grad] = fun(x)`

`checkGradients` compares the value of `grad(x)` to a finite-difference approximation for a point `x` near `x0`. The comparison is

$\frac{grad - grad\_fd}{\max(1,\,|grad|)},$

where `grad` represents the value of the gradient function, and `grad_fd` represents the value of the finite-difference approximation. `checkGradients` performs this division component-wise.
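As an illustration only (not the internal implementation; the `max(1,abs(grad))` scaling in the denominator is an assumption of this sketch), the following code builds a forward finite-difference gradient for the Rosenbrock-style objective from the earlier examples and forms the component-wise relative difference:

```matlab
% Illustrative sketch of the comparison; not checkGradients internals.
fun = @(x) 100*(x(1) - x(2)^2)^2 + (1 - x(2))^2;
gradfun = @(x) [200*(x(1) - x(2)^2); ...
                -400*x(2)*(x(1) - x(2)^2) - 2*(1 - x(2))];

x = [2; 4];
h = sqrt(eps);                      % default forward-difference step factor
grad_fd = zeros(2,1);
for j = 1:2
    e = zeros(2,1);
    e(j) = 1;
    delta = h*max(abs(x(j)),1);     % step scaled as in the options table
    grad_fd(j) = (fun(x + delta*e) - fun(x))/delta;
end
relDiff = abs(gradfun(x) - grad_fd)./max(1,abs(gradfun(x)))
```

With a correct gradient function, every entry of `relDiff` is small, well below the default `1e-6` tolerance.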

• If `fun` represents a least-squares objective, then `fun` returns a vector, and the second output `grad(x)` is a matrix representing the Jacobian of `fun`.

If `fun` returns an array of `m` components and `x` has `n` elements, where `n` is the number of elements of `x0`, the Jacobian `J` is an `m`-by-`n` matrix where `J(i,j)` is the partial derivative of `F(i)` with respect to `x(j)`. (The Jacobian `J` is the transpose of the gradient of `F`.)
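For instance, a hypothetical residual function with `m = 2` components and `n = 2` variables returns a 2-by-2 Jacobian as its second output (the function name and residuals here are illustrative, not from the toolbox):

```matlab
x0 = [1,2];
valid = checkGradients(@smallResid,x0)   % checks the 2-by-2 Jacobian

function [F,J] = smallResid(x)
F = [x(1)^2 + x(2), sin(x(1))];   % m = 2 residual components
if nargout > 1
    J = [2*x(1),    1; ...        % dF(1)/dx(1)  dF(1)/dx(2)
         cos(x(1)), 0];           % dF(2)/dx(1)  dF(2)/dx(2)
end
end
```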

• If `fun` represents a nonlinear constraint, then `fun` must have the following signature.

`[c,ceq,gc,gceq] = fun(x)`
• `c` represents the nonlinear inequality constraints. Solvers attempt to achieve `c <= 0`. The `c` output can be a vector of any length.

• `ceq` represents the nonlinear equality constraints. Solvers attempt to achieve `ceq = 0`. The `ceq` output can be a vector of any length.

• `gc` represents the gradient of the nonlinear inequality constraints.

• `gceq` represents the gradient of the nonlinear equality constraints.

Data Types: `function_handle`

Location at which to check the gradient, specified as a double array for all solvers except `lsqcurvefit`. For `lsqcurvefit`, `x0` is a 1-by-2 cell array `{x0array,xdata}`.

`checkGradients` checks the gradient at a point near the specified `x0`. The function adds a small random direction to `x0`, no more than `1e-3` in absolute value. This perturbation attempts to protect the check against a point where an incorrect gradient function might pass because of cancellations.

Example: `randn(5,1)`

Data Types: `double`
Complex Number Support: Yes
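For an `lsqcurvefit`-style function that takes both a parameter vector and data, the cell-array form of `x0` can be sketched as follows (the exponential model and data here are hypothetical):

```matlab
xdata = (0:0.5:2)';
x0 = {[1;0.5],xdata};              % {x0array,xdata} cell array
valid = checkGradients(@expmodel,x0)

function [F,J] = expmodel(x,xdata)
F = x(1)*exp(x(2)*xdata);                % model values at xdata
if nargout > 1
    J = [exp(x(2)*xdata), ...            % dF/dx(1)
         x(1)*xdata.*exp(x(2)*xdata)];   % dF/dx(2)
end
end
```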

Finite differencing options, specified as the output of `optimoptions`. The following options affect finite differencing.

• `FiniteDifferenceStepSize`: Scalar or vector step size factor for finite differences. When you set `FiniteDifferenceStepSize` to a vector `v`, the forward finite differences `delta` are

`delta = v.*sign′(x).*max(abs(x),TypicalX);`

where `sign′(x) = sign(x)` except `sign′(0) = 1`. Central finite differences are

`delta = v.*max(abs(x),TypicalX);`

A scalar `FiniteDifferenceStepSize` expands to a vector. The default is `sqrt(eps)` for forward finite differences, and `eps^(1/3)` for central finite differences.

• `FiniteDifferenceType`: Finite differences used to estimate gradients, either `"forward"` (default) or `"central"` (centered). `"central"` takes twice as many function evaluations but is usually more accurate.

• `TypicalX`: Typical `x` values. The number of elements in `TypicalX` equals the number of elements in the starting point `x0`. The default value is `ones(numberofvariables,1)`.

• `DiffMaxChange` (use is discouraged): Maximum change in variables for finite-difference gradients (a positive scalar). The default is `Inf`.

• `DiffMinChange` (use is discouraged): Minimum change in variables for finite-difference gradients (a nonnegative scalar). The default is `0`.

Example: `optimoptions("fmincon",FiniteDifferenceStepSize=1e-4)`

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `IsConstraint=true,Tolerance=5e-4`

Flag to display results at command line, specified as `"off"` (do not display the results) or `"on"` (display the results).

Example: `"off"`

Data Types: `char` | `string`

Flag to check nonlinear constraint gradients, specified as `false` (the function is an objective function) or `true` (the function is a nonlinear constraint function).

Example: `true`

Data Types: `logical`

Tolerance for the gradient approximation, specified as a nonnegative scalar. The returned value `valid` is `true` for each component where the absolute relative difference between the gradient of `fun` and its finite-difference approximation is less than or equal to `Tolerance`.

Example: `1e-3`

Data Types: `double`

Output Arguments


Indication that the finite-difference approximation matches the gradient, returned as a logical scalar for objective functions or a two-element logical vector for nonlinear constraint functions `[c,ceq]`. The returned value `valid` is `true` when the absolute relative difference between the gradient of `fun` and its finite-difference approximation is less than or equal to `Tolerance` for all components of the gradient. Otherwise, `valid` is `false`.

When a nonlinear constraint `c` or `ceq` is empty, the returned value of `valid` for that constraint is `true`.

Relative differences between the gradients and finite-difference approximations, returned as a structure. For objective functions, the field name is `Objective`. For nonlinear constraint functions, the field names are `Inequality` (corresponding to `c`) and `Equality` (corresponding to `ceq`). Each component of `err` has the same shape as the supplied derivatives from `fun`.


Gradients or Jacobians estimated near the initial point did not match the supplied derivatives to within a default tolerance of `1e-6` or, for the `checkGradients` function, the specified `Tolerance` value.

• Usually, this failure means that your objective or nonlinear constraint functions have an incorrect derivative calculation. Double-check the indicated derivative.

• Occasionally, the finite-difference approximations to the derivatives are inaccurate enough to cause the failure. This inaccuracy can occur when the second derivative of a function (objective or nonlinear constraint) has a large magnitude. It can also occur when you use the default `'forward'` finite differences, which are less accurate but faster than `'central'` finite differences. If you think that the derivative functions are correct, try one or both of the following to see if `checkGradients` passes:

• Set the `FiniteDifferenceType` option to `'central'`.

• Set the `FiniteDifferenceStepSize` option to a small value such as `1e-10`.

• Derivatives are checked at a random point near the initial point. Therefore, the gradient check can pass in one run and fail in another when different nearby points give different comparison results.

For details, see Checking Validity of Gradients or Jacobians.

Version History

Introduced in R2023b