Using fminunc() with very large scale problems

I have delved into this topic before (http://www.mathworks.com/matlabcentral/newsreader/view_thread/300661#811585), but am returning to it again.
I'm working with another class of algorithms that is very large scale: multi-frame super-resolution. A good reference is "Fast and Robust Multiframe Super Resolution" by S. Farsiu et al., IEEE Transactions on Image Processing, Vol. 13, No. 10, pp. 1327-1344 (2004).
My question concerns MATLAB's ability to perform optimization on truly large-scale problems. The authors of the aforementioned paper used steepest descent. I've coded up the algorithm with conjugate gradient (Polak-Ribiere).
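For concreteness, here is a hedged sketch of Polak-Ribiere nonlinear CG of the kind described (the function names are hypothetical, and a crude Armijo backtracking stands in for Fletcher's line search):

```matlab
% Sketch of Polak-Ribiere(+) nonlinear conjugate gradient.
% objfun returns [f, g]; x0 is the image stacked as a column vector.
function x = pr_cg(objfun, x0, maxIter)
    x = x0;
    [f, g] = objfun(x);
    d = -g;                                       % initial search direction
    for k = 1:maxIter
        t = backtrack(objfun, x, d, f, g);        % step length (placeholder)
        x = x + t*d;
        gOld = g;
        [f, g] = objfun(x);
        beta = max(0, g'*(g - gOld) / (gOld'*gOld));  % Polak-Ribiere(+)
        d = -g + beta*d;                          % update conjugate direction
    end
end

% Crude backtracking line search (Armijo condition), not Fletcher's method.
function t = backtrack(objfun, x, d, f0, g0)
    t = 1; c = 1e-4;
    while objfun(x + t*d) > f0 + c*t*(g0'*d) && t > 1e-12
        t = t/2;                                  % halve until sufficient decrease
    end
end
```

Only f, g, and a vector of length n are ever held in memory, which is why this style of method scales to problems of this size.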
I would be curious to try the trust-region algorithm of fminunc() on this problem. However, the size of the low resolution images I am using for super-resolution reconstruction is 135x90 (very small for real-world applications, but fine for testing). With 4X super-resolution the resultant image I'm reconstructing is 540x360. The algorithm considers every pixel of the output image to be a variable to be optimized. Thus there are 194,400 variables.
If I blindly use fminunc() (letting it calculate the Hessian), it throws a memory error at a line in fminunc that reads
Hstr = sparse(ones(n));
Here n is the number of variables, 194,400 in my case. Note that ones(n) allocates a dense n-by-n matrix of doubles before sparse() converts it, which at this size would require roughly 300 GB.
Based on the aforementioned link, I tried setting 'HessMult' to @(H,Y)Y. The same memory error occurs at the same location.
Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function. It still seems that The MathWorks provides only functions that either use no derivative information (fminsearch) or require calculating second-order derivatives (fminunc). This is the second class of algorithms I've worked with in the last couple of years that really needs an algorithm that can use first-order derivatives but does not require second-order derivatives.
I did put in a service request in 2011 to add the conjugate gradient algorithm to the Optimization Toolbox (or another algorithm that might provide comparable functionality). I pointed out a quote by Roger Fletcher in his "Practical Methods of Optimization", 2nd Edition (page 85): "Thus conjugate gradient methods may be the only methods which are applicable to large problems, that is problems with hundreds or thousands of variables." The emphasis on "large problems" is Fletcher's, not mine. I'm guessing my request didn't go very far.
I still wonder if there is some built-in function that will solve these problems and I'm just overlooking it. So my question is: is there a MATLAB-provided function that can handle optimization problems of this scale?
Thanks, Eric

1 comment

Eric on 4 Oct 2013
Edited: Eric on 4 Oct 2013
Here's the last email I received from my service request to add conjugate gradient. It tails off because of what I think is a 1,500-character limit on what the MathWorks website will display.
I discussed this issue with the development team. They had the following comments:
The large-scale FMINUNC algorithm uses a conjugate gradient method to determine the step that will be taken in each iteration of the algorithm. As you pointed out, some implementations, like the Fletcher-Reeves or Polak-Ribiere algorithms, do not require estimation of the Hessian. However, these algorithms have a downside too, in that they are sometimes less efficient and robust than other modern algorithms.
With regard to the large-scale FMINUNC algorithm currently used in MATLAB: to calculate the 2-D subspace for the step-size calculation, some information about the Hessian is required. This can be either an estimate of the Hessian itself or a Hessian-vector product. Hessian-vector products can be used if the Hessian has some structure, and in these cases the complete Hessian does not need to be calculated. We have an example in the documentation that illustrates this (note that this example uses FMINCON, but it can also be applied to FMINUNC).
If it is not possible to specify Hessian-vector products, you can use the medium scale algorithm in FMINUNC. This algorithm builds up an approximation of the Hessian using a formula that just uses gradient information (i.e. the Hessian is not calculated directly). However, if you are solving large problem...


Accepted Answer

Matt J on 4 Oct 2013
Edited: Matt J on 5 Oct 2013


Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.
You must also set the 'Hessian' option to 'on' in order for HessMult to be used. However, I recommend you try to find a HessMult function corresponding to a better approximation of the Hessian than eye(N). Otherwise, it's probably just going to boil down to steepest descent.
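A minimal sketch of this setup, using the R2013-era optimset interface (myObj and the toy quadratic objective are placeholders for the real reconstruction problem):

```matlab
% Options for fminunc's trust-region algorithm with an identity "Hessian".
% 'Hessian','on' is required or the HessMult function is ignored.
opts = optimset('GradObj','on', ...         % objective supplies the gradient
                'Hessian','on', ...
                'HessMult', @(Hinfo,Y) Y);  % H*Y approximated as Y (crude)

% Toy objective standing in for the reconstruction: f = 0.5*||x||^2.
% The third output (Hinfo) is what fminunc passes to HessMult; the
% identity version above ignores it, so an empty array suffices.
function [f, g, Hinfo] = myObj(x)
    f = 0.5*(x'*x);
    g = x;
    Hinfo = [];
end

% x = fminunc(@myObj, zeros(194400,1), opts);
```

With the identity HessMult the trust-region subproblem sees no curvature information, which is why Matt J expects behavior close to steepest descent.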
It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc).
No, that's just not the case. As a further example, note that fminunc's quasi-Newton method with HessUpdate='steepdesc' is equivalent to steepest descent, which is a first-order algorithm. It's not that these options are unavailable; they are simply not recommended.
Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function
If you're saying conjugate gradient works well for you without preconditioning, then that's good news, but it means you're lucky: you don't have a very hard problem to solve. If you are using preconditioning, then you are effectively using second-order derivative information (to some approximation). You just aren't using it in precisely the same way fminunc uses it.

8 comments

Eric on 7 Oct 2013
Success! Thanks - this is what I was looking for. The other thing I had to do was modify my objective function to return an empty array for the third output. I haven't determined whether this fminunc() approach works better/faster than conjugate gradient yet, but it's nice to have options in optimization routines.
Thanks again,
Eric
Eric on 22 Oct 2013
In case other people find this question and are working similar problems:
You should check out the Poblano toolbox from Sandia National Laboratories.
This toolbox provides solvers that use only gradient information and are, for my problem, twice as fast as fminunc() (i.e., they reduce the objective function to the same value as fminunc() does, but in half the time). These are exactly the kinds of algorithms I would have thought The MathWorks would include in the Optimization Toolbox.
-Eric
Matt J on 22 Oct 2013
Edited: Matt J on 22 Oct 2013
This toolbox provides solvers that ... are, for my problem, twice as fast as fminunc
All of them? And with what HessMult for fminunc?
Eric on 23 Oct 2013
Edited: Eric on 23 Oct 2013
Here are a couple tables I put together for typical runs. Some notes:
  • The top function (ConjugateGradient) is my own implementation of the Polak-Ribiere CG algorithm using a linesearch algorithm from Fletcher's "Practical Methods of Optimization" text.
  • The bottom three are functions from Poblano - one of which failed for some reason.
  • I set HessMult to @(H,Y)Y and Hessian to 'on' for fminunc().
  • The optimization had 194,400 variables.
  • My own ConjugateGradient() function does not keep track of the number of function evaluations.
  • My ConjugateGradient() algorithm and Poblano's ncg() with Polak-Ribiere differ in which linesearch algorithm is used.
For the first run I was optimizing a super-resolution reconstruction where the image shifts had errors. Only the pixel values are optimized, not the lateral shifts. Hence the final reconstruction is rather poor (the lateral shifts need to be optimized as well to get a decent reconstruction). Here are those results:
Next are some results where the lateral shifts have no error. In this case the final reconstructions are decent. I think the proper way to say this is that the curvature of the objective function is greater in this case.
In the first case, ncg and lbfgs from Poblano both converge to a considerably better answer in a time comparable to fminunc(). In the second case, Poblano's lbfgs() still performs well, while fminunc() still struggles.
Perhaps if I were more knowledgeable in this area I would know how to approximate the Hessian in fminunc() as is done in the BFGS algorithm. If there are other settings for fminunc() or a more intelligent way to use the tools in the Optimization Toolbox, I'd love to try those out.
Best regards,
Eric
Eric on 23 Oct 2013
Below are results with a loosened linesearch accuracy in my own ConjugateGradient routine:
  • Ending ObjFn Value: 3.8886E+05
  • Time: 32.16
  • # Iterations: 20
  • Fn Evals: 94
So with the conjugate gradient routine I get about the same reduction in objective function value as fminunc() but in 32 seconds rather than 55.
-Eric
Matt J on 25 Oct 2013
Edited: Matt J on 25 Oct 2013
I set HessMult to '@(H,Y)Y' and Hessian to 'On' for fminunc().
This is not really a fair race. There are surely much better choices of HessMult that you could use. As I told you before, '@(H,Y)Y' probably makes the trust region algorithm reduce to something like steepest descent. Everyone knows that there is almost nothing slower than steepest descent.
Eric on 25 Oct 2013
I agree it's not really a fair race, but from my point of view I don't care. I'm sure there are much better choices for HessMult. But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.
It still seems that the tools in the Optimization Toolbox are a bit lacking. If I don't have second-order derivative information, I'm forced to use fminunc() in this manner for large-scale problems. The code MATLAB provides in the Optimization Toolbox is significantly outperformed by the freeware toolbox available from Sandia.
From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense. Other algorithms are not included since they are not the "best". Hence algorithms such as conjugate gradient and BFGS are not included.
Other tools such as Python and Mathematica take a different approach. They include a wide range of optimization algorithms and rely on the user to pick the best for their problem. I tend to prefer that approach.
Best regards,
Eric
Matt J on 25 Oct 2013
Edited: Matt J on 25 Oct 2013
But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.
Since you're computing your first derivatives, you should be able to compute the second derivatives by differentiating one more time! For large problems like yours, a common approximation is to compute only the Hessian diagonal D and approximate as
HessMult = @(D,Y) D.*Y
It's not guaranteed to work for all problems, but it does for many, and shouldn't be worse than @(D,Y) Y.
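A sketch of this diagonal approximation on a toy linear least-squares model (A, b, and diagHessObj are hypothetical stand-ins, not from the thread; the real imaging operator would take their place):

```matlab
% Toy model: f(x) = 0.5*||A*x - b||^2, whose Hessian is A'*A.
% The exact Hessian diagonal is cheap even when A'*A is too big to form.
function [f, g, D] = diagHessObj(x, A, b)
    r = A*x - b;
    f = 0.5*(r'*r);        % objective value
    g = A'*r;              % gradient
    D = sum(A.^2, 1)';     % exact Hessian diagonal: diag(A'*A)
end

% HessMult receives the third output D and a matrix Y of trial vectors;
% bsxfun scales each column of Y by D, i.e. applies the diagonal Hessian.
% opts = optimset('GradObj','on', 'Hessian','on', ...
%                 'HessMult', @(D,Y) bsxfun(@times, D, Y));
% x = fminunc(@(x) diagHessObj(x, A, b), x0, opts);
```

Storage is one length-n vector for D, so the approach stays feasible at 194,400 variables while still feeding real curvature information to the trust-region step.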
From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense... Hence algorithms such as conjugate gradient and BFGS are not included.
fminunc's quasi-Newton method does have BFGS and DFP options. I wouldn't disagree that CG would be a worthy addition to the toolbox. However, quasi-Newton methods are supposed to perform at least as well as CG (and methods that use second-order derivatives certainly do), so there is a grain of validity in their claim.
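For reference, a sketch of selecting those quasi-Newton updates (R2013-era optimset syntax; myObj and x0 are placeholders):

```matlab
% Medium-scale (quasi-Newton) fminunc with the BFGS update (the default);
% 'dfp' and 'steepdesc' are the other HessUpdate choices.
opts = optimset('LargeScale','off', ...
                'GradObj','on', ...
                'HessUpdate','bfgs');
% x = fminunc(@myObj, x0, opts);
```

Note that this algorithm maintains a dense n-by-n Hessian approximation, which at 194,400 variables is itself prohibitive - presumably the point the truncated email in the comments above was heading toward.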


More Answers (0)
