Constrained Optimization: Intuition behind the Lagrangian
From the series: Control Systems in Practice
Brian Douglas
This video introduces a really intuitive way to solve a constrained optimization problem using Lagrange multipliers. We can use them to find the minimum or maximum of a function, J(x), subject to the constraint C(x) = 0.
Published: 22 Aug 2023
In this video, I want to show you a really cool way to solve a constrained optimization problem. That is, a problem where you are trying to find the minimum or maximum of a function, J(x), subject to the constraint C(x) = 0. And the solution involves this equation: the gradient of J(x) + lambda * C(x), set equal to zero. The inner portion here, J(x) + lambda * C(x), is called the Lagrangian, and lambda is called the Lagrange multiplier. What is this equation doing, and how does it help solve constrained optimization? Well, it’s actually rather intuitive and kind of amazing. So, I hope you stick around for this quick explanation. I’m Brian, and welcome to a MATLAB Tech Talk.
Let’s look at an optimization problem with an objective function, J(x) = x^2, which is plotted on this graph. The question is, what value of x produces the minimum J? Obviously, looking at this graph it’s x = 0, but how can we determine this mathematically?
Well, notice that the slope of the function is nonzero everywhere except at the minimum point, where it is flat. In fact, minimum and maximum points can only exist where the slope is zero, since if the function were still increasing or decreasing at a point, that point wouldn’t be the most extreme. And we can tell whether a flat point is a minimum by checking that the rate of change of the slope is positive, meaning the function curves up; at a maximum it would curve down.
Therefore, for J = x^2, we can solve for where the slope is zero by taking the derivative of the function with respect to x and setting it equal to zero: dJ/dx = 2x = 0, which gives x = 0. And then we can confirm that the rate of change of the slope is positive by looking at the second derivative at this point: d^2J/dx^2 = 2, which is positive.
Similarly, we can show that x = 0 is the maximum point for J = -x^2, since the second derivative is -2, and therefore the function curves down from this point. And we can show that the flat point of J = x^3 is neither a maximum nor a minimum, since the second derivative there is 0. It’s neither positive nor negative.
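If you want to verify these derivative tests yourself, here’s a minimal sketch in MATLAB, assuming the Symbolic Math Toolbox is available:

```matlab
syms x
J = x^2;
slope = diff(J, x)           % 2*x, which is zero only at x = 0
xstar = solve(slope == 0)    % x = 0
curvature = diff(J, x, 2)    % 2, positive, so x = 0 is a minimum

% The same second-derivative test for the other two examples:
diff(-x^2, x, 2)             % -2, negative: a maximum
diff(x^3, x, 2)              % 6*x, which is 0 at x = 0: neither
```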
And this idea of looking at the slope can be extended to functions of more than one variable. For example, take this function: J = x1^2 + x2^2. We can look at this and pretty easily guess that the minimum J occurs when x1 and x2 are both 0. And in fact, this function produces a bowl shape with the lowest point right where we’d expect. And notice that, once again, the surface is flat at this low point and curves up everywhere around it.
Now, to find this point, we once again look at where the derivative is zero. Or, more precisely, where the gradient is zero. The gradient of J(x) is the vector of partial derivatives of the function with respect to x1 and x2. When we take it, we get the vector [2*x1, 2*x2]. So, it’s a vector field that gives the direction and magnitude of the steepest slope at every x. For our bowl-shaped function, the gradient looks like this. There is zero slope at the bottom of the bowl, and the slope gradually gets steeper as we move toward the edge in every direction.
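Here’s that same gradient computed symbolically, again a sketch assuming the Symbolic Math Toolbox:

```matlab
syms x1 x2
J = x1^2 + x2^2;
g = gradient(J, [x1, x2])             % [2*x1; 2*x2]
stationary = solve(g == 0, [x1, x2])  % x1 = 0, x2 = 0
% The only stationary point is the bottom of the bowl.
```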
This is the basis of all optimization problems. We set up an objective function and then solve for - or in a lot of cases search for - locations of zero gradient. I say search for because often objective functions aren’t as straightforward as this, and we have to hunt for the optimal point with gradient-descent-type algorithms, like in machine learning. These are algorithms that find the optimal point numerically with an iterative approach by descending the gradient. That is, going in the opposite direction of the arrows until you reach a stationary point.
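To make that concrete, here’s a minimal gradient descent sketch for the bowl function. The starting point and step size are my own illustrative choices:

```matlab
gradJ = @(x) [2*x(1); 2*x(2)];   % gradient of J = x1^2 + x2^2
x = [4; -3];                     % arbitrary starting point
alpha = 0.1;                     % step size
for k = 1:100
    x = x - alpha * gradJ(x);    % step opposite the gradient
end
disp(x)                          % converges toward [0; 0]
```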
This is probably a good time to briefly bring up the issue of local and global minima. If we’re looking for stationary points using gradient descent, then where that algorithm starts its search becomes very important. If there are multiple basins, then a gradient descent algorithm will get stuck in a local minimum if the starting state is within that basin of attraction. There are different methods for addressing this, including just picking many different starting points, running gradient descent from each, and seeing where they all end up.
But what’s nice about the analytical approach of taking the gradient of the function is that each of the stationary points can be identified mathematically by setting the gradient to zero and solving for x. We have three stationary points here, corresponding to the local minimum, the global minimum, and the local maximum that sits between them.
We can now check the value of the objective function at each of these stationary points and then pick the lowest one to find the global minimum.
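Here’s a minimal sketch of that multistart idea. The one-variable objective below is my own hypothetical two-basin example, not the function from the video:

```matlab
J = @(x) x.^4 - 3*x.^2 + x;       % hypothetical objective with two basins
gradJ = @(x) 4*x.^3 - 6*x + 1;    % its derivative
starts = linspace(-2, 2, 9);      % several different starting points
ends = zeros(size(starts));
for i = 1:numel(starts)
    x = starts(i);
    for k = 1:500
        x = x - 0.01 * gradJ(x);  % gradient descent from this start
    end
    ends(i) = x;
end
% Evaluate the objective at every stationary point found and keep the lowest.
[~, best] = min(J(ends));
fprintf('global minimum near x = %.3f\n', ends(best))
```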
Alright, so that’s unconstrained optimization - where we are free to choose any value of x that minimizes our function. But this video is on constrained optimization.
With constrained optimization, the objective function is subject to an additional constraint. In this case, let’s say the constraint is that x1 + x2 + 3 must equal 0. That is, we can only choose combinations of x1 and x2 that lie on this line. And if I project this line up onto the surface, that means we are really looking for the minimum value along this intersecting parabola.
So, how do we find this location?
Well, let’s change the surface plot into a contour plot. Imagine that you’re hiking along the black line on a surface with these contours. At the beginning, you’re hiking down pretty steeply, crossing the contour lines quickly. Then it gradually gets less steep. And right here, when your path is exactly tangent to a contour line, you’ve reached the lowest point, and from there you start heading back up.
So, the lowest point - the optimal point - is where the constraint is parallel to the contour. Another way of putting it is that it’s where the path is perpendicular to the gradient. And this makes sense, right? If the gradient is the direction of steepest travel and you’re walking perpendicular to it, then you’re not going up or down right at this point - your path is flat.
Now, to calculate this point, we can do something clever. We can compare the gradient of the objective function with the gradient of the constraint. If the gradients - the directions of steepest slope for the two functions - are parallel, then this is a stationary point. And therefore, it’s either a maximum, a minimum, or some kind of saddle point like we had with x^3.
So, the gradient of the constraint is the vector of partial derivatives with respect to x1 and x2, which is [1, 1]. And if I plot that vector field on our graph, we get these arrows that are perpendicular to the black line. And notice that at the minimum point, the gradient of J is parallel to the gradient of C.
Now, they’re pointing in different directions, but they are parallel. So, to make them equal, we can multiply the gradient of the constraint by a constant, lambda. Choosing different lambdas essentially scales the gradient, positively or negatively, like I’m showing here. And the idea is that we want to find the Lagrange multiplier that makes the two gradients equal right at this point where they are parallel.
Ok, so now let me rearrange this equation, and what we’re left with is the equation that we started with. Now, you’ll notice that if you do this rearranging yourself, it should be a minus lambda C, but to make the equation look nicer we just roll that negative into the constant lambda, which can be positive or negative. Alright, so the solution to the problem exists where the gradient of the Lagrangian equals 0.
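Written out, the rearrangement looks like this, with the sign of lambda absorbing the negative:

```latex
\nabla J(x) = -\lambda \nabla C(x)
\quad\Longrightarrow\quad
\nabla J(x) + \lambda \nabla C(x) = 0
\quad\Longrightarrow\quad
\nabla \mathcal{L}(x, \lambda) = 0,
\quad\text{where } \mathcal{L}(x, \lambda) = J(x) + \lambda C(x)
```

And one nice detail: the partial derivative of the Lagrangian with respect to lambda is just C(x), so setting the full gradient to zero automatically enforces the constraint C(x) = 0 as well.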
Now again, this doesn’t tell us if we’re at a maximum, minimum, or a saddle point. We know that we need some form of a second derivative to determine that and generally speaking that second derivative comes in the form of a bordered Hessian matrix. I’m not going to cover that here since I just wanted to talk about the Lagrangian itself and in our problem we know this point is a minimum due to our understanding of the functions. But I left some good references for it down below.
Alright, so with our understanding of the Lagrangian, let’s solve this constrained optimization problem. The Lagrangian is x1^2 + x2^2 + lambda * (x1 + x2 + 3). And the gradient gives us these three equations - 2*x1 + lambda = 0, 2*x2 + lambda = 0, and x1 + x2 + 3 = 0 - which we can use to solve for the three unknowns, x1, x2, and lambda. And we find that the optimal point is at x1 = -1.5 and x2 = -1.5, and the Lagrange multiplier has a value of 3.
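If you’d rather not do that algebra by hand, here’s a sketch of the same solution in symbolic form, again assuming the Symbolic Math Toolbox:

```matlab
syms x1 x2 lambda
Lagr = x1^2 + x2^2 + lambda*(x1 + x2 + 3);
eqs = gradient(Lagr, [x1, x2, lambda]) == 0;  % the three equations
sol = solve(eqs, [x1, x2, lambda]);
[sol.x1, sol.x2, sol.lambda]                  % -3/2, -3/2, 3
```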
How cool is that! It’s such a simple idea: we can find the minima and maxima of constrained optimization problems just by looking at where the gradients of the two functions are parallel.
Alright now that we know how the Lagrangian plays a role in constrained optimization, I want to wrap up this video by showing you how to solve this example in MATLAB since more often than not you won’t be solving it by hand.
Ok, I have this live script that uses the well-named function optimproblem to set up our optimization problem. Here, I’ve defined the objective function, x1^2 + x2^2, and the constraint, x1 + x2 + 3 = 0. The function solve then solves the problem and returns several different outputs.
And when it runs, I’m displaying the optimal point, x, and the Lagrange multiplier, lambda. And over here we see that x is -1.5 and -1.5, and lambda is 3, just as we calculated.
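For reference, a minimal problem-based setup along those lines might look like the sketch below. The variable name x and the constraint name line are my own choices, and the exact shape of the returned multiplier structure is worth checking against the documentation:

```matlab
x = optimvar('x', 2);                     % optimization variables x(1), x(2)
prob = optimproblem;
prob.Objective = x(1)^2 + x(2)^2;         % J = x1^2 + x2^2
prob.Constraints.line = x(1) + x(2) + 3 == 0;
[sol, fval, exitflag, output, lambda] = solve(prob);
disp(sol.x)    % [-1.5; -1.5], the optimal point
disp(lambda)   % structure holding the Lagrange multiplier (3 here)
```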
Alright, so that’s where I’m going to leave this video. Hopefully, you have a better understanding of how the Lagrangian plays a role in constrained optimization problems and what the Lagrange multiplier is doing. I’ve left links below to several different resources and MATLAB examples if you’d like to learn more.
Also, you can find all of the Tech Talk videos, across many different topics, nicely organized at mathworks.com. And if you liked this video, then you might be interested to learn about a popular type of constrained optimization that we use for control problems called the linear quadratic regulator. Alright, thanks for watching, and I’ll see you next time.