Optimization for Unconstrained Functions

Download as
 PPT
Presentation Description 

lecture notes on Optimization Methods for Unconstrained Functions of  More

Views: 332
Like it  ( Likes) Dislike it  ( Dislikes)
Added: August 03, 2008 This Presentation is Public 
Presentation Category : Education All Rights Reserved
Presentation Transcript

Slide 1:Topic 4: Optimization Methods for Unconstrained Functions of Several Variables The unconstrained multi-variable problem is written as min f(x) x  RN where x is a vector of the decision variables.


Slide 2:A contour plot consists of contour lines where each contour line indicates a specific value of the function f(x1,x2). What is a contour plot?


Slide 3:Solution Methods The solution methods are classified into 3 broad categories: Direct (zero order) search methods: Simplex (3.2.1) Gradient based (first order) methods: Steepest descent (3.3.1) Conjugate gradient (3.3.5) Second order methods: Newton (3.3.2) Modified Newton (3.3.3) Quasi-Newton (3.3.6)


Slide 4:DIRECT SEARCH METHODS They require only function values. They are computationally uncomplicated. However, they are slow.


Slide 5:The Simplex Search Method The simplex method in this course is not related to the simplex method in linear programming. A simplex is a triangle in the 2-dimension space and tetrahedron in the 3-dimension space: A simplex in N-dimension space has N+1 vertices.


Slide 6:The method procedure: Set up a regular simplex in the space of the independent variables (i.e., define the vertices). Evaluate the objective function at each vertex. Locate the vertex with the highest function value. Reflect this vertex through the centroid to generate a new vertex which defines a new simplex Repeat until the termination rule is satisfied.


Slide 7:The calculations: Generation of the regular simplex, given a base point and scale factor, Calculation of the reflected point. Given the base point x0, the N vertices of the simplex are given by


Slide 8:The increments 1 and 2 are calculated from: The reflection of the worst vertex is performed by first calculating the centroid. Suppose x(j) is the point to be reflected, the centroid of the remaining N vertices is


Slide 9:All the points on the line from x(j) through xc are given by: Set  = 2 and the desired vertex is given by:


Slide 10:Example Solution


Slide 11:Solve by applying 5 iterations of the simplex method, starting with x0 = [5, 2]T. (5, 2) f = 234 Example


Slide 12:(5, 2) f = 234 (5.51, 4.63) f = 576.31 (6.8, 4.12) f = 851.91 Iteration 1


Slide 13:(5, 2) f = 234 (5.51, 4.63) f = 576.31 (3.63, 2.51) f = 102.88 Iteration 2


Slide 14:(5, 2) f = 234 (3.63, 2.517) f = 102.88 (3.12, -0.1204) f = 64.71 Iteration 3


Slide 15:(3.638, 2.517) f = 102.88 (3.12, -0.1204) f = 64.71 (1.75, 0.397) f = 5.15 Iteration 4


Slide 16:(3.12, -0.12) f = 64.71 (1.24, -2.24) f = 61.877 (1.758, 0.3972) f = 5.51 Iteration 5 The solution


Slide 17:The Improved Simplex Method The performance of the simplex method can be improved by allowing expansion and contraction in the course of the reflection calculation. Define the following points: The reflection step is performed by The normal expansion is when  = 2. But,  should be chosen depending on the outcome of the reflection.


Slide 18:One good strategy will be as follows:


Slide 19:GRADIENT-BASED METHODS They employ the gradient information. They are iterative methods and employ the iteration procedure where (k) : step size s(x(k)): direction The methods differ on how s(x(k)) is computed.


Slide 20:Steepest Descent Method Let x(k) be the current point. The Taylor expansion of the objective function about x(k): We need the next point to have a lower objective function value than the current point: That is equivalent to The smallest value of this product is when


Slide 21:We call this direction the steepest descent direction. Another proof of the steepest descent direction is to recognize that the gradient always points towards increasing value of the objective function. Taking the negative of the gradient, then, leads to the decreasing value of the objective function. Now the direction is determined, a single variable search is needed to determine the value of the step size. In every iteration of the method, the direction and step size are computed.


Slide 22:Example with x(0) = [10, 10]T. Solution Iteration 1: Find the gradient: x(1) = x(0) - (0) f(x(0)) where Find (0) such that f(x(1)) = f((0)) is minimized.


Slide 23:To find the minimizer of f(), we can either find the critical points of it and check which one is the minimizer, or use any single variable optimization technique (see Chapter 2) to estimate the minimizer. I used Excel Solver to find (0), it is equal to 0.0562. x(1) = [-1.2453, 2.1283]T Iteration 2: x(2) = x(1) - (1) f(x(1)) Find (1) such that f(x(2)) = f((1)) is minimized. (1) = 0.1217


Slide 24:x(2) = [0.1438, 0.1438]T Iterations 3: … Stop when the stopping rule is met.


Slide 25:Another way to find the value of (0) is to set the first derivative of f() to zero and solve for :


Slide 26:Notes The good thing about the steepest descent method is that it always converges. What’s bad about it is that it converges slower as the minimum is approached.


Slide 27:The gradient represents the perpendicular line to the tangent of the contour line of the function at a particular point.


Slide 28:The steepest descent method zigzags its way towards the optimum point. This is because each direction is orthogonal to the previous direction.


Slide 29:Conjugate Gradient Method Preliminaries Two vectors u and v are said to be conjugate with respect to matrix C if uT C v = 0. The two vectors are C-conjugate. A set of conjugate vectors is called a conjugate set. The eigenvectors of the matrix are conjugate with respect to it.


Slide 30:If u and v are conjugate and v and w are conjugate, then u and w are conjugate. Conjugate directions, which are vectors, are used to find the minimum of a function. The minimum of a quadratic function of N variables can be found after exactly N searches along conjugate directions.


Slide 31:For example, consider the problem starting from x(0) = [0, 0]T and using the conjugate directions s(1) = [1, 0]T and s(2) = [0.5, 1]T. The optimal solution is [-3/16, -1/8]T. The directions are conjugate with the respect to the Hessian of q(x): s(1)T C s(2) = 0. The minimum of q(x) can be found in 2 steps. First, optimize using s(1) only. Second, optimize using s(2) only.


Slide 34:The question now is, how can we conveniently generate conjugate directions? For a quadratic function f(x), the gradient is given by Take two points x(0) and x(1), the change in the gradient is given by The iteration procedure we will apply is x(k+1) = x(k) + (k) s(x(k)) The search directions are calculated as with s(0)= - g(0).


Slide 35:If the steepest descent direction is used, we know that We want to choose (k-1) such that s(k) is C-conjugate to s(k-1). Take the first direction: s(1) = - g(1) + (0) s(0) = - g(1) - (0) g(0) We require s(0) and s(1) to be C-conjugate: s(1)T C s(0) = 0 [g(1) + (0) g(0)]T C s(0) = 0 We know that


Slide 36:Therefore, From the quadratic property, [g(1) + (0) g(0)]T g = 0 After expansion, g(1)T g(1) + (0) g(0)T g(1) – g(1)T g(0) – (0) g(0)T g(0) = 0 From this,


Slide 37:Therefore, the general iteration is given by for k = 1, …,N-1. If the function is not quadratic, more iterations may be required.


Slide 38:The steepest descent direction is deflected so the minimum is reached directly.


Slide 39:Example See the example and the solution in the book.


Slide 40:Newton’s Method It is a second order method. Let x(k) be the current point. The Taylor expansion of the objective function about x(k): The quadratic approximation of f(x) is We need to find the critical point of the approximation:


Slide 41:The Newton optimization method is If the function f(x) is quadratic, the solution can be found in exactly one step.


Slide 42:Example See the example and the solution in the book.


Slide 43:Modified Newton’s Method Newton method can be unreliable for non-quadratic functions. The Newton step will often be large when x(0) is far from x*. To solve this problem, we add a step length:


Slide 44:Quasi-Newton Method Quasi-Newton methods use a Hessian-like matrix but without calculating second-order derivatives. Sometimes, these methods are referred to as the variable metric methods. Take the general formula: When A(k) = I (identity matrix), the formula becomes the formula of the steepest descent method. When A(k) = 2f(x(k))-1, the formula becomes the formula of the Newton method.


Slide 45:Starting from a positive definite matrix, the quasi-Newton methods gradually build up an approximate Hessian matrix by using gradient information from the previous iterations. The matrix A is kept positive definite; hence the direction s(k) = - A(k)f(x(k)) remains a descent direction. There are several ways to update the matrix A, one of which is where A(0) = I, (k) = x(k+1) – x(k) and (k) = f(x(k+1)) – f(x(k)).


Slide 46:Example Minimize using the quasi-Newton method with x(0) = [1, 1]T. Stop when ||f(x(k))|| ≤ 0.0001. Solution Iteration 1:


Slide 47:Iteration 2:


Slide 49:Quasi-Newton Methods (DFP) Quasi-Newton methods are based primarily upon properties of quadratic functions and they are designed to mimic Newton method using only first-order information. All Quasi-Newton methods generate directions using A(k) is an NN matrix called the metric. Methods employing this direction generation form are called variable metric methods because A changes at each iteration.


Slide 50:The recursion for the estimate to the inverse of the Hessian is The matrix Ac(k) is formed such that the sequence approaches H-1 = 2f(x*)-1.


Slide 51:The DFP formula used to update the matrix A is where A(0) = I, x(k) = x(k+1) – x(k) and g(k) = g(x(k+1)) - g(x(k)) =f(x(k+1)) – f(x(k)). The DFP formula preserves symmetry and positive definiteness so that the sequence A(1), A(2), … will also be symmetric and positive definite.


Slide 52:Example See the example and the solution in the book.