| Version 15 (modified by matthieu.brucher, 6 years ago) |
|---|
The openopt scikit
Constrained and unconstrained optimization
Framework for generic optimizers
This new framework proposes usual pieces of optimizers that can be used together efficiently and changed easily.
- The optimizer module provides the current state to each submodule and call them in the correct order
- The criterion module exposes some usual stopping criterion
- The step module proposes different steps (descent directions) with different arguments specific to each one
- The line search module is responsible for the search of a minimum on a line proposed by the step computed before
- The helpers module provides a collection of usual functions in optimization such as quadratic fit function
Idioms used in this framework :
- object-oriented so that each module is responsible for its actions (the function provides the cost, but the gradient as well if needed, ...)
- separation principle : each sub-module uses the optimizer state to communicate with one another and this allows to reuse them if needed
The separation principle is implemented this a state dictionary that is passed around in the different modules. Each time a criticial value is computed (gradient, step, step modification, ...), it should be added to the state so that other module can use it if they need it.
This state dictionary can be passed as keyword arguments to a record object that is registered to the optimizer with the formal argument record. This object is then called after each iteration and the internal of the optimizer can be saved.
What is a function?
A function is an instance that possess some methods :
- at least __call__(self, x) that returns the cost at point x
- possibly gradient(self, x) that will return the gradient at point x
- the hessian(self, x) returning the hessian at point x
- hessianvect(self, x, v) that will return hessian(self, x) * v (in an optimized way)
A fitting function is somewhat different :
- __call__(self, x, params) that will call f(x) for some parameters, returning an array of values of the cost function. Its shape must be (len(x), d) where d is the number of values returned by a call (if f is 2D and x has 10 points, the return array will have (10, 2) as its shape)
- gradient(self, x, params) will return the gradient of f(x), the size must be (x.shape[0], d, x.shape[1])
- hessian(self, x, params) will return the hessian of f(x), the size must be (x.shape[0], d, x.shape[1], x.shape[1])
- hessianvect(self, x, v, params) will return the hessian of f(x) times a vector v, the size must be (x.shape[0], d, x.shape[1])
Simple use of the framework
Suppose you have a simple two argumnets functions with its gradient :
>>> class Function(object):
def __call__(self, x):
return (x[0] - 2) ** 2 + (2 * x[1] + 4) ** 2
def gradient(self, x):
return numpy.array((2 * (x[0] - 2), 4 * (2 * x[1] + 4)))
You then must import the optimizers (suppose you are in the solvers folder) :
>>> import numpy >>> from optimizers import *
Then you can optimize your function :
>>> fun = Function()
>>> mystep = step.GradientStep()
>>> mylinesearch = line_search.GoldenSectionSearch(min_alpha_step = 0.0001)
>>> mycriterion = criterion.RelativeValueCriterion(ftol = 0.0001, iterations_max = 100)
>>> myoptimizer = optimizer.StandardOptimizer(function = fun, step = mystep, line_search = mylinesearch, criterion = mycriterion, x0 = numpy.zeros((2)))
array([ 2., -2.])
>>> myoptimizer.state
{'function': <__main__.Function object at 0xc67e0c>, 'new_parameters': array([ 2., -2.]), 'gradient': array([ 0., 0.]), 'old_value': 0.0, 'iteration': 35, 'step': array([-0., -0.]), 'step_size': 4.0856349008447342e-05, 'old_parameters': array([ 2., -2.]), 'new_value': 0.0}
You can use an inexact line search with the Wolfe Powell strong rules exactly in the same way :
>>> fun = Function()
>>> mystep = step.FRConjugateGradientStep()
>>> mylinesearch = line_search.StrongWolfePowellRule()
>>> mycriterion = criterion.RelativeValueCriterion(ftol = 0.0001, iterations_max = 100)
>>> myoptimizer = optimizer.StandardOptimizer(function = fun, step = mystep, line_search = mylinesearch, criterion = mycriterion, x0 = numpy.zeros((2)))
>>> myoptimizer.optimize()
array([ 2., -2.])
>>> myoptimizer.state
{'function': <__main__.Function object at 0x30d26c>, 'new_parameters': array([ 2., -2.]), 'gradient': array([ 0., 0.]), 'old_value': 0.0, 'iteration': 4, 'step': array([ 0., -0.]), 'step_size': 1.0, 'old_parameters': array([ 2., -2.]), 'step_modifier': 0.0, 'new_value': 0.0}
Use of helper functions
Fitting Data
Once the fit function returns the correct result, it can be used to create a Quadratic instance (for example) that can be optimized through the framework
>>> class F1(object):
def __call__(self, x, params):
return params[0] + params[1] * x
def gradient(self, x, params):
return numpy.array((numpy.ones(len(x)), x)).T
def hessian(self, x, params):
return numpy.zeros((len(x), 2, 2))
>>> x = numpy.random.random_sample((10))
>>> y = 3*x+4
>>> function = helpers.Quadratic(x, y, F1())
>>> x0 = numpy.zeros(2)
>>> opt = optimizer.StandardOptimizer(function = function,
x0 = x0,
step = step.GradientStep(),
line_search = line_search.FibonacciSectionSearch(min_alpha_step=0.000001),
criterion = criterion.RelativeValueCriterion(ftol = 0.00001, iterations_max = 1000))
>>> print opt.optimize()
[ 4. 3.]
>>> print opt.state
{'function': <optimizers.helpers.quadratic.Quadratic object at 0xe7e50c>, 'direction': array([ 5.32907052e-15, 1.22827925e-15]), 'new_parameters': array([ 4., 3.]), 'gradient': array([ -5.32907052e-15, -1.22827925e-15]), 'old_value': 2.3665827156630354e-30, 'iteration': 10, 'alpha_step': 4.5907169276693443e-07, 'old_parameters': array([ 4., 3.]), 'new_value': 2.3665827156630354e-30}
Finite Differences
Finite differences helps computing numerical gradient and/or hessian. If the gradient is provided, then the hessian will take advantage of it.
>>> class Function(helpers.ForwardFiniteElementDerivatives):
def __call__(self, x):
return (x[0] - 2) ** 2 + (2 * x[1] + 4) ** 2
>>> startPoint = numpy.zeros(2, numpy.float)
>>> optimi = optimizer.StandardOptimizer(function = Function(),
step = step.FRConjugateGradientStep(),
criterion = criterion.RelativeValueCriterion(0.0000001, 100),
x0 = startPoint,
line_search = line_search.StrongWolfePowellRule())
>>> print optimi.optimize()
[ 2. -2.00000004]
>>> print optimi.state
{'function': <__main__.Function object at 0xe7e8ac>, 'direction': array([ -1.13496692e-07, -1.37756364e-08]), 'new_parameters': array([ 2. , -2.00000004]), 'gradient': array([ 9.72597028e-08, 4.72195288e-08]), 'old_value': 7.7802561238186933e-15, 'iteration': 4, 'alpha_step': 0, 'old_parameters': array([ 2. , -2.00000004]), 'step_modifier': 0.11824091630690386, 'new_value': 7.7802561238186933e-15}
>>> startPoint = numpy.zeros(2, numpy.float)
>>> optimi = optimizer.StandardOptimizer(function = Function(),
step = step.NewtonStep(),
criterion = criterion.RelativeValueCriterion(0.0000001, 100),
x0 = startPoint,
line_search = line_search.SimpleLineSearch())
>>> print optimi.optimize()
[ 1.99999995 -2.00000005]
>>> print optimi.state
{'function': <__main__.Function object at 0xe7e98c>, 'direction': array([ -1.11022304e-16, 1.11022302e-16]), 'new_parameters': array([ 1.99999995, -2.00000005]), 'gradient': array([ 2.22044608e-16, -8.88178414e-16]), 'old_value': 1.2500000047903369e-14, 'iteration': 5, 'alpha_step': 1.0, 'old_parameters': array([ 1.99999995, -2.00000005]), 'new_value': 1.250000007010783e-14, 'hessian': array([[ 2.00000000e+00, -4.73316543e-16],
[ -4.73316543e-16, 7.99999998e+00]])}
TODO list
This is a list of things to add to the framework.
Criteria
Parameters based criteria- Gradient based criterion
- Composite criterion with several other criteria
- Modify current criteria to make them more modular and so that they do not include iteration limit check ?
- Criteria depending on a sliding window
Line searches
Cubic interpolation (last exact line search that is missing in the framework IIRC)Goldstein conditions- Barzilai & Borwein
- second order Armijo rule
- second order Wolfe-Powell rule
- decorator to start the search from a factor time the last step
from a fixed factor time the last step size- ??
- line searches from a fixed number of initial alpha steps (depending on polynomes ?)
Steps
Marquardt step- Goldstein-Price
- Dolfeld et al.
- Gill-Murray
- Fiacco-McCormich
- Fletcher-Freeman
- inexact Newton's method
- BFGS
Partial step (decorator that returns a random part of a direction)
Helpers
Quadratic sum for fitting data- Other sums for fitting data
Finite Differences class- Need to define hessianvect when it will be needed
Optimizers
- Trust-region methods investigation
- Constrained optimization
