What is Algorithmic Differentiation (AD)?
There is ongoing disagreement in the AD community about what the techniques explained on this page should be called. The competitors are "Algorithmic Differentiation", "Automatic Differentiation" and "Computational Differentiation"; they are just different names for the same thing.
The concepts of Algorithmic Differentiation (AD) are quite easy to understand. However, since calculus classes teach only symbolic differentiation (SD) and classes on numerical mathematics teach only numerical differentiation (ND), it is wrongly believed that SD and ND are the only viable techniques available.
But AD is generally much more useful than SD or ND. Leading experts even consider AD one of the most important numerical algorithms: see for example Nick Trefethen's talk http://www.comlab.ox.ac.uk/nick.trefethen/inventorstalk.pdf .
For more material see the AD community website http://www.autodiff.org and the homepage of an AD winter school http://www.sintef.no/Projectweb/eVITA/English/eSCience-Meeting-2010/Winter-School/ .
What is the Rationale of AD?
The goal of AD techniques is to evaluate derivatives and polynomial approximations. For instance, AD can be used to compute:
- the Jacobian J(x) = dF/dx for functions F: R^N --> R^M
- the Hessian H(x) = d^2 F/dx^2, which is a 3-tensor for the above F
- Jacobian-vector products J(x) v and vector-Jacobian products v^T J(x)
- higher-order derivative tensors d^k F/dx^k
even if F(x) is a complicated computer simulation containing if-then-else branches, recursions (e.g. for loops), etc.; a minimal sketch of this is shown below. For instance, AD techniques work very well to differentiate PDE solvers and DAE/ODE solvers, such that one can obtain derivatives of the state at any given point in time.
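To make this concrete, here is a minimal, self-contained sketch of forward mode AD (written for this page for illustration; it is not one of the AD tools discussed below). Each value carries a directional derivative along, so a single evaluation of F yields the Jacobian-vector product J(x) v, even through branches and loops::

    class Dual(object):
        """A value v together with a directional derivative dv."""
        def __init__(self, v, dv=0.):
            self.v, self.dv = v, dv
        def __add__(self, other):
            return Dual(self.v + other.v, self.dv + other.dv)
        def __mul__(self, other):
            # product rule: (x*y)' = x'*y + x*y'
            return Dual(self.v*other.v, self.dv*other.v + self.v*other.dv)

    def F(x1, x2):
        # branches and loops are no problem: AD differentiates
        # the code path that is actually taken
        if x1.v > 0:
            y = x1*x2
        else:
            y = x2*x2
        for i in range(3):
            y = y*x1
        return y

    # Jacobian-vector product J(x) v at x = (2,5) in direction v = (1,0)
    x1 = Dual(2., 1.)
    x2 = Dual(5., 0.)
    y = F(x1, x2)
    print 'F(x)   = ', y.v    # 80.0,  here F(x) = x1**4 * x2
    print 'J(x) v = ', y.dv   # 160.0, i.e. 4*x1**3 * x2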
How does AD work?
The process of computing a quantity like the Jacobian is often called derivative accumulation. Though there are other methods for the accumulation, in practice typically only the so-called "forward mode of AD" and the "reverse mode of AD" are of interest, since the more elaborate methods gain relatively little in computational efficiency but introduce considerable overhead. The forward mode can be done in Taylor arithmetic, which allows the computation of higher-order derivatives.
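The following sketch (again illustrative pure Python, not a library API) shows the idea of Taylor arithmetic: the coefficients of the truncated series x(t) = 3 + t are pushed through f(x) = x*x*x with a truncated Cauchy product, and the k-th derivative of f is recovered as k! times the k-th output coefficient::

    def taylor_mul(x, y):
        # truncated Cauchy product: z_k = sum_{j=0..k} x_j * y_{k-j}
        D = len(x)
        return [sum(x[j]*y[k-j] for j in range(k+1)) for k in range(D)]

    # the coefficient list [3., 1., 0., 0.] represents x(t) = 3 + t up to degree 3
    x = [3., 1., 0., 0.]
    z = taylor_mul(taylor_mul(x, x), x)

    # the k-th derivative of f(x) = x**3 at x0 = 3 is k! * z[k]
    print 'f    = ', z[0]    # 27.0
    print "f'   = ", z[1]    # 27.0 == 3*x0**2
    print "f''  = ", 2*z[2]  # 18.0 == 6*x0
    print "f''' = ", 6*z[3]  #  6.0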
What AD tools are already available?
There are quite a few AD tools available in Python, and even more in other programming languages, but none of them is a silver bullet. An incomplete list of AD tools for Python is:
- PYCPPAD / PYADOLC: Both provide almost the same functionality. CppAD is typically a little faster, and there also exist DLLs for Windows. PYADOLC is a little more mature, IMHO. Both can compute derivatives of arbitrary degree and work quite well with numpy, i.e. you can work with numpy arrays, also quite efficiently in the so-called reverse mode of AD. Both require boost::python.
- ScientificPython (http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/) can provide first-order derivatives. But, as far as I understand, only first-order derivatives of functions f: R --> R, and only in the usually not so efficient forward mode of AD.
- FuncDesigner, available at http://openopt.org/FuncDesigner.
- ALGOPY (http://github.com/b45ch1/algopy/tree/master): pure Python, arbitrary-degree derivatives in forward and reverse mode, still quite experimental. It also offers the possibility to differentiate functions that make heavy use of matrix operations.
What's the Problem with Existing AD Tools?
None of these tools provides efficient low-level algorithms that one can use as building blocks for high-performance, high-quality software. E.g., though CppAD and ADOL-C are written in C/C++ and even use OpenMP, they are not designed to provide Taylor polynomial algorithms. However, if one wishes to code a differentiated DAE solver, one needs such algorithms. Also, there are many AD tools for many languages that reinvent the wheel, and often the outcome is far from optimal. A recent refactoring of ALGOPY tried to separate the number-crunching code from the API, to make it possible to replace the algorithms written in Python by C code. However, some design issues suggest a ground-up reimplementation where previous mistakes are not repeated.
It is time to write a C library that provides basic building blocks for AD tools and for researchers who want to apply AD techniques in their codes. The goal is to make TAYLORPOLY (http://github.com/b45ch1/taylorpoly) such a library of algorithms useful for AD. It is written in ANSI C. As of now, a good deal of the algorithms are already implemented and bindings to Python exist. However, there are still many algorithms to implement, in particular algorithms for:
- z = x**y, where x and y are univariate Taylor polynomials (see the sketch after this list)
- hyperbolic functions like tanh(x)
- reverse mode algorithms
- univariate matrix polynomials and the corresponding reverse mode algorithms
- multivariate Taylor polynomials and cross Taylor polynomials.
Possibly the algorithms could be proposed to be included in either scipy or numpy.
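As an illustration of the first item, consider z = x**y for univariate Taylor polynomials. A pure-Python sketch (the function names and the plain-list representation are made up for this example; this is not the TAYLORPOLY API) can compute it as z = exp(y*log(x)), using the classical recurrences for log and exp of a truncated Taylor series::

    import math

    def utp_mul(x, y):
        # truncated Cauchy product
        D = len(x)
        return [sum(x[j]*y[k-j] for j in range(k+1)) for k in range(D)]

    def utp_log(x):
        # recurrence derived from x * (log x)' = x'
        D = len(x)
        z = [math.log(x[0])] + [0.]*(D-1)
        for k in range(1, D):
            z[k] = (x[k] - sum(j*z[j]*x[k-j] for j in range(1, k))/k)/x[0]
        return z

    def utp_exp(x):
        # recurrence derived from (exp x)' = x' * exp(x)
        D = len(x)
        z = [math.exp(x[0])] + [0.]*(D-1)
        for k in range(1, D):
            z[k] = sum(j*x[j]*z[k-j] for j in range(1, k+1))/k
        return z

    def utp_pow(x, y):
        # z = x**y computed as z = exp(y*log(x))
        return utp_exp(utp_mul(y, utp_log(x)))

    # check: for x(t) = 3 + t and constant y = 2 this is (3+t)**2 = 9 + 6t + t**2
    print utp_pow([3., 1., 0.], [2., 0., 0.])    # approximately [9.0, 6.0, 1.0]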
What's already implemented in TAYLORPOLY?
To give an illustrative example of the forward mode using univariate Taylor propagation, we use TAYLORPOLY, which can be obtained from http://github.com/b45ch1/taylorpoly . The goal is to compute the derivatives df/dx and d^2f/dx^2::
    import numpy
    import taylorpoly

    def f(x,y):
        return numpy.sin(x+y)/numpy.exp(x)

    # compute the derivatives by forward mode AD
    x = taylorpoly.UTPS([3.,1.,0.])
    y = taylorpoly.UTPS([2.,0.,0.])
    z = f(x,y)
    print 'AD solution: z = ', z.data[0], z.data[1], 2*z.data[2]

    # compute the derivatives by symbolic differentiation
    def dfdx(x,y):
        return (numpy.cos(x+y) - numpy.sin(x+y))/numpy.exp(x)

    def d2fdx2(x,y):
        return (-numpy.sin(x+y) - numpy.cos(x+y) - numpy.cos(x+y) + numpy.sin(x+y))/numpy.exp(x)

    x = x.data[0]
    y = y.data[0]
    z0 = f(x,y)
    z1 = dfdx(x,y)
    z2 = d2fdx2(x,y)
    print 'symbolic solution: z = ', z0, z1, z2
which yields the output::
    b45ch1@shlp:~/Desktop$ python utps_example.py
    AD solution: z =  -0.0477420284223 0.0618647370433 -0.0282454172421
    symbolic solution: z =  -0.0477420284223 0.0618647370433 -0.0282454172421
One can see that the AD solution agrees with the SD solution to the last digit.
Example of what is not implemented yet
ALGOPY excels at matrix AD. We show how one can compute, in the reverse mode of AD, the derivative of a matrix-valued function that cannot easily be differentiated by hand. In ALGOPY the reverse mode is called pullback::
    import algopy
    import numpy

    # D - 1 is the degree of the polynomial
    D = 4
    # P > 1 vectorizes the computation
    P = 1
    # N is the size of the matrix
    N = 50

    # trace the function evaluation and perform forward mode AD
    cg = algopy.CGraph()
    A = algopy.Function(algopy.UTPM(numpy.random.rand(D,P,N,N)))
    Q,R = algopy.qr(A)
    E = algopy.inv(Q)
    F = algopy.dot(E,R)
    y = algopy.trace(F)
    cg.independentFunctionList = [A]
    cg.dependentFunctionList = [y]

    # plot the computational graph using graphviz
    cg.plot(filename='cg.png')

    # perform reverse mode AD
    tmp = numpy.zeros((D,P))
    tmp[0,0] = 1.
    cg.pullback([algopy.UTPM(tmp)])
    print 'dy/dA = ', cg.independentFunctionList[0].xbar.data